Use gprof to check your code for performance issues
August 29th, 2007 mysurface Posted in Developer, gcc, gprof
After reading the article Speed your code with the GNU profiler from IBM developerWorks, I learned how to use gprof to make it easier to identify the performance bottlenecks in my modules. Here, I would like to share how I found a bottleneck in my own code.
Let us first look at the basic steps of using the GNU profiler.
To use gprof, the C/C++ code must be compiled by gcc with the -pg option. Assume the source file to be compiled is gp-test.c.
gcc -pg -g2 -o gp-test{,.c}
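-pg adds the profiling instrumentation gprof needs, -g2 enables level-2 debugging information (the same level as plain -g), and -o specifies the name of the output binary. The curly brackets are bash brace expansion, used to save typing: gp-test{,.c} expands to gp-test gp-test.c.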
Next, run the binary. When it exits normally, it writes its profiling data to gmon.out in the current directory.
./gp-test
With gmon.out in place, you can now extract the profiling information for your code by running gprof:
gprof gp-test gmon.out > result.txt
I like to save the results to a text file ‘result.txt’ for further comparison and analysis.
Let's look at a sample C program and try to find the choke point.
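#include <stdio.h>

/* 10000 x 10000 ints (about 400 MB with a 4-byte int), zero-initialized */
int twoD[10000][10000] = {0};

/* Writes column 1 of every row: consecutive writes are one whole row apart. */
void update_d1(void)
{
    int i, k = 0;
    for (i = 0; i < 10000; i++)
        twoD[i][1] = k++;
}

/* Writes every column of row 1: consecutive writes are adjacent in memory. */
void update_d2(void)
{
    int i, k = 0;
    for (i = 0; i < 10000; i++)
        twoD[1][i] = k++;
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        printf("usage: %s 1|2\n", argv[0]);
        return 1;
    }
    if (argv[1][0] == '1')
        update_d1();
    else if (argv[1][0] == '2')
        update_d2();
    else {
        printf("\nInvalid value %s\n", argv[1]);
        return 1;
    }
    return 0;
}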
Both update_d1() and update_d2() write to the 2D array the same number of times (10000 iterations). Think of the array as twoD[row][column]: update_d1() varies the row index, walking down column 1, while update_d2() varies the column index, walking along row 1. Yet the time the two functions take to complete differs greatly. Let's compile and profile the program with gprof.
gcc -pg -g -o gp-test{,.c}
./gp-test 1
gprof gp-test gmon.out > t1
./gp-test 2
gprof gp-test gmon.out > t2
Observe the extracted results:
Using update_d1():
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
100.52      0.06     0.06        1    60.31    60.31  update_d1

Using update_d2():
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00        1     0.00     0.00  update_d2
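update_d1() takes 0.06 seconds, while update_d2() takes less than 0.01 seconds. (gprof's timings come from periodic sampling, typically once every 0.01 seconds on Linux, so 0.00 means no sample landed inside update_d2() at all.) Why?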
Look at the 2D array again, twoD[row][column]. C stores it in row-major order as one contiguous chunk of memory, not as separate rows and columns. The chunk begins with row 0, column 0, followed by the rest of row 0; the first element of row 1, twoD[1][0], is the 10001st element of the chunk.
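Now imagine how update_d1() walks through memory. Each step moves to the next row, so every write jumps over 10000 ints (40000 bytes with a 4-byte int), whereas update_d2() writes 10000 consecutive ints without any jumps. Consecutive writes share cache lines and memory pages; each of update_d1()'s writes lands on a different cache line and a different page, so almost every write misses the cache. On a typical Linux system, the first write to each page of the untouched, zero-initialized array also triggers a page fault. That is the reason for the delay.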
October 18th, 2007 at 7:01 pm
simple and good example for gprof command
October 18th, 2007 at 7:04 pm
gud work……
October 18th, 2007 at 7:32 pm
great tip. I like the idea that you can call mount your root partition with simply: LABEL=Root too
October 18th, 2007 at 7:33 pm
i like your blog, continue
February 27th, 2008 at 7:01 am
Nice! Now that I know WHY I keep finding a gmon.out in my home directory and what it IS, I just need to figure out what the heck binary is making it :)
Thank you for your site; it has helped me quickly and concisely.