You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
521 lines
18 KiB
521 lines
18 KiB
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> |
|
<HTML> |
|
|
|
<HEAD> |
|
<link rel="stylesheet" href="designstyle.css"> |
|
<title>Gperftools CPU Profiler</title> |
|
</HEAD> |
|
|
|
<BODY> |
|
|
|
<p align=right> |
|
<i>Last modified |
|
<script type=text/javascript> |
|
var lm = new Date(document.lastModified); |
|
document.write(lm.toDateString()); |
|
</script></i> |
|
</p> |
|
|
|
<p>This is the CPU profiler we use at Google. There are three parts |
|
to using it: linking the library into an application, running the |
|
code, and analyzing the output.</p> |
|
|
|
<p>On the off-chance that you should need to understand it, the CPU |
|
profiler data file format is documented separately, |
|
<a href="cpuprofile-fileformat.html">here</a>. |
|
|
|
|
|
<H1>Linking in the Library</H1> |
|
|
|
<p>To install the CPU profiler into your executable, add |
|
<code>-lprofiler</code> to the link-time step for your executable. |
|
(It's also probably possible to add in the profiler at run-time using |
|
<code>LD_PRELOAD</code>, e.g. |
|
<code>% env LD_PRELOAD="/usr/lib/libprofiler.so" <binary></code>, |
|
but this isn't necessarily recommended.)</p> |
|
|
|
<p>This does <i>not</i> turn on CPU profiling; it just inserts the |
|
code. For that reason, it's practical to just always link |
|
<code>-lprofiler</code> into a binary while developing; that's what we |
|
do at Google. (However, since any user can turn on the profiler by |
|
setting an environment variable, it's not necessarily recommended to |
|
install profiler-linked binaries into a production, running |
|
system.)</p> |
|
|
|
|
|
<H1>Running the Code</H1> |
|
|
|
<p>There are several alternatives to actually turn on CPU profiling |
|
for a given run of an executable:</p> |
|
|
|
<ol> |
|
<li> <p>Define the environment variable CPUPROFILE to the filename |
|
to dump the profile to. For instance, if you had a version of |
|
<code>/bin/ls</code> that had been linked against libprofiler, |
|
you could run:</p> |
|
<pre>% env CPUPROFILE=ls.prof /bin/ls</pre> |
|
|
|
<li> <p>In your code, bracket the code you want profiled in calls to |
|
<code>ProfilerStart()</code> and <code>ProfilerStop()</code>. |
|
(These functions are declared in <code><gperftools/profiler.h></code>.) |
|
<code>ProfilerStart()</code> will take |
|
the profile-filename as an argument.</p> |
|
</ol> |
|
|
|
<p>In Linux 2.6 and above, profiling works correctly with threads, |
|
automatically profiling all threads. In Linux 2.4, profiling only |
|
profiles the main thread (due to a kernel bug involving itimers and |
|
threads). Profiling works correctly with sub-processes: each child |
|
process gets its own profile with its own name (generated by combining |
|
CPUPROFILE with the child's process id).</p> |
|
|
|
<p>For security reasons, CPU profiling will not write to a file -- and |
|
is thus not usable -- for setuid programs.</p> |
|
|
|
<p>See the include-file <code>gperftools/profiler.h</code> for |
|
advanced-use functions, including <code>ProfilerFlush()</code> and |
|
<code>ProfilerStartWithOptions()</code>.</p> |
|
|
|
|
|
<H2>Modifying Runtime Behavior</H2> |
|
|
|
<p>You can more finely control the behavior of the CPU profiler via |
|
environment variables.</p> |
|
|
|
<table frame=box rules=sides cellpadding=5 width=100%> |
|
|
|
<tr valign=top> |
|
<td><code>CPUPROFILE_FREQUENCY=<i>x</i></code></td> |
|
<td>default: 100</td> |
|
<td> |
|
How many interrupts/second the cpu-profiler samples. |
|
</td> |
|
</tr> |
|
|
|
<tr valign=top> |
|
<td><code>CPUPROFILE_REALTIME=1</code></td> |
|
<td>default: [not set]</td> |
|
<td> |
|
If set to any value (including 0 or the empty string), use |
|
ITIMER_REAL instead of ITIMER_PROF to gather profiles. In |
|
general, ITIMER_REAL is not as accurate as ITIMER_PROF, and also |
|
interacts badly with use of alarm(), so prefer ITIMER_PROF unless |
|
you have a reason prefer ITIMER_REAL. |
|
</td> |
|
</tr> |
|
|
|
</table> |
|
|
|
|
|
<h1><a name="pprof">Analyzing the Output</a></h1> |
|
|
|
<p><code>pprof</code> is the script used to analyze a profile. It has |
|
many output modes, both textual and graphical. Some give just raw |
|
numbers, much like the <code>-pg</code> output of <code>gcc</code>, |
|
and others show the data in the form of a dependency graph.</p> |
|
|
|
<p>pprof <b>requires</b> <code>perl5</code> to be installed to run. |
|
It also requires <code>dot</code> to be installed for any of the |
|
graphical output routines, and <code>gv</code> to be installed for |
|
<code>--gv</code> mode (described below). |
|
</p> |
|
|
|
<p>Here are some ways to call pprof. These are described in more |
|
detail below.</p> |
|
|
|
<pre> |
|
% pprof /bin/ls ls.prof |
|
Enters "interactive" mode |
|
% pprof --text /bin/ls ls.prof |
|
Outputs one line per procedure |
|
% pprof --gv /bin/ls ls.prof |
|
Displays annotated call-graph via 'gv' |
|
% pprof --gv --focus=Mutex /bin/ls ls.prof |
|
Restricts to code paths including a .*Mutex.* entry |
|
% pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof |
|
Code paths including Mutex but not string |
|
% pprof --list=getdir /bin/ls ls.prof |
|
(Per-line) annotated source listing for getdir() |
|
% pprof --disasm=getdir /bin/ls ls.prof |
|
(Per-PC) annotated disassembly for getdir() |
|
% pprof --text localhost:1234 |
|
Outputs one line per procedure for localhost:1234 |
|
% pprof --callgrind /bin/ls ls.prof |
|
Outputs the call information in callgrind format |
|
</pre> |
|
|
|
|
|
<h3>Analyzing Text Output</h3> |
|
|
|
<p>Text mode has lines of output that look like this:</p> |
|
<pre> |
|
14 2.1% 17.2% 58 8.7% std::_Rb_tree::find |
|
</pre> |
|
|
|
<p>Here is how to interpret the columns:</p> |
|
<ol> |
|
<li> Number of profiling samples in this function |
|
<li> Percentage of profiling samples in this function |
|
<li> Percentage of profiling samples in the functions printed so far |
|
<li> Number of profiling samples in this function and its callees |
|
<li> Percentage of profiling samples in this function and its callees |
|
<li> Function name |
|
</ol> |
|
|
|
<h3>Analyzing Callgrind Output</h3> |
|
|
|
<p>Use <a href="http://kcachegrind.sourceforge.net">kcachegrind</a> to |
|
analyze your callgrind output:</p> |
|
<pre> |
|
% pprof --callgrind /bin/ls ls.prof > ls.callgrind |
|
% kcachegrind ls.callgrind |
|
</pre> |
|
|
|
<p>The cost is specified in 'hits', i.e. how many times a function |
|
appears in the recorded call stack information. The 'calls' from |
|
function a to b record how many times function b was found in the |
|
stack traces directly below function a.</p> |
|
|
|
<p>Tip: if you use a debug build the output will include file and line |
|
number information and kcachegrind will show an annotated source |
|
code view.</p> |
|
|
|
<h3>Node Information</h3> |
|
|
|
<p>In the various graphical modes of pprof, the output is a call graph |
|
annotated with timing information, like so:</p> |
|
|
|
<A HREF="pprof-test-big.gif"> |
|
<center><table><tr><td> |
|
<img src="pprof-test.gif"> |
|
</td></tr></table></center> |
|
</A> |
|
|
|
<p>Each node represents a procedure. The directed edges indicate |
|
caller to callee relations. Each node is formatted as follows:</p> |
|
|
|
<center><pre> |
|
Class Name |
|
Method Name |
|
local (percentage) |
|
<b>of</b> cumulative (percentage) |
|
</pre></center> |
|
|
|
<p>The last one or two lines contains the timing information. (The |
|
profiling is done via a sampling method, where by default we take 100 |
|
samples a second. Therefor one unit of time in the output corresponds |
|
to about 10 milliseconds of execution time.) The "local" time is the |
|
time spent executing the instructions directly contained in the |
|
procedure (and in any other procedures that were inlined into the |
|
procedure). The "cumulative" time is the sum of the "local" time and |
|
the time spent in any callees. If the cumulative time is the same as |
|
the local time, it is not printed.</p> |
|
|
|
<p>For instance, the timing information for test_main_thread() |
|
indicates that 155 units (about 1.55 seconds) were spent executing the |
|
code in <code>test_main_thread()</code> and 200 units were spent while |
|
executing <code>test_main_thread()</code> and its callees such as |
|
<code>snprintf()</code>.</p> |
|
|
|
<p>The size of the node is proportional to the local count. The |
|
percentage displayed in the node corresponds to the count divided by |
|
the total run time of the program (that is, the cumulative count for |
|
<code>main()</code>).</p> |
|
|
|
<h3>Edge Information</h3> |
|
|
|
<p>An edge from one node to another indicates a caller to callee |
|
relationship. Each edge is labelled with the time spent by the callee |
|
on behalf of the caller. E.g, the edge from |
|
<code>test_main_thread()</code> to <code>snprintf()</code> indicates |
|
that of the 200 samples in <code>test_main_thread()</code>, 37 are |
|
because of calls to <code>snprintf()</code>.</p> |
|
|
|
<p>Note that <code>test_main_thread()</code> has an edge to |
|
<code>vsnprintf()</code>, even though <code>test_main_thread()</code> |
|
doesn't call that function directly. This is because the code was |
|
compiled with <code>-O2</code>; the profile reflects the optimized |
|
control flow.</p> |
|
|
|
<h3>Meta Information</h3> |
|
|
|
<p>The top of the display should contain some meta information |
|
like:</p> |
|
<pre> |
|
/tmp/profiler2_unittest |
|
Total samples: 202 |
|
Focusing on: 202 |
|
Dropped nodes with <= 1 abs(samples) |
|
Dropped edges with <= 0 samples |
|
</pre> |
|
|
|
<p>This section contains the name of the program, and the total |
|
samples collected during the profiling run. If the |
|
<code>--focus</code> option is on (see the <a href="#focus">Focus</a> |
|
section below), the legend also contains the number of samples being |
|
shown in the focused display. Furthermore, some unimportant nodes and |
|
edges are dropped to reduce clutter. The characteristics of the |
|
dropped nodes and edges are also displayed in the legend.</p> |
|
|
|
<h3><a name=focus>Focus and Ignore</a></h3> |
|
|
|
<p>You can ask pprof to generate a display focused on a particular |
|
piece of the program. You specify a regular expression. Any portion |
|
of the call-graph that is on a path which contains at least one node |
|
matching the regular expression is preserved. The rest of the |
|
call-graph is dropped on the floor. For example, you can focus on the |
|
<code>vsnprintf()</code> libc call in <code>profiler2_unittest</code> |
|
as follows:</p> |
|
|
|
<pre> |
|
% pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof |
|
</pre> |
|
<A HREF="pprof-vsnprintf-big.gif"> |
|
<center><table><tr><td> |
|
<img src="pprof-vsnprintf.gif"> |
|
</td></tr></table></center> |
|
</A> |
|
|
|
<p>Similarly, you can supply the <code>--ignore</code> option to |
|
ignore samples that match a specified regular expression. E.g., if |
|
you are interested in everything except calls to |
|
<code>snprintf()</code>, you can say:</p> |
|
<pre> |
|
% pprof --gv --ignore=snprintf /tmp/profiler2_unittest test.prof |
|
</pre> |
|
|
|
|
|
<h3>Interactive mode</a></h3> |
|
|
|
<p>By default -- if you don't specify any flags to the contrary -- |
|
pprof runs in interactive mode. At the <code>(pprof)</code> prompt, |
|
you can run many of the commands described above. You can type |
|
<code>help</code> for a list of what commands are available in |
|
interactive mode.</p> |
|
|
|
<h3><a name=options>pprof Options</a></h3> |
|
|
|
For a complete list of pprof options, you can run <code>pprof |
|
--help</code>. |
|
|
|
<h4>Output Type</h4> |
|
|
|
<p> |
|
<center> |
|
<table frame=box rules=sides cellpadding=5 width=100%> |
|
<tr valign=top> |
|
<td><code>--text</code></td> |
|
<td> |
|
Produces a textual listing. (Note: If you have an X display, and |
|
<code>dot</code> and <code>gv</code> installed, you will probably |
|
be happier with the <code>--gv</code> output.) |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--gv</code></td> |
|
<td> |
|
Generates annotated call-graph, converts to postscript, and |
|
displays via gv (requres <code>dot</code> and <code>gv</code> be |
|
installed). |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--dot</code></td> |
|
<td> |
|
Generates the annotated call-graph in dot format and |
|
emits to stdout (requres <code>dot</code> be installed). |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--ps</code></td> |
|
<td> |
|
Generates the annotated call-graph in Postscript format and |
|
emits to stdout (requres <code>dot</code> be installed). |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--pdf</code></td> |
|
<td> |
|
Generates the annotated call-graph in PDF format and emits to |
|
stdout (requires <code>dot</code> and <code>ps2pdf</code> be |
|
installed). |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--gif</code></td> |
|
<td> |
|
Generates the annotated call-graph in GIF format and |
|
emits to stdout (requres <code>dot</code> be installed). |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--list=<<i>regexp</i>></code></td> |
|
<td> |
|
<p>Outputs source-code listing of routines whose |
|
name matches <regexp>. Each line |
|
in the listing is annotated with flat and cumulative |
|
sample counts.</p> |
|
|
|
<p>In the presence of inlined calls, the samples |
|
associated with inlined code tend to get assigned |
|
to a line that follows the location of the |
|
inlined call. A more precise accounting can be |
|
obtained by disassembling the routine using the |
|
--disasm flag.</p> |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--disasm=<<i>regexp</i>></code></td> |
|
<td> |
|
Generates disassembly of routines that match |
|
<regexp>, annotated with flat and |
|
cumulative sample counts and emits to stdout. |
|
</td> |
|
</tr> |
|
</table> |
|
</center> |
|
|
|
<h4>Reporting Granularity</h4> |
|
|
|
<p>By default, pprof produces one entry per procedure. However you can |
|
use one of the following options to change the granularity of the |
|
output. The <code>--files</code> option seems to be particularly |
|
useless, and may be removed eventually.</p> |
|
|
|
<center> |
|
<table frame=box rules=sides cellpadding=5 width=100%> |
|
<tr valign=top> |
|
<td><code>--addresses</code></td> |
|
<td> |
|
Produce one node per program address. |
|
</td> |
|
</tr> |
|
<td><code>--lines</code></td> |
|
<td> |
|
Produce one node per source line. |
|
</td> |
|
</tr> |
|
<td><code>--functions</code></td> |
|
<td> |
|
Produce one node per function (this is the default). |
|
</td> |
|
</tr> |
|
<td><code>--files</code></td> |
|
<td> |
|
Produce one node per source file. |
|
</td> |
|
</tr> |
|
</table> |
|
</center> |
|
|
|
<h4>Controlling the Call Graph Display</h4> |
|
|
|
<p>Some nodes and edges are dropped to reduce clutter in the output |
|
display. The following options control this effect:</p> |
|
|
|
<center> |
|
<table frame=box rules=sides cellpadding=5 width=100%> |
|
<tr valign=top> |
|
<td><code>--nodecount=<n></code></td> |
|
<td> |
|
This option controls the number of displayed nodes. The nodes |
|
are first sorted by decreasing cumulative count, and then only |
|
the top N nodes are kept. The default value is 80. |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--nodefraction=<f></code></td> |
|
<td> |
|
This option provides another mechanism for discarding nodes |
|
from the display. If the cumulative count for a node is |
|
less than this option's value multiplied by the total count |
|
for the profile, the node is dropped. The default value |
|
is 0.005; i.e. nodes that account for less than |
|
half a percent of the total time are dropped. A node |
|
is dropped if either this condition is satisfied, or the |
|
--nodecount condition is satisfied. |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--edgefraction=<f></code></td> |
|
<td> |
|
This option controls the number of displayed edges. First of all, |
|
an edge is dropped if either its source or destination node is |
|
dropped. Otherwise, the edge is dropped if the sample |
|
count along the edge is less than this option's value multiplied |
|
by the total count for the profile. The default value is |
|
0.001; i.e., edges that account for less than |
|
0.1% of the total time are dropped. |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--focus=<re></code></td> |
|
<td> |
|
This option controls what region of the graph is displayed |
|
based on the regular expression supplied with the option. |
|
For any path in the callgraph, we check all nodes in the path |
|
against the supplied regular expression. If none of the nodes |
|
match, the path is dropped from the output. |
|
</td> |
|
</tr> |
|
<tr valign=top> |
|
<td><code>--ignore=<re></code></td> |
|
<td> |
|
This option controls what region of the graph is displayed |
|
based on the regular expression supplied with the option. |
|
For any path in the callgraph, we check all nodes in the path |
|
against the supplied regular expression. If any of the nodes |
|
match, the path is dropped from the output. |
|
</td> |
|
</tr> |
|
</table> |
|
</center> |
|
|
|
<p>The dropped edges and nodes account for some count mismatches in |
|
the display. For example, the cumulative count for |
|
<code>snprintf()</code> in the first diagram above was 41. However |
|
the local count (1) and the count along the outgoing edges (12+1+20+6) |
|
add up to only 40.</p> |
|
|
|
|
|
<h1>Caveats</h1> |
|
|
|
<ul> |
|
<li> If the program exits because of a signal, the generated profile |
|
will be <font color=red>incomplete, and may perhaps be |
|
completely empty</font>. |
|
<li> The displayed graph may have disconnected regions because |
|
of the edge-dropping heuristics described above. |
|
<li> If the program linked in a library that was not compiled |
|
with enough symbolic information, all samples associated |
|
with the library may be charged to the last symbol found |
|
in the program before the library. This will artificially |
|
inflate the count for that symbol. |
|
<li> If you run the program on one machine, and profile it on |
|
another, and the shared libraries are different on the two |
|
machines, the profiling output may be confusing: samples that |
|
fall within shared libaries may be assigned to arbitrary |
|
procedures. |
|
<li> If your program forks, the children will also be profiled |
|
(since they inherit the same CPUPROFILE setting). Each process |
|
is profiled separately; to distinguish the child profiles from |
|
the parent profile and from each other, all children will have |
|
their process-id appended to the CPUPROFILE name. |
|
<li> Due to a hack we make to work around a possible gcc bug, your |
|
profiles may end up named strangely if the first character of |
|
your CPUPROFILE variable has ascii value greater than 127. |
|
This should be exceedingly rare, but if you need to use such a |
|
name, just set prepend <code>./</code> to your filename: |
|
<code>CPUPROFILE=./Ägypten</code>. |
|
</ul> |
|
|
|
|
|
<hr> |
|
<address>Sanjay Ghemawat<br> |
|
<!-- Created: Tue Dec 19 10:43:14 PST 2000 --> |
|
<!-- hhmts start --> |
|
Last modified: Fri May 9 14:41:29 PDT 2008 |
|
<!-- hhmts end --> |
|
</address> |
|
</BODY> |
|
</HTML>
|
|
|