You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
373 lines
13 KiB
373 lines
13 KiB
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> |
|
<HTML> |
|
|
|
<HEAD> |
|
<link rel="stylesheet" href="designstyle.css"> |
|
<title>Gperftools Heap Profiler</title> |
|
</HEAD> |
|
|
|
<BODY> |
|
|
|
<p align=right> |
|
<i>Last modified |
|
<script type=text/javascript> |
|
var lm = new Date(document.lastModified); |
|
document.write(lm.toDateString()); |
|
</script></i> |
|
</p> |
|
|
|
<p>This is the heap profiler we use at Google, to explore how C++ |
|
programs manage memory. This facility can be useful for</p> |
|
<ul> |
|
<li> Figuring out what is in the program heap at any given time |
|
<li> Locating memory leaks |
|
<li> Finding places that do a lot of allocation |
|
</ul> |
|
|
|
<p>The profiling system instruments all allocations and frees. It |
|
keeps track of various pieces of information per allocation site. An |
|
allocation site is defined as the active stack trace at the call to |
|
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or, |
|
<code>new</code>.</p> |
|
|
|
<p>There are three parts to using it: linking the library into an |
|
application, running the code, and analyzing the output.</p> |
|
|
|
|
|
<h1>Linking in the Library</h1> |
|
|
|
<p>To install the heap profiler into your executable, add |
|
<code>-ltcmalloc</code> to the link-time step for your executable. |
|
Also, while we don't necessarily recommend this form of usage, it's |
|
possible to add in the profiler at run-time using |
|
<code>LD_PRELOAD</code>: |
|
<pre>% env LD_PRELOAD="/usr/lib/libtcmalloc.so" <binary></pre> |
|
|
|
<p>This does <i>not</i> turn on heap profiling; it just inserts the |
|
code. For that reason, it's practical to just always link |
|
<code>-ltcmalloc</code> into a binary while developing; that's what we |
|
do at Google. (However, since any user can turn on the profiler by |
|
setting an environment variable, it's not necessarily recommended to |
|
install profiler-linked binaries into a production, running |
|
system.) Note that if you wish to use the heap profiler, you must |
|
also use the tcmalloc memory-allocation library. There is no way |
|
currently to use the heap profiler separate from tcmalloc.</p> |
|
|
|
|
|
<h1>Running the Code</h1> |
|
|
|
<p>There are several alternatives to actually turn on heap profiling |
|
for a given run of an executable:</p> |
|
|
|
<ol> |
|
<li> <p>Define the environment variable HEAPPROFILE to the filename |
|
to dump the profile to. For instance, to profile |
|
<code>/usr/local/bin/my_binary_compiled_with_tcmalloc</code>:</p> |
|
<pre>% env HEAPPROFILE=/tmp/mybin.hprof /usr/local/bin/my_binary_compiled_with_tcmalloc</pre> |
|
<li> <p>In your code, bracket the code you want profiled in calls to |
|
<code>HeapProfilerStart()</code> and <code>HeapProfilerStop()</code>. |
|
(These functions are declared in <code><gperftools/heap-profiler.h></code>.) |
|
<code>HeapProfilerStart()</code> will take the |
|
profile-filename-prefix as an argument. Then, as often as |
|
you'd like before calling <code>HeapProfilerStop()</code>, you |
|
can use <code>HeapProfilerDump()</code> or |
|
<code>GetHeapProfile()</code> to examine the profile. In case |
|
it's useful, <code>IsHeapProfilerRunning()</code> will tell you |
|
whether you've already called HeapProfilerStart() or not.</p> |
|
</ol> |
|
|
|
|
|
<p>For security reasons, heap profiling will not write to a file -- |
|
and is thus not usable -- for setuid programs.</p> |
|
|
|
<H2>Modifying Runtime Behavior</H2> |
|
|
|
<p>You can more finely control the behavior of the heap profiler via |
|
environment variables.</p> |
|
|
|
<table frame=box rules=sides cellpadding=5 width=100%> |
|
|
|
<tr valign=top> |
|
<td><code>HEAP_PROFILE_ALLOCATION_INTERVAL</code></td> |
|
<td>default: 1073741824 (1 Gb)</td> |
|
<td> |
|
Dump heap profiling information once every specified number of |
|
bytes has been allocated by the program. |
|
</td> |
|
</tr> |
|
|
|
<tr valign=top> |
|
<td><code>HEAP_PROFILE_INUSE_INTERVAL</code></td> |
|
<td>default: 104857600 (100 Mb)</td> |
|
<td> |
|
Dump heap profiling information whenever the high-water memory |
|
usage mark increases by the specified number of bytes. |
|
</td> |
|
</tr> |
|
|
|
<tr valign=top> |
|
<td><code>HEAP_PROFILE_MMAP</code></td> |
|
<td>default: false</td> |
|
<td> |
|
Profile <code>mmap</code>, <code>mremap</code> and <code>sbrk</code> |
|
calls in addition |
|
to <code>malloc</code>, <code>calloc</code>, <code>realloc</code>, |
|
and <code>new</code>. <b>NOTE:</b> this causes the profiler to |
|
profile calls internal to tcmalloc, since tcmalloc and friends use |
|
mmap and sbrk internally for allocations. One partial solution is |
|
to filter these allocations out when running <code>pprof</code>, |
|
with something like |
|
<code>pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc</code>. |
|
</td> |
|
</tr> |
|
|
|
<tr valign=top> |
|
<td><code>HEAP_PROFILE_MMAP_ONLY</code></td> |
|
<td>default: false</td> |
|
<td> |
|
Only profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code> |
|
calls; do not profile |
|
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, |
|
or <code>new</code>. |
|
</td> |
|
</tr> |
|
|
|
<tr valign=top> |
|
<td><code>HEAP_PROFILE_MMAP_LOG</code></td> |
|
<td>default: false</td> |
|
<td> |
|
Log <code>mmap</code>/<code>munmap</code> calls. |
|
</td> |
|
</tr> |
|
|
|
</table> |
|
|
|
<H2>Checking for Leaks</H2> |
|
|
|
<p>You can use the heap profiler to manually check for leaks, for |
|
instance by reading the profiler output and looking for large |
|
allocations. However, for that task, it's easier to use the <A |
|
HREF="heap_checker.html">automatic heap-checking facility</A> built |
|
into tcmalloc.</p> |
|
|
|
|
|
<h1><a name="pprof">Analyzing the Output</a></h1> |
|
|
|
<p>If heap-profiling is turned on in a program, the program will |
|
periodically write profiles to the filesystem. The sequence of |
|
profiles will be named:</p> |
|
<pre> |
|
<prefix>.0000.heap |
|
<prefix>.0001.heap |
|
<prefix>.0002.heap |
|
... |
|
</pre> |
|
<p>where <code><prefix></code> is the filename-prefix supplied |
|
when running the code (e.g. via the <code>HEAPPROFILE</code> |
|
environment variable). Note that if the supplied prefix |
|
does not start with a <code>/</code>, the profile files will be |
|
written to the program's working directory.</p> |
|
|
|
<p>The profile output can be viewed by passing it to the |
|
<code>pprof</code> tool -- the same tool that's used to analyze <A |
|
HREF="cpuprofile.html">CPU profiles</A>. |
|
|
|
<p>Here are some examples. These examples assume the binary is named |
|
<code>gfs_master</code>, and a sequence of heap profile files can be |
|
found in files named:</p> |
|
<pre> |
|
/tmp/profile.0001.heap |
|
/tmp/profile.0002.heap |
|
... |
|
/tmp/profile.0100.heap |
|
</pre> |
|
|
|
<h3>Why is a process so big</h3> |
|
|
|
<pre> |
|
% pprof --gv gfs_master /tmp/profile.0100.heap |
|
</pre> |
|
|
|
<p>This command will pop-up a <code>gv</code> window that displays |
|
the profile information as a directed graph. Here is a portion |
|
of the resulting output:</p> |
|
|
|
<p><center> |
|
<img src="heap-example1.png"> |
|
</center></p> |
|
|
|
A few explanations: |
|
<ul> |
|
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB |
|
of the live memory, which is 25% of the total live memory. |
|
<li> <code>GFS_MasterChunkTable::UpdateState</code> is directly |
|
accountable for 176.2 MB of the live memory (i.e., it directly |
|
allocated 176.2 MB that has not been freed yet). Furthermore, |
|
it and its callees are responsible for 729.9 MB. The |
|
labels on the outgoing edges give a good indication of the |
|
amount allocated by each callee. |
|
</ul> |
|
|
|
<h3>Comparing Profiles</h3> |
|
|
|
<p>You often want to skip allocations during the initialization phase |
|
of a program so you can find gradual memory leaks. One simple way to |
|
do this is to compare two profiles -- both collected after the program |
|
has been running for a while. Specify the name of the first profile |
|
using the <code>--base</code> option. For example:</p> |
|
<pre> |
|
% pprof --base=/tmp/profile.0004.heap gfs_master /tmp/profile.0100.heap |
|
</pre> |
|
|
|
<p>The memory-usage in <code>/tmp/profile.0004.heap</code> will be |
|
subtracted from the memory-usage in |
|
<code>/tmp/profile.0100.heap</code> and the result will be |
|
displayed.</p> |
|
|
|
<h3>Text display</h3> |
|
|
|
<pre> |
|
% pprof --text gfs_master /tmp/profile.0100.heap |
|
255.6 24.7% 24.7% 255.6 24.7% GFS_MasterChunk::AddServer |
|
184.6 17.8% 42.5% 298.8 28.8% GFS_MasterChunkTable::Create |
|
176.2 17.0% 59.5% 729.9 70.5% GFS_MasterChunkTable::UpdateState |
|
169.8 16.4% 75.9% 169.8 16.4% PendingClone::PendingClone |
|
76.3 7.4% 83.3% 76.3 7.4% __default_alloc_template::_S_chunk_alloc |
|
49.5 4.8% 88.0% 49.5 4.8% hashtable::resize |
|
... |
|
</pre> |
|
|
|
<p> |
|
<ul> |
|
<li> The first column contains the direct memory use in MB. |
|
<li> The fourth column contains memory use by the procedure |
|
and all of its callees. |
|
<li> The second and fifth columns are just percentage |
|
representations of the numbers in the first and fourth columns. |
|
<li> The third column is a cumulative sum of the second column |
|
(i.e., the <code>k</code>th entry in the third column is the |
|
sum of the first <code>k</code> entries in the second column.) |
|
</ul> |
|
|
|
<h3>Ignoring or focusing on specific regions</h3> |
|
|
|
<p>The following command will give a graphical display of a subset of |
|
the call-graph. Only paths in the call-graph that match the regular |
|
expression <code>DataBuffer</code> are included:</p> |
|
<pre> |
|
% pprof --gv --focus=DataBuffer gfs_master /tmp/profile.0100.heap |
|
</pre> |
|
|
|
<p>Similarly, the following command will omit all paths subset of the |
|
call-graph. All paths in the call-graph that match the regular |
|
expression <code>DataBuffer</code> are discarded:</p> |
|
<pre> |
|
% pprof --gv --ignore=DataBuffer gfs_master /tmp/profile.0100.heap |
|
</pre> |
|
|
|
<h3>Total allocations + object-level information</h3> |
|
|
|
<p>All of the previous examples have displayed the amount of in-use |
|
space. I.e., the number of bytes that have been allocated but not |
|
freed. You can also get other types of information by supplying a |
|
flag to <code>pprof</code>:</p> |
|
|
|
<center> |
|
<table frame=box rules=sides cellpadding=5 width=100%> |
|
|
|
<tr valign=top> |
|
<td><code>--inuse_space</code></td> |
|
<td> |
|
Display the number of in-use megabytes (i.e. space that has |
|
been allocated but not freed). This is the default. |
|
</td> |
|
</tr> |
|
|
|
<tr valign=top> |
|
<td><code>--inuse_objects</code></td> |
|
<td> |
|
Display the number of in-use objects (i.e. number of |
|
objects that have been allocated but not freed). |
|
</td> |
|
</tr> |
|
|
|
<tr valign=top> |
|
<td><code>--alloc_space</code></td> |
|
<td> |
|
Display the number of allocated megabytes. This includes |
|
the space that has since been de-allocated. Use this |
|
if you want to find the main allocation sites in the |
|
program. |
|
</td> |
|
</tr> |
|
|
|
<tr valign=top> |
|
<td><code>--alloc_objects</code></td> |
|
<td> |
|
Display the number of allocated objects. This includes |
|
the objects that have since been de-allocated. Use this |
|
if you want to find the main allocation sites in the |
|
program. |
|
</td> |
|
|
|
</table> |
|
</center> |
|
|
|
|
|
<h3>Interactive mode</a></h3> |
|
|
|
<p>By default -- if you don't specify any flags to the contrary -- |
|
pprof runs in interactive mode. At the <code>(pprof)</code> prompt, |
|
you can run many of the commands described above. You can type |
|
<code>help</code> for a list of what commands are available in |
|
interactive mode.</p> |
|
|
|
|
|
<h1>Caveats</h1> |
|
|
|
<ul> |
|
<li> Heap profiling requires the use of libtcmalloc. This |
|
requirement may be removed in a future version of the heap |
|
profiler, and the heap profiler separated out into its own |
|
library. |
|
|
|
<li> If the program linked in a library that was not compiled |
|
with enough symbolic information, all samples associated |
|
with the library may be charged to the last symbol found |
|
in the program before the libary. This will artificially |
|
inflate the count for that symbol. |
|
|
|
<li> If you run the program on one machine, and profile it on |
|
another, and the shared libraries are different on the two |
|
machines, the profiling output may be confusing: samples that |
|
fall within the shared libaries may be assigned to arbitrary |
|
procedures. |
|
|
|
<li> Several libraries, such as some STL implementations, do their |
|
own memory management. This may cause strange profiling |
|
results. We have code in libtcmalloc to cause STL to use |
|
tcmalloc for memory management (which in our tests is better |
|
than STL's internal management), though it only works for some |
|
STL implementations. |
|
|
|
<li> If your program forks, the children will also be profiled |
|
(since they inherit the same HEAPPROFILE setting). Each |
|
process is profiled separately; to distinguish the child |
|
profiles from the parent profile and from each other, all |
|
children will have their process-id attached to the HEAPPROFILE |
|
name. |
|
|
|
<li> Due to a hack we make to work around a possible gcc bug, your |
|
profiles may end up named strangely if the first character of |
|
your HEAPPROFILE variable has ascii value greater than 127. |
|
This should be exceedingly rare, but if you need to use such a |
|
name, just set prepend <code>./</code> to your filename: |
|
<code>HEAPPROFILE=./Ägypten</code>. |
|
</ul> |
|
|
|
<hr> |
|
<address>Sanjay Ghemawat |
|
<!-- Created: Tue Dec 19 10:43:14 PST 2000 --> |
|
</address> |
|
</body> |
|
</html>
|
|
|