From dd3092075c3263fca7a3e98b143ad2296aab71b4 Mon Sep 17 00:00:00 2001 From: Namhyung Kim Date: Wed, 22 Apr 2015 15:33:45 +0900 Subject: [PATCH] perf tools: Document --children option in more detail As the --children option changes the output of perf report (and perf top) it sometimes confuses users. Add more words and examples to help understanding of the option's behavior - and how to disable it ;-). Signed-off-by: Namhyung Kim Reviewed-by: Ingo Molnar Cc: David Ahern Cc: Jiri Olsa Cc: Peter Zijlstra Cc: Taeung Song Link: http://lkml.kernel.org/r/1429684425-14987-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- .../callchain-overhead-calculation.txt | 108 ++++++++++++++++++ tools/perf/Documentation/perf-report.txt | 4 + tools/perf/Documentation/perf-top.txt | 3 +- 3 files changed, 114 insertions(+), 1 deletion(-) create mode 100644 tools/perf/Documentation/callchain-overhead-calculation.txt diff --git a/tools/perf/Documentation/callchain-overhead-calculation.txt b/tools/perf/Documentation/callchain-overhead-calculation.txt new file mode 100644 index 000000000000..1a757927195e --- /dev/null +++ b/tools/perf/Documentation/callchain-overhead-calculation.txt @@ -0,0 +1,108 @@ +Overhead calculation +-------------------- +The overhead can be shown in two columns as 'Children' and 'Self' when +perf collects callchains. The 'self' overhead is simply calculated by +adding all period values of the entry - usually a function (symbol). +This is the value that perf shows traditionally and sum of all the +'self' overhead values should be 100%. + +The 'children' overhead is calculated by adding all period values of +the child functions so that it can show the total overhead of the +higher level functions even if they don't directly execute much. +'Children' here means functions that are called from another (parent) +function. + +It might be confusing that the sum of all the 'children' overhead +values exceeds 100% since each of them is already an accumulation of +'self' overhead of its child functions. But with this enabled, users +can find which function has the most overhead even if samples are +spread over the children. + +Consider the following example; there are three functions like below. + +----------------------- +void foo(void) { + /* do something */ +} + +void bar(void) { + /* do something */ + foo(); +} + +int main(void) { + bar() + return 0; +} +----------------------- + +In this case 'foo' is a child of 'bar', and 'bar' is an immediate +child of 'main' so 'foo' also is a child of 'main'. In other words, +'main' is a parent of 'foo' and 'bar', and 'bar' is a parent of 'foo'. + +Suppose all samples are recorded in 'foo' and 'bar' only. When it's +recorded with callchains the output will show something like below +in the usual (self-overhead-only) output of perf report: + +---------------------------------- +Overhead Symbol +........ ..................... + 60.00% foo + | + --- foo + bar + main + __libc_start_main + + 40.00% bar + | + --- bar + main + __libc_start_main +---------------------------------- + +When the --children option is enabled, the 'self' overhead values of +child functions (i.e. 'foo' and 'bar') are added to the parents to +calculate the 'children' overhead. In this case the report could be +displayed as: + +------------------------------------------- +Children Self Symbol +........ ........ .................... + 100.00% 0.00% __libc_start_main + | + --- __libc_start_main + + 100.00% 0.00% main + | + --- main + __libc_start_main + + 100.00% 40.00% bar + | + --- bar + main + __libc_start_main + + 60.00% 60.00% foo + | + --- foo + bar + main + __libc_start_main +------------------------------------------- + +In the above output, the 'self' overhead of 'foo' (60%) was add to the +'children' overhead of 'bar', 'main' and '\_\_libc_start_main'. +Likewise, the 'self' overhead of 'bar' (40%) was added to the +'children' overhead of 'main' and '\_\_libc_start_main'. + +So '\_\_libc_start_main' and 'main' are shown first since they have +same (100%) 'children' overhead (even though they have zero 'self' +overhead) and they are the parents of 'foo' and 'bar'. + +Since v3.16 the 'children' overhead is shown by default and the output +is sorted by its values. The 'children' overhead is disabled by +specifying --no-children option on the command line or by adding +'report.children = false' or 'top.children = false' in the perf config +file. diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 4879cf638824..896672badba3 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -193,6 +193,7 @@ OPTIONS Accumulate callchain of children to parent entry so that then can show up in the output. The output will have a new "Children" column and will be sorted on the data. It requires callchains are recorded. + See the `overhead calculation' section for more details. --max-stack:: Set the stack depth limit when parsing the callchain, anything @@ -323,6 +324,9 @@ OPTIONS --header-only:: Show only perf.data header (forces --stdio). + +include::callchain-overhead-calculation.txt[] + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-annotate[1] diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 3265b1070518..9e5b07eb7d35 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -168,7 +168,7 @@ Default is to monitor all CPUS. Accumulate callchain of children to parent entry so that then can show up in the output. The output will have a new "Children" column and will be sorted on the data. It requires -g/--call-graph option - enabled. + enabled. See the `overhead calculation' section for more details. --max-stack:: Set the stack depth limit when parsing the callchain, anything @@ -234,6 +234,7 @@ INTERACTIVE PROMPTING KEYS Pressing any unmapped key displays a menu, and prompts for input. +include::callchain-overhead-calculation.txt[] SEE ALSO --------