alistair23-linux/tools/perf/Documentation/perf-stat.txt

284 lines
8.8 KiB
Plaintext
Raw Normal View History

perf-stat(1)
============
NAME
----
perf-stat - Run a command and gather performance counter statistics
SYNOPSIS
--------
[verse]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
perf stat record: Add record command Add 'perf stat record' command support. It creates simple (header only) perf.data file ATM. The record command could be specified anywhere among stat options. All stat command options are valid for stat record command with '-o' option exception. If specified for record command it denotes the perf data file name. Committer note: Set sample_type to PERF_SAMPLE_IDENTIFIER, which should be harmless while avoiding that older tools show confusing messages, for instance, with sample_type = 0, we get: $ perf stat record usleep 1 Performance counter stats for 'usleep 1': 0.630237 task-clock (msec) # 0.528 CPUs utilized 1 context-switches # 0.002 M/sec 0 cpu-migrations # 0.000 K/sec 52 page-faults # 0.083 M/sec 978,312 cycles # 1.552 GHz 671,931 stalled-cycles-frontend # 68.68% frontend cycles idle <not supported> stalled-cycles-backend 646,379 instructions # 0.66 insns per cycle # 1.04 stalled cycles per insn 131,046 branches # 207.931 M/sec 7,073 branch-misses # 5.40% of all branches 0.001193240 seconds time elapsed $ oldperf evlist WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? non matching sample_type $ While with sample_type set to PERF_SAMPLE_IDENTIFIER, after we re-run 'perf stat record usleep' we get: $ oldperf evlist WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? task-clock context-switches cpu-migrations page-faults cycles stalled-cycles-frontend stalled-cycles-backend instructions branches branch-misses $ Which at least shows the names of the events in the perf.data file. Additionally, such files, when passed to 'perf report' will produce: $ oldperf report --stdio WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? Warning: Kernel address maps (/proc/{kallsyms,modules}) were restricted. Check /proc/sys/kernel/kptr_restrict before running 'perf record'. As no suitable kallsyms nor vmlinux was found, kernel samples can't be resolved. Samples in kernel modules can't be resolved as well. Error: The perf.data file has no samples! # To display the perf.data header info, please use --header/--header-only options. # $ Which is confusing and can be solved by just adding the kernel mmap record, which will also remove that warning about the data size field being equal to zero, after generating the mmap record: $ perf stat record usleep 1 Performance counter stats for 'usleep 1': 0.600796 task-clock (msec) # 0.478 CPUs utilized 1 context-switches # 0.002 M/sec 0 cpu-migrations # 0.000 K/sec 54 page-faults # 0.090 M/sec 886,844 cycles # 1.476 GHz 582,169 stalled-cycles-frontend # 65.65% frontend cycles idle <not supported> stalled-cycles-backend 638,344 instructions # 0.72 insns per cycle # 0.91 stalled cycles per insn 130,204 branches # 216.719 M/sec 7,500 branch-misses # 5.76% of all branches 0.001255897 seconds time elapsed $ oldperf evlist task-clock context-switches cpu-migrations page-faults cycles stalled-cycles-frontend stalled-cycles-backend instructions branches branch-misses $ oldperf report --stdio Error: The perf.data file has no samples! # To display the perf.data header info, please use --header/--header-only options. # [acme@zoo linux]$ No warnings, sensible output about what are the events in the perf.data file and also a "file has no samples" message, which indeed it doesn't. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: htp://lkml.kernel.org/r/1446734469-11352-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-11-05 07:40:46 -07:00
'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
'perf stat' report [-i file]
DESCRIPTION
-----------
This command runs a command and gathers performance counter statistics
from it.
OPTIONS
-------
<command>...::
Any command you can specify in a shell.
perf stat record: Add record command Add 'perf stat record' command support. It creates simple (header only) perf.data file ATM. The record command could be specified anywhere among stat options. All stat command options are valid for stat record command with '-o' option exception. If specified for record command it denotes the perf data file name. Committer note: Set sample_type to PERF_SAMPLE_IDENTIFIER, which should be harmless while avoiding that older tools show confusing messages, for instance, with sample_type = 0, we get: $ perf stat record usleep 1 Performance counter stats for 'usleep 1': 0.630237 task-clock (msec) # 0.528 CPUs utilized 1 context-switches # 0.002 M/sec 0 cpu-migrations # 0.000 K/sec 52 page-faults # 0.083 M/sec 978,312 cycles # 1.552 GHz 671,931 stalled-cycles-frontend # 68.68% frontend cycles idle <not supported> stalled-cycles-backend 646,379 instructions # 0.66 insns per cycle # 1.04 stalled cycles per insn 131,046 branches # 207.931 M/sec 7,073 branch-misses # 5.40% of all branches 0.001193240 seconds time elapsed $ oldperf evlist WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? non matching sample_type $ While with sample_type set to PERF_SAMPLE_IDENTIFIER, after we re-run 'perf stat record usleep' we get: $ oldperf evlist WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? task-clock context-switches cpu-migrations page-faults cycles stalled-cycles-frontend stalled-cycles-backend instructions branches branch-misses $ Which at least shows the names of the events in the perf.data file. Additionally, such files, when passed to 'perf report' will produce: $ oldperf report --stdio WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? Warning: Kernel address maps (/proc/{kallsyms,modules}) were restricted. Check /proc/sys/kernel/kptr_restrict before running 'perf record'. As no suitable kallsyms nor vmlinux was found, kernel samples can't be resolved. Samples in kernel modules can't be resolved as well. Error: The perf.data file has no samples! # To display the perf.data header info, please use --header/--header-only options. # $ Which is confusing and can be solved by just adding the kernel mmap record, which will also remove that warning about the data size field being equal to zero, after generating the mmap record: $ perf stat record usleep 1 Performance counter stats for 'usleep 1': 0.600796 task-clock (msec) # 0.478 CPUs utilized 1 context-switches # 0.002 M/sec 0 cpu-migrations # 0.000 K/sec 54 page-faults # 0.090 M/sec 886,844 cycles # 1.476 GHz 582,169 stalled-cycles-frontend # 65.65% frontend cycles idle <not supported> stalled-cycles-backend 638,344 instructions # 0.72 insns per cycle # 0.91 stalled cycles per insn 130,204 branches # 216.719 M/sec 7,500 branch-misses # 5.76% of all branches 0.001255897 seconds time elapsed $ oldperf evlist task-clock context-switches cpu-migrations page-faults cycles stalled-cycles-frontend stalled-cycles-backend instructions branches branch-misses $ oldperf report --stdio Error: The perf.data file has no samples! # To display the perf.data header info, please use --header/--header-only options. # [acme@zoo linux]$ No warnings, sensible output about what are the events in the perf.data file and also a "file has no samples" message, which indeed it doesn't. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: htp://lkml.kernel.org/r/1446734469-11352-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-11-05 07:40:46 -07:00
record::
See STAT RECORD.
report::
See STAT REPORT.
-e::
--event=::
Select the PMU event. Selection can be:
- a symbolic event name (use 'perf list' to list all events)
- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
hexadecimal event descriptor.
- a symbolically formed event like 'pmu/param1=0x3,param2/' where
param1 and param2 are defined as formats for the PMU in
/sys/bus/event_sources/devices/<pmu>/format/*
- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
where M, N, K are numbers (in decimal, hex, octal format).
Acceptable values for each of 'config', 'config1' and 'config2'
parameters are defined by corresponding entries in
/sys/bus/event_sources/devices/<pmu>/format/*
-i::
--no-inherit::
child tasks do not inherit counters
-p::
--pid=<pid>::
stat events on existing process id (comma separated list)
-t::
--tid=<tid>::
stat events on existing thread id (comma separated list)
-a::
--all-cpus::
system-wide collection from all CPUs
-c::
--scale::
scale/normalize counter values
-d::
--detailed::
print more detailed statistics, can be specified up to 3 times
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events
-r::
--repeat=<n>::
repeat command and print average + stddev (max: 100). 0 means forever.
perf stat: add perf stat -B to pretty print large numbers It is hard to read very large numbers so provide an option to perf stat to separate thousands using a separator. The patch leverages the locale support of stdio. You need to set your LC_NUMERIC appropriately, for instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this feature. This way existing scripts parsing the output do not need to be changed. Here is an example. $ perf stat noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.347031 task-clock-msecs # 0.998 CPUs 61 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 118 page-faults # 0.000 M/sec 4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%) 2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%) 2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%) 40,267 branch-misses # 0.002 % (scaled from 30.04%) 2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%) 53,725 cache-misses # 0.027 M/sec (scaled from 30.02%) 2.001393933 seconds time elapsed $ perf stat -B noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.297883 task-clock-msecs # 0.998 CPUs 59 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 119 page-faults # 0.000 M/sec 4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%) 2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%) 2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%) 25,650 branch-misses # 0.001 % (scaled from 30.05%) 2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%) 47,097 cache-misses # 0.024 M/sec (scaled from 30.02%) 2.001391016 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-18 07:00:01 -06:00
-B::
--big-num::
perf stat: add perf stat -B to pretty print large numbers It is hard to read very large numbers so provide an option to perf stat to separate thousands using a separator. The patch leverages the locale support of stdio. You need to set your LC_NUMERIC appropriately, for instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this feature. This way existing scripts parsing the output do not need to be changed. Here is an example. $ perf stat noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.347031 task-clock-msecs # 0.998 CPUs 61 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 118 page-faults # 0.000 M/sec 4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%) 2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%) 2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%) 40,267 branch-misses # 0.002 % (scaled from 30.04%) 2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%) 53,725 cache-misses # 0.027 M/sec (scaled from 30.02%) 2.001393933 seconds time elapsed $ perf stat -B noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.297883 task-clock-msecs # 0.998 CPUs 59 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 119 page-faults # 0.000 M/sec 4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%) 2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%) 2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%) 25,650 branch-misses # 0.001 % (scaled from 30.05%) 2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%) 47,097 cache-misses # 0.024 M/sec (scaled from 30.02%) 2.001391016 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-18 07:00:01 -06:00
print large numbers with thousands' separators according to locale
-C::
--cpu=::
Count only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
In per-thread mode, this option is ignored. The -a option is still necessary
to activate system-wide monitoring. Default is to count on all CPUs.
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 02:05:01 -07:00
-A::
--no-aggr::
Do not aggregate counts across all monitored CPUs in system-wide mode (-a).
This option is only valid in system-wide mode.
-n::
--null::
null run - don't start any counters
-v::
--verbose::
be more verbose (show counter open errors, etc)
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-01 09:49:05 -07:00
-x SEP::
--field-separator SEP::
print counts using a CSV-style output to make it easy to import directly into
spreadsheets. Columns are separated by the string specified in SEP.
-G name::
--cgroup name::
monitor only in the container (cgroup) called "name". This option is available only
in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
to first event, second cgroup to second event and so on. It is possible to provide
an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
corresponding events, i.e., they always refer to events defined earlier on the command
line.
-o file::
--output file::
Print the output into the designated file.
--append::
Append to the output file designated with the -o option. Ignored if -o is not specified.
--log-fd::
Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
with it. --append may be used here. Examples:
3>results perf stat --log-fd 3 -- $cmd
3>>results perf stat --log-fd 3 --append -- $cmd
--pre::
--post::
Pre and post measurement hooks, e.g.:
perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage
2013-01-29 04:47:44 -07:00
-I msecs::
--interval-print msecs::
perf stat: Reduce min --interval-print to 10ms The --interval-print parameter was limited to 100ms. However, for example, 10ms is required to do sophisticated bandwidth analysis using uncore events. The test shows that the overhead of the system-wide uncore monitoring with 10ms interval is only ~2%. So this patch reduces the minimal interval-print allowd to 10ms. But 10ms may not work well for all cases. For example, when the cpus/threads number is very large, for system-wide core event monitoring the overhead could be high. To handle this issue, a warning will be displayed when the interval-print is set between 10ms to 100ms. So users can make a decision according to their specific cases. # perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 10 -- sleep 1 print interval < 100ms. The overhead percentage could be high in some cases. Please proceed with caution. # time counts unit events 0.010200451 0.10 MiB uncore_imc_1/cas_count_read/ 0.020475117 0.02 MiB uncore_imc_1/cas_count_read/ 0.030692800 0.01 MiB uncore_imc_1/cas_count_read/ 0.040948161 0.02 MiB uncore_imc_1/cas_count_read/ 0.051159564 0.00 MiB uncore_imc_1/cas_count_read/ Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1443776674-42511-1-git-send-email-kan.liang@intel.com [ Added warning about overhead when using sub 100ms intervals to the man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-02 03:04:34 -06:00
Print count deltas every N milliseconds (minimum: 10ms)
The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals. Use with caution.
example: 'perf stat -I 1000 -e cycles -a sleep 5'
perf stat: Implement --metric-only mode Add a new mode to only print metrics. Sometimes we don't care about the raw values, just want the computed metrics. This allows more compact printing, so with -I each sample is only a single line. This also allows easier plotting and processing with other tools. The main target is with using --topdown, but it also works with -T and standard perf stat. A few metrics are not supported. To avoiding having to hardcode all the metrics in the code it uses a two pass approach: first compute dummy metrics and only print the headers in the print_metric callback. Then use the callback to print the actual values. There are some additional changes in the stat printout code to handle all metrics being on a single line. One issue is that the column code doesn't know in advance what events are not supported by the CPU, and it would be hard to find out as this could change based on dynamic conditions. That causes empty columns in some cases. The output can be fairly wide, often you may need more than 80 columns. Example: % perf stat -a -I 1000 --metric-only 1.001452803 frontend cycles idle insn per cycle stalled cycles per insn branch-misses of all branches 1.001452803 158.91% 0.66 2.39 2.92% 2.002192321 180.63% 0.76 2.08 2.96% 3.003088282 150.59% 0.62 2.57 2.84% 4.004369835 196.20% 0.98 1.62 3.79% 5.005227314 231.98% 0.84 1.90 4.71% v2: Lots of updates. v3: Use slightly narrower columns v4: Add comment Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1457049458-28956-6-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-03-03 16:57:36 -07:00
--metric-only::
Only print computed metrics. Print them in a single line.
Don't show any raw values. Not supported with --per-thread.
perf stat: Implement --metric-only mode Add a new mode to only print metrics. Sometimes we don't care about the raw values, just want the computed metrics. This allows more compact printing, so with -I each sample is only a single line. This also allows easier plotting and processing with other tools. The main target is with using --topdown, but it also works with -T and standard perf stat. A few metrics are not supported. To avoiding having to hardcode all the metrics in the code it uses a two pass approach: first compute dummy metrics and only print the headers in the print_metric callback. Then use the callback to print the actual values. There are some additional changes in the stat printout code to handle all metrics being on a single line. One issue is that the column code doesn't know in advance what events are not supported by the CPU, and it would be hard to find out as this could change based on dynamic conditions. That causes empty columns in some cases. The output can be fairly wide, often you may need more than 80 columns. Example: % perf stat -a -I 1000 --metric-only 1.001452803 frontend cycles idle insn per cycle stalled cycles per insn branch-misses of all branches 1.001452803 158.91% 0.66 2.39 2.92% 2.002192321 180.63% 0.76 2.08 2.96% 3.003088282 150.59% 0.62 2.57 2.84% 4.004369835 196.20% 0.98 1.62 3.79% 5.005227314 231.98% 0.84 1.90 4.71% v2: Lots of updates. v3: Use slightly narrower columns v4: Add comment Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1457049458-28956-6-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-03-03 16:57:36 -07:00
--per-socket::
Aggregate counts per processor socket for system-wide mode measurements. This
is a useful mode to detect imbalance between sockets. To enable this mode,
use --per-socket in addition to -a. (system-wide). The output includes the
socket number and the number of online processors on that socket. This is
useful to gauge the amount of aggregation.
--per-core::
Aggregate counts per physical processor for system-wide mode measurements. This
is a useful mode to detect imbalance between physical cores. To enable this mode,
use --per-core in addition to -a. (system-wide). The output includes the
core number and the number of online logical processors on that physical processor.
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 03:29:27 -06:00
--per-thread::
Aggregate counts per monitored threads, when monitoring threads (-t option)
or processes (-p option).
perf stat: Add support for --initial-delay option When measuring workloads the startup phase -- doing page faults, dynamic linking, opening files -- is often very different from the rest of the workload. Especially with smaller kernels and using counter multiplexing this can give significant measurement errors. Multiplexing assumes that the workload is mostly the same over longer periods. But at startup there is typically some spike of activity which is relatively short. If many groups are multiplexing the one group seeing the spike, and which is then scaled up over the time to run all groups, may see a significant error. Also in general it's often not useful to measure the startup, because it is so different from the rest. One way around this is to use interval mode and discard the first sample, but this can be awkward because interval mode doesn't support intervals of less than 100ms, and also a useful interval is not necessarily the same as a useful startup delay. This patch adds a new --initial-delay / -D option to skip measuring for the startup phase. The time can be specified in ms Here's a simple example: perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 3,721 page-faults ... If we just wait 20 ms the number of page faults is 1/3 less: perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 2,823 page-faults ... So we filtered out most of the startup noise from bash. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-08-02 18:41:11 -06:00
-D msecs::
--delay msecs::
perf stat: Add support for --initial-delay option When measuring workloads the startup phase -- doing page faults, dynamic linking, opening files -- is often very different from the rest of the workload. Especially with smaller kernels and using counter multiplexing this can give significant measurement errors. Multiplexing assumes that the workload is mostly the same over longer periods. But at startup there is typically some spike of activity which is relatively short. If many groups are multiplexing the one group seeing the spike, and which is then scaled up over the time to run all groups, may see a significant error. Also in general it's often not useful to measure the startup, because it is so different from the rest. One way around this is to use interval mode and discard the first sample, but this can be awkward because interval mode doesn't support intervals of less than 100ms, and also a useful interval is not necessarily the same as a useful startup delay. This patch adds a new --initial-delay / -D option to skip measuring for the startup phase. The time can be specified in ms Here's a simple example: perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 3,721 page-faults ... If we just wait 20 ms the number of page faults is 1/3 less: perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 2,823 page-faults ... So we filtered out most of the startup noise from bash. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-08-02 18:41:11 -06:00
After starting the program, wait msecs before measuring. This is useful to
filter out the startup phase of the program, which is often very different.
-T::
--transaction::
Print statistics of transactional execution if supported.
perf stat record: Add record command Add 'perf stat record' command support. It creates simple (header only) perf.data file ATM. The record command could be specified anywhere among stat options. All stat command options are valid for stat record command with '-o' option exception. If specified for record command it denotes the perf data file name. Committer note: Set sample_type to PERF_SAMPLE_IDENTIFIER, which should be harmless while avoiding that older tools show confusing messages, for instance, with sample_type = 0, we get: $ perf stat record usleep 1 Performance counter stats for 'usleep 1': 0.630237 task-clock (msec) # 0.528 CPUs utilized 1 context-switches # 0.002 M/sec 0 cpu-migrations # 0.000 K/sec 52 page-faults # 0.083 M/sec 978,312 cycles # 1.552 GHz 671,931 stalled-cycles-frontend # 68.68% frontend cycles idle <not supported> stalled-cycles-backend 646,379 instructions # 0.66 insns per cycle # 1.04 stalled cycles per insn 131,046 branches # 207.931 M/sec 7,073 branch-misses # 5.40% of all branches 0.001193240 seconds time elapsed $ oldperf evlist WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? non matching sample_type $ While with sample_type set to PERF_SAMPLE_IDENTIFIER, after we re-run 'perf stat record usleep' we get: $ oldperf evlist WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? task-clock context-switches cpu-migrations page-faults cycles stalled-cycles-frontend stalled-cycles-backend instructions branches branch-misses $ Which at least shows the names of the events in the perf.data file. Additionally, such files, when passed to 'perf report' will produce: $ oldperf report --stdio WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? Warning: Kernel address maps (/proc/{kallsyms,modules}) were restricted. Check /proc/sys/kernel/kptr_restrict before running 'perf record'. As no suitable kallsyms nor vmlinux was found, kernel samples can't be resolved. Samples in kernel modules can't be resolved as well. Error: The perf.data file has no samples! # To display the perf.data header info, please use --header/--header-only options. # $ Which is confusing and can be solved by just adding the kernel mmap record, which will also remove that warning about the data size field being equal to zero, after generating the mmap record: $ perf stat record usleep 1 Performance counter stats for 'usleep 1': 0.600796 task-clock (msec) # 0.478 CPUs utilized 1 context-switches # 0.002 M/sec 0 cpu-migrations # 0.000 K/sec 54 page-faults # 0.090 M/sec 886,844 cycles # 1.476 GHz 582,169 stalled-cycles-frontend # 65.65% frontend cycles idle <not supported> stalled-cycles-backend 638,344 instructions # 0.72 insns per cycle # 0.91 stalled cycles per insn 130,204 branches # 216.719 M/sec 7,500 branch-misses # 5.76% of all branches 0.001255897 seconds time elapsed $ oldperf evlist task-clock context-switches cpu-migrations page-faults cycles stalled-cycles-frontend stalled-cycles-backend instructions branches branch-misses $ oldperf report --stdio Error: The perf.data file has no samples! # To display the perf.data header info, please use --header/--header-only options. # [acme@zoo linux]$ No warnings, sensible output about what are the events in the perf.data file and also a "file has no samples" message, which indeed it doesn't. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: htp://lkml.kernel.org/r/1446734469-11352-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-11-05 07:40:46 -07:00
STAT RECORD
-----------
Stores stat data into perf data file.
-o file::
--output file::
Output file name.
STAT REPORT
-----------
Reads and reports stat data from perf data file.
-i file::
--input file::
Input file name.
--per-socket::
Aggregate counts per processor socket for system-wide mode measurements.
--per-core::
Aggregate counts per physical processor for system-wide mode measurements.
-A::
--no-aggr::
Do not aggregate counts across all monitored CPUs.
perf stat: Basic support for TopDown in perf stat Add basic plumbing for TopDown in perf stat TopDown is intended to replace the frontend cycles idle/ backend cycles idle metrics in standard perf stat output. These metrics are not reliable in many workloads, due to out of order effects. This implements a new --topdown mode in perf stat (similar to --transaction) that measures the pipe line bottlenecks using standardized formulas. The measurement can be all done with 5 counters (one fixed counter) The result are four metrics: FrontendBound, BackendBound, BadSpeculation, Retiring that describe the CPU pipeline behavior on a high level. The full top down methology has many hierarchical metrics. This implementation only supports level 1 which can be collected without multiplexing. A full implementation of top down on top of perf is available in pmu-tools toplev. (http://github.com/andikleen/pmu-tools) The current version works on Intel Core CPUs starting with Sandy Bridge, and Atom CPUs starting with Silvermont. In principle the generic metrics should be also implementable on other out of order CPUs. TopDown level 1 uses a set of abstracted metrics which are generic to out of order CPU cores (although some CPUs may not implement all of them): topdown-total-slots Available slots in the pipeline topdown-slots-issued Slots issued into the pipeline topdown-slots-retired Slots successfully retired topdown-fetch-bubbles Pipeline gaps in the frontend topdown-recovery-bubbles Pipeline gaps during recovery from misspeculation These metrics then allow to compute four useful metrics: FrontendBound, BackendBound, Retiring, BadSpeculation. Add a new --topdown options to enable events. When --topdown is specified set up events for all topdown events supported by the kernel. Add topdown-* as a special case to the event parser, as is needed for all events containing -. The actual code to compute the metrics is in follow-on patches. v2: Use standard sysctl read function. v3: Move x86 specific code to arch/ v4: Enable --metric-only implicitly for topdown. v5: Add --single-thread option to not force per core mode v6: Fix output order of topdown metrics v7: Allow combining with -d v8: Remove --single-thread again v9: Rename functions, adding arch_ and topdown_. v10: Expand man page and describe TopDown better Paste intro into commit description. Print error when malloc fails. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-30 09:49:42 -06:00
--topdown::
Print top down level 1 metrics if supported by the CPU. This allows to
determine bottle necks in the CPU pipeline for CPU bound workloads,
by breaking the cycles consumed down into frontend bound, backend bound,
bad speculation and retiring.
Frontend bound means that the CPU cannot fetch and decode instructions fast
enough. Backend bound means that computation or memory access is the bottle
neck. Bad Speculation means that the CPU wasted cycles due to branch
mispredictions and similar issues. Retiring means that the CPU computed without
an apparently bottleneck. The bottleneck is only the real bottleneck
if the workload is actually bound by the CPU and not by something else.
For best results it is usually a good idea to use it with interval
mode like -I 1000, as the bottleneck of workloads can change often.
The top down metrics are collected per core instead of per
CPU thread. Per core mode is automatically enabled
and -a (global monitoring) is needed, requiring root rights or
perf.perf_event_paranoid=-1.
Topdown uses the full Performance Monitoring Unit, and needs
disabling of the NMI watchdog (as root):
echo 0 > /proc/sys/kernel/nmi_watchdog
for best results. Otherwise the bottlenecks may be inconsistent
on workload with changing phases.
This enables --metric-only, unless overriden with --no-metric-only.
To interpret the results it is usually needed to know on which
CPUs the workload runs on. If needed the CPUs can be forced using
taskset.
perf stat record: Add record command Add 'perf stat record' command support. It creates simple (header only) perf.data file ATM. The record command could be specified anywhere among stat options. All stat command options are valid for stat record command with '-o' option exception. If specified for record command it denotes the perf data file name. Committer note: Set sample_type to PERF_SAMPLE_IDENTIFIER, which should be harmless while avoiding that older tools show confusing messages, for instance, with sample_type = 0, we get: $ perf stat record usleep 1 Performance counter stats for 'usleep 1': 0.630237 task-clock (msec) # 0.528 CPUs utilized 1 context-switches # 0.002 M/sec 0 cpu-migrations # 0.000 K/sec 52 page-faults # 0.083 M/sec 978,312 cycles # 1.552 GHz 671,931 stalled-cycles-frontend # 68.68% frontend cycles idle <not supported> stalled-cycles-backend 646,379 instructions # 0.66 insns per cycle # 1.04 stalled cycles per insn 131,046 branches # 207.931 M/sec 7,073 branch-misses # 5.40% of all branches 0.001193240 seconds time elapsed $ oldperf evlist WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? non matching sample_type $ While with sample_type set to PERF_SAMPLE_IDENTIFIER, after we re-run 'perf stat record usleep' we get: $ oldperf evlist WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? task-clock context-switches cpu-migrations page-faults cycles stalled-cycles-frontend stalled-cycles-backend instructions branches branch-misses $ Which at least shows the names of the events in the perf.data file. Additionally, such files, when passed to 'perf report' will produce: $ oldperf report --stdio WARNING: The perf.data file's data size field is 0 which is unexpected. Was the 'perf record' command properly terminated? Warning: Kernel address maps (/proc/{kallsyms,modules}) were restricted. Check /proc/sys/kernel/kptr_restrict before running 'perf record'. As no suitable kallsyms nor vmlinux was found, kernel samples can't be resolved. Samples in kernel modules can't be resolved as well. Error: The perf.data file has no samples! # To display the perf.data header info, please use --header/--header-only options. # $ Which is confusing and can be solved by just adding the kernel mmap record, which will also remove that warning about the data size field being equal to zero, after generating the mmap record: $ perf stat record usleep 1 Performance counter stats for 'usleep 1': 0.600796 task-clock (msec) # 0.478 CPUs utilized 1 context-switches # 0.002 M/sec 0 cpu-migrations # 0.000 K/sec 54 page-faults # 0.090 M/sec 886,844 cycles # 1.476 GHz 582,169 stalled-cycles-frontend # 65.65% frontend cycles idle <not supported> stalled-cycles-backend 638,344 instructions # 0.72 insns per cycle # 0.91 stalled cycles per insn 130,204 branches # 216.719 M/sec 7,500 branch-misses # 5.76% of all branches 0.001255897 seconds time elapsed $ oldperf evlist task-clock context-switches cpu-migrations page-faults cycles stalled-cycles-frontend stalled-cycles-backend instructions branches branch-misses $ oldperf report --stdio Error: The perf.data file has no samples! # To display the perf.data header info, please use --header/--header-only options. # [acme@zoo linux]$ No warnings, sensible output about what are the events in the perf.data file and also a "file has no samples" message, which indeed it doesn't. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: htp://lkml.kernel.org/r/1446734469-11352-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-11-05 07:40:46 -07:00
EXAMPLES
--------
$ perf stat -- make -j
Performance counter stats for 'make -j':
8117.370256 task clock ticks # 11.281 CPU utilization factor
678 context switches # 0.000 M/sec
133 CPU migrations # 0.000 M/sec
235724 pagefaults # 0.029 M/sec
24821162526 CPU cycles # 3057.784 M/sec
18687303457 instructions # 2302.138 M/sec
172158895 cache references # 21.209 M/sec
27075259 cache misses # 3.335 M/sec
Wall-clock time elapsed: 719.554352 msecs
CSV FORMAT
----------
With -x, perf stat is able to output a not-quite-CSV format output
Commas in the output are not put into "". To make it easy to parse
it is recommended to use a different character like -x \;
The fields are in this order:
- optional usec time stamp in fractions of second (with -I xxx)
- optional CPU, core, or socket identifier
- optional number of logical CPUs aggregated
- counter value
- unit of the counter value or empty
- event name
- run time of counter
- percentage of measurement time the counter was running
- optional variance if multiple values are collected with -r
- optional metric value
- optional unit of metric
Additional metrics may be printed with all earlier fields being empty.
SEE ALSO
--------
linkperf:perf-top[1], linkperf:perf-list[1]