alistair23-linux

redonkable

Author	SHA1	Message	Date
Arnaldo Carvalho de Melo	a9c5e6c1e9	perf trace: Introduce per-event maximum number of events property Call it 'nr', as in this context it should be expressive enough, i.e.: # perf trace -e sched:waking/nr=8,call-graph=fp/ 0.000 :0/0 sched:sched_waking:comm=rcu_sched pid=10 prio=120 target_cpu=001 try_to_wake_up ([kernel.kallsyms]) sched_clock ([kernel.kallsyms]) 3.933 :0/0 sched:sched_waking:comm=rcu_sched pid=10 prio=120 target_cpu=001 try_to_wake_up ([kernel.kallsyms]) sched_clock ([kernel.kallsyms]) 3.970 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003 try_to_wake_up ([kernel.kallsyms]) __libc_write (/usr/lib64/libpthread-2.26.so) 20.069 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003 try_to_wake_up ([kernel.kallsyms]) __libc_write (/usr/lib64/libpthread-2.26.so) 37.170 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003 try_to_wake_up ([kernel.kallsyms]) __libc_write (/usr/lib64/libpthread-2.26.so) 53.267 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003 try_to_wake_up ([kernel.kallsyms]) __libc_write (/usr/lib64/libpthread-2.26.so) 70.365 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003 try_to_wake_up ([kernel.kallsyms]) __libc_write (/usr/lib64/libpthread-2.26.so) 75.781 Web Content/3649 sched:sched_waking:comm=JS Helper pid=3670 prio=120 target_cpu=000 try_to_wake_up ([kernel.kallsyms]) try_to_wake_up ([kernel.kallsyms]) wake_up_q ([kernel.kallsyms]) futex_wake ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) pthread_cond_signal@@GLIBC_2.3.2 (/usr/lib64/libpthread-2.26.so) # # perf trace -e sched:switch/nr=2/,block:_plug/nr=4/,block:_unplug/nr=1/,net:dev_queue/nr=3,max-stack=16/ 0.000 :0/0 sched:sched_switch:swapper/0:0 [120] S ==> trace:3367 [120] 0.046 :0/0 sched:sched_switch:swapper/1:0 [120] S ==> kworker/u16:58:2722 [120] 570.670 irq/50-iwlwifi/680 net:net_dev_queue:dev=wlp3s0 skbaddr=0xffff93498051ef00 len=66 __dev_queue_xmit ([kernel.kallsyms]) 1106.141 jbd2/dm-0-8/476 block:block_plug:[jbd2/dm-0-8] 1106.175 jbd2/dm-0-8/476 block:block_unplug:[jbd2/dm-0-8] 1 1618.088 kworker/u16:30/2694 block:block_plug:[kworker/u16:30] 1810.000 :0/0 net:net_dev_queue:dev=vnet0 skbaddr=0xffff93498051ef00 len=52 __dev_queue_xmit ([kernel.kallsyms]) 3857.974 :0/0 net:net_dev_queue:dev=vnet0 skbaddr=0xffff93498051f900 len=52 __dev_queue_xmit ([kernel.kallsyms]) 4790.277 jbd2/dm-2-8/748 block:block_plug:[jbd2/dm-2-8] 4790.448 jbd2/dm-2-8/748 block:block_plug:[jbd2/dm-2-8] # The global --max-events has precendence: # trace --max-events 3 -e sched:switch/nr=2/,block:_plug/nr=4/,block:_unplug/nr=1/,net:*dev_queue/nr=3,max-stack=16/ 0.000 :0/0 sched:sched_switch:swapper/0:0 [120] S ==> qemu-system-x86:2252 [120] 0.029 qemu-system-x8/2252 sched:sched_switch:qemu-system-x86:2252 [120] D ==> swapper/0:0 [120] 58.047 DNS Res~er #14/31661 net:net_dev_queue:dev=wlp3s0 skbaddr=0xffff9346966af100 len=84 __dev_queue_xmit ([kernel.kallsyms]) __libc_send (/usr/lib64/libpthread-2.26.so) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-s4jswltvh660ughvg9nwngah@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-10-22 14:27:12 -03:00
Arnaldo Carvalho de Melo	a937c6658b	perf trace: Drop thread refcount in trace__event_handler() We must pair: thread = machine__findnew_thread(); with thread__put(thread). Fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: `c4191e55b8` ("perf trace: Show comm and tid for tracepoint events") Link: https://lkml.kernel.org/n/tip-dkxsb8cwg87rmkrzrbns1o4z@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-10-22 12:37:52 -03:00
Arnaldo Carvalho de Melo	4291bf5cb9	perf trace: Drop addr_location refcounts When we use machine__resolve() we grab a reference to addr_location.thread (and in the future to other elements there) via machine__findnew_thread(), so we must pair that with addr_location__put(), else we'll never drop that thread when it exits and no other remaining data structures have pointers to it. Fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ivg9hifzeuokb1f5jxc2wob4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-10-22 12:37:51 -03:00
Arnaldo Carvalho de Melo	5067a8cdd4	perf trace: Introduce --max-events Allow stopping tracing after a number of events take place, considering strace-like syscalls formatting as one event per enter/exit pair or when in a multi-process tracing session a syscall is interrupted and printed ending with '...'. Examples included in the documentation: Trace the first 4 open, openat or open_by_handle_at syscalls (in the future more syscalls may match here): $ perf trace -e open* --max-events 4 [root@jouet perf]# trace -e open* --max-events 4 2272.992 ( 0.037 ms): gnome-shell/1370 openat(dfd: CWD, filename: /proc/self/stat) = 31 2277.481 ( 0.139 ms): gnome-shell/3039 openat(dfd: CWD, filename: /proc/self/stat) = 65 3026.398 ( 0.076 ms): gnome-shell/3039 openat(dfd: CWD, filename: /proc/self/stat) = 65 4294.665 ( 0.015 ms): sed/15879 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3 $ Trace the first minor page fault when running a workload: # perf trace -F min --max-stack=7 --max-events 1 sleep 1 0.000 ( 0.000 ms): sleep/18006 minfault [__clear_user+0x1a] => 0x5626efa56080 (?k) __clear_user ([kernel.kallsyms]) load_elf_binary ([kernel.kallsyms]) search_binary_handler ([kernel.kallsyms]) __do_execve_file.isra.33 ([kernel.kallsyms]) __x64_sys_execve ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64 ([kernel.kallsyms]) # Trace the next min page page fault to take place on the first CPU: # perf trace -F min --call-graph=dwarf --max-events 1 --cpu 0 0.000 ( 0.000 ms): Web Content/17136 minfault [js::gc::Chunk::fetchNextDecommittedArena+0x4b] => 0x7fbe6181b000 (?.) js::gc::FreeSpan::initAsEmpty (inlined) js::gc::Arena::setAsNotAllocated (inlined) js::gc::Chunk::fetchNextDecommittedArena (/usr/lib64/firefox/libxul.so) js::gc::Chunk::allocateArena (/usr/lib64/firefox/libxul.so) js::gc::GCRuntime::allocateArena (/usr/lib64/firefox/libxul.so) js::gc::ArenaLists::allocateFromArena (/usr/lib64/firefox/libxul.so) js::gc::GCRuntime::tryNewTenuredThing<JSString, (js::AllowGC)1> (inlined) js::AllocateString<JSString, (js::AllowGC)1> (/usr/lib64/firefox/libxul.so) js::Allocate<JSThinInlineString, (js::AllowGC)1> (inlined) JSThinInlineString::new_<(js::AllowGC)1> (inlined) AllocateInlineString<(js::AllowGC)1, unsigned char> (inlined) js::ConcatStrings<(js::AllowGC)1> (/usr/lib64/firefox/libxul.so) [0x18b26e6bc2bd] (/tmp/perf-17136.map) Tracing the next four ext4 operations on a specific CPU: # perf trace -e ext4:*/call-graph=fp/ --max-events 4 --cpu 3 0.000 mutt/3849 ext4:ext4_es_lookup_extent_enter:dev 253,2 ino 57277 lblk 0 ext4_es_lookup_extent ([kernel.kallsyms]) read (/usr/lib64/libc-2.26.so) 0.097 mutt/3849 ext4:ext4_es_lookup_extent_exit:dev 253,2 ino 57277 found 0 [0/0) 0 ext4_es_lookup_extent ([kernel.kallsyms]) read (/usr/lib64/libc-2.26.so) 0.141 mutt/3849 ext4:ext4_ext_map_blocks_enter:dev 253,2 ino 57277 lblk 0 len 1 flags ext4_ext_map_blocks ([kernel.kallsyms]) read (/usr/lib64/libc-2.26.so) 0.184 mutt/3849 ext4:ext4_ext_load_extent:dev 253,2 ino 57277 lblk 1516511 pblk 18446744071750013657 __read_extent_tree_block ([kernel.kallsyms]) __read_extent_tree_block ([kernel.kallsyms]) ext4_find_extent ([kernel.kallsyms]) ext4_ext_map_blocks ([kernel.kallsyms]) ext4_map_blocks ([kernel.kallsyms]) ext4_mpage_readpages ([kernel.kallsyms]) read_pages ([kernel.kallsyms]) __do_page_cache_readahead ([kernel.kallsyms]) ondemand_readahead ([kernel.kallsyms]) generic_file_read_iter ([kernel.kallsyms]) __vfs_read ([kernel.kallsyms]) vfs_read ([kernel.kallsyms]) ksys_read ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64 ([kernel.kallsyms]) read (/usr/lib64/libc-2.26.so) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rudá Moura <ruda.moura@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-sweh107bs7ol5bzls0m4tqdz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-10-19 11:58:33 -03:00
Tzvetomir Stoyanov (VMware)	bb39ccb204	tools lib traceevent, perf tools: Rename enum format_flags to enum tep_format_flags In order to make libtraceevent into a proper library, variables, data structures and functions require a unique prefix to prevent name space conflicts. That prefix will be "tep_". This renames enum format_flags to enum tep_format_flags and adds prefix TEP_ to all of its members. Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: linux-trace-devel@vger.kernel.org Link: http://lkml.kernel.org/r/20180919185722.803127871@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-09-19 17:14:13 -03:00
Tzvetomir Stoyanov (VMware)	2c92f9828b	tools lib traceevent, perf tools: Rename struct format{_field} to struct tep_format{_field} In order to make libtraceevent into a proper library, variables, data structures and functions require a unique prefix to prevent name space conflicts. That prefix will be "tep_". This renames struct format to struct tep_format and struct format_field to struct tep_format_field Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: linux-trace-devel@vger.kernel.org Link: http://lkml.kernel.org/r/20180919185722.661319373@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-09-19 17:13:15 -03:00
Tzvetomir Stoyanov (VMware)	4963b0f88b	tools lib traceevent, perf tools: Rename struct event_format to struct tep_event_format In order to make libtraceevent into a proper library, variables, data structures and functions require a unique prefix to prevent name space conflicts. That prefix will be "tep_". This renames struct event_format to struct tep_event_format Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: linux-trace-devel@vger.kernel.org Link: http://lkml.kernel.org/r/20180919185722.495820809@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-09-19 17:11:50 -03:00
Arnaldo Carvalho de Melo	b1a9e2535e	perf trace: Use the raw_syscalls:sys_enter for the augmented syscalls Now we combine what comes from the "bpf-output" event, i.e. what is added in the augmented_syscalls.c BPF program via the __augmented_syscalls__ BPF map, i.e. the payload we get with raw_syscalls:sys_enter tracepoints plus the pointer contents, right after that payload, with the raw_syscall:sys_exit also added, without augmentation, in the augmented_syscalls.c program. The end result is that for the hooked syscalls, we get strace like output with pointer expansion, something that wasn't possible before with just raw_syscalls:sys_enter + raw_syscalls:sys_exit. E.g.: # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c ping -c 2 ::1 0.000 ( 0.008 ms): ping/19573 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3 0.036 ( 0.006 ms): ping/19573 openat(dfd: CWD, filename: /lib64/libcap.so.2, flags: CLOEXEC) = 3 0.070 ( 0.004 ms): ping/19573 openat(dfd: CWD, filename: /lib64/libidn.so.11, flags: CLOEXEC) = 3 0.095 ( 0.004 ms): ping/19573 openat(dfd: CWD, filename: /lib64/libcrypto.so.1.1, flags: CLOEXEC) = 3 0.127 ( 0.004 ms): ping/19573 openat(dfd: CWD, filename: /lib64/libresolv.so.2, flags: CLOEXEC) = 3 0.156 ( 0.004 ms): ping/19573 openat(dfd: CWD, filename: /lib64/libm.so.6, flags: CLOEXEC) = 3 0.181 ( 0.004 ms): ping/19573 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3 0.212 ( 0.004 ms): ping/19573 openat(dfd: CWD, filename: /lib64/libz.so.1, flags: CLOEXEC) = 3 0.242 ( 0.004 ms): ping/19573 openat(dfd: CWD, filename: /lib64/libdl.so.2, flags: CLOEXEC) = 3 0.266 ( 0.003 ms): ping/19573 openat(dfd: CWD, filename: /lib64/libpthread.so.0, flags: CLOEXEC) = 3 0.709 ( 0.006 ms): ping/19573 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3 PING ::1(::1) 56 data bytes 1.133 ( 0.011 ms): ping/19573 connect(fd: 5, uservaddr: { .family: INET6, port: 1025, addr: ::1 }, addrlen: 28) = 0 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.033 ms 1.234 ( 0.036 ms): ping/19573 sendto(fd: 4<socket:[1498931]>, buff: 0x555e5b975720, len: 64, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 28) = 64 64 bytes from ::1: icmp_seq=2 ttl=64 time=0.120 ms --- ::1 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1000ms rtt min/avg/max/mdev = 0.033/0.076/0.120/0.044 ms 1002.060 ( 0.129 ms): ping/19573 sendto(fd: 4<socket:[1498931]>, buff: 0x555e5b975720, len: 64, flags: CONFIRM, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 28) = 64 # # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c cat tools/perf/examples/bpf/hello.c #include <stdio.h> int syscall_enter(openat)(void *args) { puts("Hello, world\n"); return 0; } license(GPL); 0.000 ( 0.008 ms): cat/20054 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3 0.020 ( 0.005 ms): cat/20054 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3 0.176 ( 0.011 ms): cat/20054 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3 0.243 ( 0.006 ms): cat/20054 openat(dfd: CWD, filename: tools/perf/examples/bpf/hello.c) = 3 # Now to think how to hook on all syscalls, fallbacking to the non-augmented raw_syscalls:sys_enter payload. Probably the best way is to use a BPF_MAP_TYPE_PROG_ARRAY just like samples/bpf/tracex5_kern.c does. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-nlt60y69o26xi59z5vtpdrj5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-09-04 16:51:12 -03:00
Arnaldo Carvalho de Melo	db2da3f85c	perf trace: Setup augmented_args in the raw_syscalls:sys_enter handler Without using something to augment the raw_syscalls:sys_enter tracepoint payload with the pointer contents, this will work just like before, i.e. the augmented_args arg will be NULL and the augmented_args_size will be 0. This just paves the way for the next cset where we will associate the trace__sys_enter tracepoint handler with the augmented "bpf-output" event named "__augmented_args__". Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-p8uvt2a6ug3uwlhja3cno4la@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-09-04 16:51:07 -03:00
Arnaldo Carvalho de Melo	8a041f86a8	perf trace: Introduce syscall__augmented_args() method That will be used by trace__sys_enter when we start combining the augmented syscalls:sys_enter_FOO + syscalls:sys_exit_FOO. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-iiseo3s0qbf9i3rzn8k597bv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-09-03 16:07:53 -03:00
Benjamin Peterson	3de3e8bbf3	perf trace beauty: Alias 'umount' to 'umount2' Before: # perf trace -e mount umount /dev/mapper/fedora-home /s 11.576 ( 0.004 ms) umount/3138 umount2(arg0: 94501956754656, arg1: 0, arg2: 1, arg3: 140051050083104, arg4: 4, arg5: 94501956755136) = -1 EINVAL Invalid argument # After: # perf trace -e mount umount /s 0.000 ( 9.241 ms): umount/5251 umount2(name: 0x55f74a986480) = 0 Signed-off-by: Benjamin Peterson <benjamin@python.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180828035344.31500-1-benjamin@python.org [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:25 -03:00
Arnaldo Carvalho de Melo	c4191e55b8	perf trace: Show comm and tid for tracepoint events So that all events have that info, improving reading by having information better aligned, etc. Before: # echo 1 > /proc/sys/vm/drop_caches # perf trace -e block:,ext4:,tools/perf/examples/bpf/augmented_syscalls.c,close cat tools/perf/examples/bpf/hello.c 0.000 ( ): #include <stdio.h> int syscall_enter(openat)(void args) { puts("Hello, world\n"); return 0; } license(GPL); cat/2731 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) 0.025 ( ): syscalls:sys_exit_openat:0x3 0.063 ( 0.022 ms): cat/2731 close(fd: 3) = 0 0.110 ( ): cat/2731 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) 0.123 ( ): syscalls:sys_exit_openat:0x3 0.243 ( 0.008 ms): cat/2731 close(fd: 3) = 0 0.485 ( ): cat/2731 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) 0.500 ( ): syscalls:sys_exit_open:0x3 0.531 ( 0.017 ms): cat/2731 close(fd: 3) = 0 0.587 ( ): cat/2731 openat(dfd: CWD, filename: tools/perf/examples/bpf/hello.c) 0.601 ( ): syscalls:sys_exit_openat:0x3 0.631 ( ): ext4:ext4_es_lookup_extent_enter:dev 253,2 ino 1311399 lblk 0 0.639 ( ): ext4:ext4_es_lookup_extent_exit:dev 253,2 ino 1311399 found 1 [0/1) 5276651 W0x10 0.654 ( ): block:block_bio_queue:253,2 R 42213208 + 8 [cat] 0.663 ( ): block:block_bio_remap:8,0 R 58206040 + 8 <- (253,2) 42213208 0.671 ( ): block:block_bio_remap:8,0 R 175570776 + 8 <- (8,6) 58206040 0.678 ( ): block:block_bio_queue:8,0 R 175570776 + 8 [cat] 0.692 ( ): block:block_getrq:8,0 R 175570776 + 8 [cat] 0.700 ( ): block:block_plug:[cat] 0.708 ( ): block:block_rq_insert:8,0 R 4096 () 175570776 + 8 [cat] 0.713 ( ): block:block_unplug:[cat] 1 0.716 ( ): block:block_rq_issue:8,0 R 4096 () 175570776 + 8 [cat] 0.949 ( 0.007 ms): cat/2731 close(fd: 3) = 0 0.969 ( 0.006 ms): cat/2731 close(fd: 1) = 0 0.982 ( 0.006 ms): cat/2731 close(fd: 2) = 0 # After: # echo 1 > /proc/sys/vm/drop_caches # perf trace -e block:,ext4:,tools/perf/examples/bpf/augmented_syscalls.c,close cat tools/perf/examples/bpf/hello.c 0.000 ( ): cat/1380 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC)#include <stdio.h> int syscall_enter(openat)(void args) { puts("Hello, world\n"); return 0; } license(GPL); 0.024 ( ): cat/1380 syscalls:sys_exit_openat:0x3 0.063 ( 0.024 ms): cat/1380 close(fd: 3) = 0 0.114 ( ): cat/1380 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) 0.127 ( ): cat/1380 syscalls:sys_exit_openat:0x3 0.247 ( 0.009 ms): cat/1380 close(fd: 3) = 0 0.484 ( ): cat/1380 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) 0.499 ( ): cat/1380 syscalls:sys_exit_open:0x3 0.613 ( 0.010 ms): cat/1380 close(fd: 3) = 0 0.662 ( ): cat/1380 openat(dfd: CWD, filename: tools/perf/examples/bpf/hello.c) 0.678 ( ): cat/1380 syscalls:sys_exit_openat:0x3 0.712 ( ): cat/1380 ext4:ext4_es_lookup_extent_enter:dev 253,2 ino 1311399 lblk 0 0.721 ( ): cat/1380 ext4:ext4_es_lookup_extent_exit:dev 253,2 ino 1311399 found 1 [0/1) 5276651 W0x10 0.734 ( ): cat/1380 block:block_bio_queue:253,2 R 42213208 + 8 [cat] 0.745 ( ): cat/1380 block:block_bio_remap:8,0 R 58206040 + 8 <- (253,2) 42213208 0.754 ( ): cat/1380 block:block_bio_remap:8,0 R 175570776 + 8 <- (8,6) 58206040 0.761 ( ): cat/1380 block:block_bio_queue:8,0 R 175570776 + 8 [cat] 0.780 ( ): cat/1380 block:block_getrq:8,0 R 175570776 + 8 [cat] 0.791 ( ): cat/1380 block:block_plug:[cat] 0.802 ( ): cat/1380 block:block_rq_insert:8,0 R 4096 () 175570776 + 8 [cat] 0.806 ( ): cat/1380 block:block_unplug:[cat] 1 0.810 ( ): cat/1380 block:block_rq_issue:8,0 R 4096 () 175570776 + 8 [cat] 1.005 ( 0.011 ms): cat/1380 close(fd: 3) = 0 1.031 ( 0.008 ms): cat/1380 close(fd: 1) = 0 1.048 ( 0.008 ms): cat/1380 close(fd: 2) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-us1mwsupxffs4jlm3uqm5dvj@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:21 -03:00
Arnaldo Carvalho de Melo	6ebb686225	perf trace augmented_syscalls: Augment sendto's 'addr' arg Its a 'struct sockaddr' pointer, augment it with the same beautifier as for 'connect' and 'bind', that all receive from userspace that pointer. Doing it in the other direction remains to be done, hooking at the syscalls:sys_exit_{accept4?,recvmsg} tracepoints somehow. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-k2eu68lsphnm2fthc32gq76c@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:20 -03:00
Arnaldo Carvalho de Melo	02ef288420	perf trace augmented_syscalls: Augment bind's 'myaddr' sockaddr arg One more, to reuse the augmented_sockaddr_syscall_enter() macro introduced from the augmentation of connect's sockaddr arg, also to get a subset of the struct arg augmentations done using the manual method, before switching to something automatic, using tracefs's format file or, even better, BTF containing the syscall args structs. # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c 0.000 sshd/11479 bind(fd: 3<socket:[170336]>, umyaddr: { .family: NETLINK }, addrlen: 12) 1.752 sshd/11479 bind(fd: 3<socket:[170336]>, umyaddr: { .family: INET, port: 22, addr: 0.0.0.0 }, addrlen: 16) 1.924 sshd/11479 bind(fd: 4<socket:[170338]>, umyaddr: { .family: INET6, port: 22, addr: :: }, addrlen: 28) ^C# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-a2drqpahpmc7uwb3n3gj2plu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:20 -03:00
Arnaldo Carvalho de Melo	d5a7e6613b	perf trace augmented_syscalls: Augment connect's 'sockaddr' arg As the first example of augmenting something other than a 'filename', augment the 'struct sockaddr' argument for the 'connect' syscall: # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c ssh -6 fedorapeople.org 0.000 ssh/29669 connect(fd: 3, uservaddr: { .family: LOCAL, path: /var/run/nscd/socket }, addrlen: 110) 0.042 ssh/29669 connect(fd: 3, uservaddr: { .family: LOCAL, path: /var/run/nscd/socket }, addrlen: 110) 1.329 ssh/29669 connect(fd: 3, uservaddr: { .family: LOCAL, path: /var/run/nscd/socket }, addrlen: 110) 1.362 ssh/29669 connect(fd: 3, uservaddr: { .family: LOCAL, path: /var/run/nscd/socket }, addrlen: 110) 1.458 ssh/29669 connect(fd: 3, uservaddr: { .family: LOCAL, path: /var/run/nscd/socket }, addrlen: 110) 1.478 ssh/29669 connect(fd: 3, uservaddr: { .family: LOCAL, path: /var/run/nscd/socket }, addrlen: 110) 1.683 ssh/29669 connect(fd: 3<socket:[125942]>, uservaddr: { .family: INET, port: 53, addr: 192.168.43.1 }, addrlen: 16) 4.710 ssh/29669 connect(fd: 3<socket:[125942]>, uservaddr: { .family: INET6, port: 22, addr: 2610:28:3090:3001:5054:ff:fea7:9474 }, addrlen: 28) root@fedorapeople.org: Permission denied (publickey). # This is still just augmenting the syscalls:sys_enter_connect part, later we'll wire this up to augment the enter+exit combo, like in the tradicional 'perf trace' and 'strace' outputs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-s7l541cbiqb22ifio6z7dpf6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:20 -03:00
Arnaldo Carvalho de Melo	75d1e30681	perf trace: Use the augmented filename, expanding syscall enter pointers This is the final touch in showing how a syscall argument beautifier can access the augmented args put in place by the tools/perf/examples/bpf/augmented_syscalls.c eBPF script, right after the regular raw syscall args, i.e. the up to 6 long integer values in the syscall interface. With this we are able to show the 'openat' syscall arg, now with up to 64 bytes, but in time this will be configurable, just like with the 'strace -s strsize' argument, from 'strace''s man page: -s strsize Specify the maximum string size to print (the default is 32). This actually is the maximum string to _collect_ and store in the ring buffer, not just print. Before: # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null 0.000 ( ): cat/9658 openat(dfd: CWD, filename: 0x6626eda8, flags: CLOEXEC) 0.017 ( 0.007 ms): cat/9658 openat(dfd: CWD, filename: 0x6626eda8, flags: CLOEXEC) = 3 0.049 ( ): cat/9658 openat(dfd: CWD, filename: 0x66476ce0, flags: CLOEXEC) 0.051 ( 0.007 ms): cat/9658 openat(dfd: CWD, filename: 0x66476ce0, flags: CLOEXEC) = 3 0.377 ( ): cat/9658 openat(dfd: CWD, filename: 0x1e8f806b) 0.379 ( 0.005 ms): cat/9658 openat(dfd: CWD, filename: 0x1e8f806b) = 3 # After: # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null 0.000 ( ): cat/11966 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) 0.006 ( 0.006 ms): cat/11966 openat(dfd: CWD, filename: 0x4bfdcda8, flags: CLOEXEC) = 3 0.034 ( ): cat/11966 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) 0.036 ( 0.008 ms): cat/11966 openat(dfd: CWD, filename: 0x4c1e4ce0, flags: CLOEXEC) = 3 0.375 ( ): cat/11966 openat(dfd: CWD, filename: /etc/passwd) 0.377 ( 0.005 ms): cat/11966 openat(dfd: CWD, filename: 0xe87906b) = 3 # This cset should show all the aspects of establishing a protocol between an eBPF syscall arg augmenter program, tools/perf/examples/bpf/augmented_syscalls.c and a 'perf trace' beautifier, the one associated with all 'char ' point syscall args with names that can heuristically be associated with filenames. Now to wire up 'open' to show a second syscall using this scheme, all we have to do now is to change tools/perf/examples/bpf/augmented_syscalls.c, as 'perf trace' will notice that the perf_sample.raw_size is more than what is expected for a particular syscall payload as defined by its tracefs format file and will then use the augmented payload in the 'filename' syscall arg beautifier. The same protocol will be used for structs such as 'struct sockaddr ', 'struct pollfd', etc, with additions for handling arrays. This will all be done under the hood when 'perf trace' realizes the system has the necessary components, and also can be done by providing a precompiled augmented_syscalls.c eBPF ELF object. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-gj9kqb61wo7m3shtpzercbcr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:19 -03:00
Arnaldo Carvalho de Melo	c96f4edcc3	perf trace: Show comm/tid for augmented_syscalls To get us a bit more like the sys_enter + sys_exit combo: # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null 0.000 ( ): openat(dfd: CWD, filename: 0x31b6dda8, flags: CLOEXEC) 0.009 ( 0.009 ms): cat/3619 openat(dfd: CWD, filename: 0x31b6dda8, flags: CLOEXEC) = 3 0.051 ( ): openat(dfd: CWD, filename: 0x31d75ce0, flags: CLOEXEC) 0.054 ( 0.010 ms): cat/3619 openat(dfd: CWD, filename: 0x31d75ce0, flags: CLOEXEC) = 3 0.539 ( ): openat(dfd: CWD, filename: 0xca71506b) 0.543 ( 0.115 ms): cat/3619 openat(dfd: CWD, filename: 0xca71506b) = 3 # After: # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null 0.000 ( ): cat/4919 openat(dfd: CWD, filename: 0xc8358da8, flags: CLOEXEC) 0.007 ( 0.005 ms): cat/4919 openat(dfd: CWD, filename: 0xc8358da8, flags: CLOEXEC) = 3 0.032 ( ): cat/4919 openat(dfd: CWD, filename: 0xc8560ce0, flags: CLOEXEC) 0.033 ( 0.006 ms): cat/4919 openat(dfd: CWD, filename: 0xc8560ce0, flags: CLOEXEC) = 3 0.301 ( ): cat/4919 openat(dfd: CWD, filename: 0x91fa306b) 0.304 ( 0.004 ms): cat/4919 openat(dfd: CWD, filename: 0x91fa306b) = 3 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-6w8ytyo5y655a1hsyfpfily6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:19 -03:00
Arnaldo Carvalho de Melo	6dcbd212ff	perf trace: Extract the comm/tid printing for syscall enter Will be used with augmented syscalls, where we haven't transitioned completely to combining sys_enter_FOO with sys_exit_FOO, so we'll go as far as having it similar to the end result, strace like, as possible. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-canomaoiybkswwnhj69u9ae4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:19 -03:00
Arnaldo Carvalho de Melo	1cdf618f23	perf trace: Print the syscall name for augmented_syscalls Since we copy all the payload for raw_syscalls:sys_enter plus add expanded pointers, we can use the syscall id to get its name, etc: # grep 'field:.* id' /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/format field:long id; offset:8; size:8; signed:1; # Before: # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null 0.000 ( ): __augmented_syscalls__:dfd: CWD, filename: 0xec9f9da8, flags: CLOEXEC 0.006 ( 0.006 ms): cat/2395 openat(dfd: CWD, filename: 0xec9f9da8, flags: CLOEXEC) = 3 0.041 ( ): __augmented_syscalls__:dfd: CWD, filename: 0xecc01ce0, flags: CLOEXEC 0.042 ( 0.007 ms): cat/2395 openat(dfd: CWD, filename: 0xecc01ce0, flags: CLOEXEC) = 3 0.376 ( ): __augmented_syscalls__:dfd: CWD, filename: 0xac0a806b 0.379 ( 0.006 ms): cat/2395 openat(dfd: CWD, filename: 0xac0a806b) = 3 # After: # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null 0.000 ( ): openat(dfd: CWD, filename: 0x31b6dda8, flags: CLOEXEC) 0.009 ( 0.009 ms): cat/3619 openat(dfd: CWD, filename: 0x31b6dda8, flags: CLOEXEC) = 3 0.051 ( ): openat(dfd: CWD, filename: 0x31d75ce0, flags: CLOEXEC) 0.054 ( 0.010 ms): cat/3619 openat(dfd: CWD, filename: 0x31d75ce0, flags: CLOEXEC) = 3 0.539 ( ): openat(dfd: CWD, filename: 0xca71506b) 0.543 ( 0.115 ms): cat/3619 openat(dfd: CWD, filename: 0xca71506b) = 3 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-epz6y9i0eavmerc5ha98t7gn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:19 -03:00
Arnaldo Carvalho de Melo	7a983a0fe2	perf trace: Pass augmented args to the arg formatters when available If the tracepoint payload is bigger than what a syscall expected from what is in its format file in tracefs, then that will be used as augmented args, i.e. the expansion of syscall arg pointers, with things like a filename, structs, etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-bsbqx7xi2ot4q9bf570f7tqs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-30 15:52:18 -03:00
Arnaldo Carvalho de Melo	88cf7084f9	perf trace: Wire up the augmented syscalls with the syscalls:sys_enter_FOO beautifier We just check that the evsel is the one we associated with the bpf-output event associated with the "__augmented_syscalls__" eBPF map, to show that the formatting is done properly: # perf trace -e perf/tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null 0.000 ( ): __augmented_syscalls__:dfd: CWD, filename: 0x43e06da8, flags: CLOEXEC 0.006 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x43e06da8, flags: CLOEXEC 0.007 ( 0.004 ms): cat/11486 openat(dfd: CWD, filename: 0x43e06da8, flags: CLOEXEC ) = 3 0.029 ( ): __augmented_syscalls__:dfd: CWD, filename: 0x4400ece0, flags: CLOEXEC 0.030 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x4400ece0, flags: CLOEXEC 0.031 ( 0.004 ms): cat/11486 openat(dfd: CWD, filename: 0x4400ece0, flags: CLOEXEC ) = 3 0.249 ( ): __augmented_syscalls__:dfd: CWD, filename: 0xc3700d6 0.250 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0xc3700d6 0.252 ( 0.003 ms): cat/11486 openat(dfd: CWD, filename: 0xc3700d6 ) = 3 # Now we just need to get the full blown enter/exit handlers to check if the evsel being processed is the augmented_syscalls one to go pick the pointer payloads from the end of the payload. We also need to state somehow what is the layout for multi pointer arg syscalls. Also handy would be to have a BTF file with the struct definitions used in syscalls, compact, generated at kernel built time and available for use in eBPF programs. Till we get there we can go on doing some manual coupling of the most relevant syscalls with some hand built beautifiers. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-r6ba5izrml82nwfmwcp7jpkm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-08 15:55:59 -03:00
Arnaldo Carvalho de Melo	d3d1c4bdf5	perf trace: Setup the augmented syscalls bpf-output event fields The payload that is put in place by the eBPF script attached to syscalls:sys_enter_openat (and other syscalls with pointers, in the future) can be consumed by the existing sys_enter beautifiers if evsel->priv is setup with a struct syscall_tp with struct tp_fields for the 'syscall_id' and 'args' fields expected by the beautifiers, this patch does just that. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-xfjyog8oveg2fjys9r1yy1es@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-08 15:55:58 -03:00
Arnaldo Carvalho de Melo	78e890ea86	perf bpf: Make bpf__setup_output_event() return the bpf-output event We're calling it to setup that event, and we'll need it later to decide if the bpf-output event we're handling is the one setup for a specific purpose, return it using ERR_PTR, etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-zhachv7il2n1lopt9aonwhu7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-08 15:55:58 -03:00
Arnaldo Carvalho de Melo	e0b6d2ef32	perf trace: Handle "bpf-output" events associated with "__augmented_syscalls__" BPF map Add an example BPF script that writes syscalls:sys_enter_openat raw tracepoint payloads augmented with the first 64 bytes of the "filename" syscall pointer arg. Then catch it and print it just like with things written to the "__bpf_stdout__" map associated with a PERF_COUNT_SW_BPF_OUTPUT software event, by just letting the default tracepoint handler in 'perf trace', trace__event_handler(), to use bpf_output__fprintf(trace, sample), just like it does with all other PERF_COUNT_SW_BPF_OUTPUT events, i.e. just do a dump on the payload, so that we can check if what is being printed has at least the first 64 bytes of the "filename" arg: The augmented_syscalls.c eBPF script: # cat tools/perf/examples/bpf/augmented_syscalls.c // SPDX-License-Identifier: GPL-2.0 #include <stdio.h> struct bpf_map SEC("maps") __augmented_syscalls__ = { .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY, .key_size = sizeof(int), .value_size = sizeof(u32), .max_entries = __NR_CPUS__, }; struct syscall_enter_openat_args { unsigned long long common_tp_fields; long syscall_nr; long dfd; char filename_ptr; long flags; long mode; }; struct augmented_enter_openat_args { struct syscall_enter_openat_args args; char filename[64]; }; int syscall_enter(openat)(struct syscall_enter_openat_args args) { struct augmented_enter_openat_args augmented_args; probe_read(&augmented_args.args, sizeof(augmented_args.args), args); probe_read_str(&augmented_args.filename, sizeof(augmented_args.filename), args->filename_ptr); perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, sizeof(augmented_args)); return 1; } license(GPL); # So it will just prepare a raw_syscalls:sys_enter payload for the "openat" syscall. This will eventually be done for all syscalls with pointer args, globally or just when the user asks, using some spec, which args of which syscalls it wants "expanded" this way, we'll probably start with just all the syscalls that have char * pointers with familiar names, the ones we already handle with the probe:vfs_getname kprobe if it is in place hooking the kernel getname_flags() function used to copy from user the paths. Running it we get: # perf trace -e perf/tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null 0.000 ( ): __augmented_syscalls__:X?.C......................`\..................../etc/ld.so.cache..#......,....ao.k...............k......1."......... 0.006 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x5c600da8, flags: CLOEXEC 0.008 ( 0.005 ms): cat/31292 openat(dfd: CWD, filename: 0x5c600da8, flags: CLOEXEC ) = 3 0.036 ( ): __augmented_syscalls__:X?.C.......................\..................../lib64/libc.so.6......... .\....#........?.......=.C..../."......... 0.037 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x5c808ce0, flags: CLOEXEC 0.039 ( 0.007 ms): cat/31292 openat(dfd: CWD, filename: 0x5c808ce0, flags: CLOEXEC ) = 3 0.323 ( ): __augmented_syscalls__:X?.C.....................P....................../etc/passwd......>.C....@................>.C.....,....ao.>.C........ 0.325 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0xe8be50d6 0.327 ( 0.004 ms): cat/31292 openat(dfd: CWD, filename: 0xe8be50d6 ) = 3 # We need to go on optimizing this to avoid seding trash or zeroes in the pointer content payload, using the return from bpf_probe_read_str(), but to keep things simple at this stage and make incremental progress, lets leave it at that for now. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-g360n1zbj6bkbk6q0qo11c28@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-08 15:55:57 -03:00
Arnaldo Carvalho de Melo	f3acd8869b	perf trace: Use perf_evsel__sc_tp_{uint,ptr} for "id"/"args" handling syscalls:* events Now it looks just about the same as for the trace__sys_{enter,exit}. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-y59may7zx1eccnp4m3qm4u0b@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-02 15:39:00 -03:00
Arnaldo Carvalho de Melo	d32855fa35	perf trace: Setup struct syscall_tp for syscalls:sys_{enter,exit}_NAME events Mapping "__syscall_nr" to "id" and setting up "args" from the offset of "__syscall_nr" + sizeof(u64), as the payload for syscalls:* is the same as for raw_syscalls:*, just the fields have different names. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ogeenrpviwcpwl3oy1l55f3m@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-02 15:38:57 -03:00
Arnaldo Carvalho de Melo	aa823f58f7	perf trace: Allow setting up a syscall_tp struct without a format_field To avoid having to ask libtraceevent to find a field by name when handling each tracepoint event, we setup a struct syscall_tp with a tp_field struct having an extractor function + the offset for the "id", "args" and "ret" raw_syscalls:sys_{enter,exit} tracepoints. Now that we want to do the same with syscalls:sys_{entry,exit}_NAME individual syscall tracepoints, where we have "id" as "__syscall_nr" and "args" as the actual series of per syscall parameters, we need more flexibility from the routines that set up these pre-looked up syscall tracepoint arg fields. The next cset will use it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-v59q5e0jrlzkpl9a1c7t81ni@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-02 15:07:33 -03:00
Arnaldo Carvalho de Melo	63f11c80e5	perf trace: Rename some syscall_tp methods to raw_syscall Because raw_syscalls have the field for the syscall number as 'id' while the syscalls:sys_{enter,exit}_NAME have it as __syscall_nr... Since we want to support both for being able to enable just a syscalls:sys_{enter,exit}_name instead of asking for raw_syscalls:sys_{enter,exit} plus filters, make the method names for each kind of tracepoint more explicit. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-4rixbfzco6tsry0w9ghx3ktb@git.kernel.org Signef-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-02 15:07:28 -03:00
Arnaldo Carvalho de Melo	a98392bb1e	perf trace: Use beautifiers on syscalls:sys_enter_ handlers We were using the beautifiers only when processing the raw_syscalls:sys_enter events, but we can as well use them for the syscalls:sys_enter_NAME events, as the layout is the same. Some more tweaking is needed as we're processing them straight away, i.e. there is no buffering in the sys_enter_NAME event to wait for things like vfs_getname to provide pointer contents and then flushing at sys_exit_NAME, so we need to state in the syscall_arg that this is unbuffered, just print the pointer values, beautifying just non-pointer syscall args. This just shows an alternative way of processing tracepoints, that we will end up using when creating "tracepoint" payloads that already copy pointer contents (or chunks of it, i.e. not the whole filename, but just the end of it, not all the bf for a read/write, but just the start, etc), directly in the kernel using eBPF. E.g.: # perf trace -e syscalls:entersleep,sleep sleep 1 0.303 ( ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffc93d5ecc0 0.305 (1000.229 ms): sleep/8746 nanosleep(rqtp: 0x7ffc93d5ecc0) = 0 # perf trace -e syscalls:_sleep,sleep sleep 1 0.288 ( ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffecde87e40 0.289 ( ): sleep/8748 nanosleep(rqtp: 0x7ffecde87e40) ... 1000.479 ( ): syscalls:sys_exit_nanosleep:0x0 0.289 (1000.208 ms): sleep/8748 ... [continued]: nanosleep()) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-jehyd2zwhw00z3p7v7mg9632@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-02 15:07:19 -03:00
Arnaldo Carvalho de Melo	6a648b534d	perf trace: Associate vfs_getname()'ed pathname with fd returned from 'openat' When the vfs_getname() wannabe tracepoint is in place: # perf probe -l probe:vfs_getname (on getname_flags:73@acme/git/linux/fs/namei.c with pathname) # 'perf trace' will use it to get the pathname when it is copied from userspace to the kernel, right after syscalls:sys_enter_open, copied in the 'probe:vfs_getname', stash it somewhere and then, at syscalls:sys_exit_open time, if the 'open' return is not -1, i.e. a successfull open syscall, associate that pathname to this return, i.e. the fd. We were not doing this for the 'openat' syscall, which would cause 'perf trace' to fallback to using /proc to get the fd, change it so that we use what we got from probe:vfs_getname, reducing the 'openat' beautification process cost, ditching the syscalls performed to read procfs state and avoiding some possible races in the process. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-xnp44ao3bkb6ejeczxfnjwsh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-02 10:30:08 -03:00
Arnaldo Carvalho de Melo	b912885ab7	perf trace: Do not require --no-syscalls to suppress strace like output So far the --syscalls option was the default, requiring explicit --no-syscalls when wanting to process just some other event, invert that and assume it only when no other event was specified, allowing its explicit enablement when wanting to see all syscalls together with some other event: E.g: The existing default is maintained for a single workload: # perf trace sleep 1 <SNIP> 0.264 ( 0.003 ms): sleep/12762 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f62cbf04000 0.271 ( 0.001 ms): sleep/12762 close(fd: 3) = 0 0.295 (1000.130 ms): sleep/12762 nanosleep(rqtp: 0x7ffd15194fd0) = 0 1000.469 ( 0.006 ms): sleep/12762 close(fd: 1) = 0 1000.480 ( 0.004 ms): sleep/12762 close(fd: 2) = 0 1000.502 ( ): sleep/12762 exit_group() # For a pid: # pidof ssh 7826 3961 3226 2628 2493 # perf trace -p 3961 ? ( ): ... [continued]: select()) = 1 0.023 ( 0.005 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce870 ) = 0 0.036 ( 0.009 ms): read(fd: 5</dev/pts/7>, buf: 0x7ffcc8fca7b0, count: 16384 ) = 3 0.060 ( 0.004 ms): getpid( ) = 3961 (ssh) 0.079 ( 0.004 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce8e0 ) = 0 0.088 ( 0.003 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce7c0 ) = 0 <SNIP> For system wide, threads, cgroups, user, etc when no event is specified, the existing behaviour is maintained, i.e. --syscalls is selected. When some event is specified, then --no-syscalls doesn't need to be specified: # perf trace -e tcp:tcp_probe ssh localhost 0.000 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=53 snd_nxt=0xb67ce8f7 snd_una=0xb67ce8f7 snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=43690 0.010 tcp:tcp_probe:src=[::1]:39074 dest=[::1]:22 mark=0 length=32 snd_nxt=0xa8f9ef38 snd_una=0xa8f9ef23 snd_cwnd=10 ssthresh=2147483647 snd_wnd=43690 srtt=31 rcv_wnd=43776 4.525 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=1240 snd_nxt=0xb67ce90c snd_una=0xb67ce90c snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=43776 7.242 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=80 snd_nxt=0xb67ced44 snd_una=0xb67ce90c snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=174720 The authenticity of host 'localhost (::1)' can't be established. ECDSA key fingerprint is SHA256:TKZS58923458203490asekfjaklskljmkjfgPMBfHzY. ECDSA key fingerprint is MD5:d8:29:54:40:71:fa:b8:44:89:52:64:8a:35:42:d0:e8. Are you sure you want to continue connecting (yes/no)? ^C # To get the previous behaviour just use --syscalls and get all syscalls formatted strace like + the specified extra events: # trace -e sched:switch --syscalls sleep 1 <SNIP> 0.160 ( 0.003 ms): sleep/12877 mprotect(start: 0x7fdfe2361000, len: 4096, prot: READ) = 0 0.164 ( 0.009 ms): sleep/12877 munmap(addr: 0x7fdfe2345000, len: 113155) = 0 0.211 ( 0.001 ms): sleep/12877 brk() = 0x55d3ce68e000 0.212 ( 0.002 ms): sleep/12877 brk(brk: 0x55d3ce6af000) = 0x55d3ce6af000 0.215 ( 0.001 ms): sleep/12877 brk() = 0x55d3ce6af000 0.219 ( 0.004 ms): sleep/12877 open(filename: 0xe1f07c00, flags: CLOEXEC) = 3 0.225 ( 0.001 ms): sleep/12877 fstat(fd: 3, statbuf: 0x7fdfe2138aa0) = 0 0.227 ( 0.003 ms): sleep/12877 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7fdfdb1b8000 0.234 ( 0.001 ms): sleep/12877 close(fd: 3) = 0 0.257 ( ): sleep/12877 nanosleep(rqtp: 0x7fffb36b6020) ... 0.260 ( ): sched:sched_switch:prev_comm=sleep prev_pid=12877 prev_prio=120 prev_state=D ==> next_comm=swapper/3 next_pid=0 next_prio=120 0.257 (1000.134 ms): sleep/12877 ... [continued]: nanosleep()) = 0 1000.428 ( 0.006 ms): sleep/12877 close(fd: 1) = 0 1000.440 ( 0.004 ms): sleep/12877 close(fd: 2) = 0 1000.461 ( ): sleep/12877 exit_group() # When specifiying just some syscalls, the behaviour doesn't change, i.e.: # trace -e nanosleep -e sched:switch sleep 1 0.000 ( ): sleep/14974 nanosleep(rqtp: 0x7ffc344ba9c0 ) ... 0.007 ( ): sched:sched_switch:prev_comm=sleep prev_pid=14974 prev_prio=120 prev_state=D ==> next_comm=swapper/2 next_pid=0 next_prio=120 0.000 (1000.139 ms): sleep/14974 ... [continued]: nanosleep()) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-om2fulll97ytnxv40ler8jkf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-08-01 16:20:28 -03:00
Arnaldo Carvalho de Melo	162d3edbe5	perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args For instance: $ trace -e socket* ssh sandy 0.000 ( 0.031 ms): ssh/19919 socket(family: LOCAL, type: STREAM\|CLOEXEC\|NONBLOCK ) = 3 0.052 ( 0.015 ms): ssh/19919 socket(family: LOCAL, type: STREAM\|CLOEXEC\|NONBLOCK ) = 3 1.568 ( 0.020 ms): ssh/19919 socket(family: LOCAL, type: STREAM\|CLOEXEC\|NONBLOCK ) = 3 1.603 ( 0.012 ms): ssh/19919 socket(family: LOCAL, type: STREAM\|CLOEXEC\|NONBLOCK ) = 3 1.699 ( 0.014 ms): ssh/19919 socket(family: LOCAL, type: STREAM\|CLOEXEC\|NONBLOCK ) = 3 1.724 ( 0.012 ms): ssh/19919 socket(family: LOCAL, type: STREAM\|CLOEXEC\|NONBLOCK ) = 3 1.804 ( 0.020 ms): ssh/19919 socket(family: INET, type: STREAM, protocol: TCP ) = 3 17.549 ( 0.098 ms): ssh/19919 socket(family: LOCAL, type: STREAM ) = 4 acme@sandy's password: Just like with other syscall args, the common bits are supressed so that the output is more compact, i.e. we use "TCP" instead of "IPPROTO_TCP", but we can make this show the original constant names if we like it by using some command line knob or ~/.perfconfig "[trace]" section variable. Also needed is to make perf's event parser accept things like: $ perf trace -e socket*/protocol=TCP/ By using both the tracefs event 'format' files and these tables built from the kernel sources. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-l39jz1vnyda0b6jsufuc8bz7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-07-31 10:52:49 -03:00
Arnaldo Carvalho de Melo	bc972ada4f	perf trace beauty: Do not print NULL strarray entries We may have string tables where not all slots have values, in those cases its better to print the numeric value, for instance: In the table below we would show "protocol: (null)" for socket_ipproto[3] Where it would be better to show "protocol: 3". $ tools/perf/trace/beauty/socket_ipproto.sh static const char *socket_ipproto[] = { [0] = "IP", [103] = "PIM", [108] = "COMP", [12] = "PUP", [132] = "SCTP", [136] = "UDPLITE", [137] = "MPLS", [17] = "UDP", [1] = "ICMP", [22] = "IDP", [255] = "RAW", [29] = "TP", [2] = "IGMP", [33] = "DCCP", [41] = "IPV6", [46] = "RSVP", [47] = "GRE", [4] = "IPIP", [50] = "ESP", [51] = "AH", [6] = "TCP", [8] = "EGP", [92] = "MTP", [94] = "BEETPH", [98] = "ENCAP", }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-7djfak94eb3b9ltr79cpn3ti@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-07-31 10:52:46 -03:00
Arnaldo Carvalho de Melo	27de9b2bd9	perf evsel: Add has_callchain() helper to make code more compact/clear Its common to have the (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN), so add an evsel__has_callchain(evsel) helper. This will actually get more uses as we check that instead of symbol_conf.use_callchain in places where that produces the same result but makes this decision to be more fine grained, per evsel. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-145340oytbthatpfeaq1do18@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-06-05 10:09:54 -03:00
Arnaldo Carvalho de Melo	029c75e5cf	perf tools: No need to unconditionally read the max_stack sysctls Let tools that need to have those variables with the sysctl current values use a function that will read them. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-1ljj3oeo5kpt2n1icfd9vowe@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-05-17 16:31:32 -03:00
Arnaldo Carvalho de Melo	117d3c2474	perf thread: Ditch __thread__find_symbol() Simulate having all symbols in just one tree by searching the still existing two trees. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-uss70e8tvzzbzs326330t83q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-26 13:47:17 -03:00
Arnaldo Carvalho de Melo	4546263d72	perf thread: Introduce thread__find_symbol() Out of thread__find_addr_location(..., MAP__FUNCTION, ...), idea here is to continue removing references to MAP__{FUNCTION,VARIABLE} ahead of getting both types of symbols in the same rbtree, as various places do two lookups, looking first at MAP__FUNCTION, then at MAP__VARIABLE. So thread__find_symbol() will eventually do just that, and 'struct symbol' will have the symbol type, for code that cares about that. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-n7528en9e08yd3flzmb26tth@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-26 13:47:07 -03:00
Changbin Du	51125a29a3	perf trace: Remove redundant ')' There is a redundant ')' at the tail of each event. So remove it. $ sudo perf trace --no-syscalls -e 'kmem:*' -a 899.342 kmem:kfree:(vfs_writev+0xb9) call_site=ffffffff9c453979 ptr=(nil)) 899.344 kmem:kfree:(___sys_recvmsg+0x188) call_site=ffffffff9c9b8b88 ptr=(nil)) Signed-off-by: Changbin Du <changbin.du@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1520937601-24952-1-git-send-email-changbin.du@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-03 16:16:41 -03:00
Arnaldo Carvalho de Melo	0a6545bda2	perf trace: Show only failing syscalls For instance: # perf probe "vfs_getname=getname_flags:72 pathname=result->name:string" Added new event: probe:vfs_getname (on getname_flags:72 with pathname=result->name:string) You can now use it in all perf tools, such as: perf record -e probe:vfs_getname -aR sleep 1 # perf trace --failure sleep 1 0.043 ( 0.010 ms): sleep/10978 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory For reference, here are all the syscalls in this case: # perf trace sleep 1 ? ( ): sleep/10976 ... [continued]: execve()) = 0 0.027 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d04000 0.044 ( 0.010 ms): sleep/10976 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory 0.057 ( 0.006 ms): sleep/10976 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3 0.064 ( 0.002 ms): sleep/10976 fstat(fd: 3, statbuf: 0x7fffac22b370) = 0 0.067 ( 0.003 ms): sleep/10976 mmap(len: 111457, prot: READ, flags: PRIVATE, fd: 3) = 0x7feec8615000 0.071 ( 0.001 ms): sleep/10976 close(fd: 3) = 0 0.080 ( 0.007 ms): sleep/10976 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3 0.088 ( 0.002 ms): sleep/10976 read(fd: 3, buf: 0x7fffac22b538, count: 832) = 832 0.092 ( 0.001 ms): sleep/10976 fstat(fd: 3, statbuf: 0x7fffac22b3d0) = 0 0.094 ( 0.002 ms): sleep/10976 mmap(len: 8192, prot: READ\|WRITE, flags: PRIVATE\|ANONYMOUS) = 0x7feec8613000 0.099 ( 0.004 ms): sleep/10976 mmap(len: 3889792, prot: EXEC\|READ, flags: PRIVATE\|DENYWRITE, fd: 3) = 0x7feec8057000 0.104 ( 0.007 ms): sleep/10976 mprotect(start: 0x7feec8203000, len: 2097152) = 0 0.112 ( 0.005 ms): sleep/10976 mmap(addr: 0x7feec8403000, len: 24576, prot: READ\|WRITE, flags: PRIVATE\|DENYWRITE\|FIXED, fd: 3, off: 1753088) = 0x7feec8403000 0.120 ( 0.003 ms): sleep/10976 mmap(addr: 0x7feec8409000, len: 14976, prot: READ\|WRITE, flags: PRIVATE\|ANONYMOUS\|FIXED) = 0x7feec8409000 0.128 ( 0.001 ms): sleep/10976 close(fd: 3) = 0 0.139 ( 0.001 ms): sleep/10976 arch_prctl(option: 4098, arg2: 140663540761856) = 0 0.186 ( 0.004 ms): sleep/10976 mprotect(start: 0x7feec8403000, len: 16384, prot: READ) = 0 0.204 ( 0.003 ms): sleep/10976 mprotect(start: 0x55bdc0ec3000, len: 4096, prot: READ) = 0 0.209 ( 0.004 ms): sleep/10976 mprotect(start: 0x7feec8631000, len: 4096, prot: READ) = 0 0.214 ( 0.010 ms): sleep/10976 munmap(addr: 0x7feec8615000, len: 111457) = 0 0.269 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d04000 0.271 ( 0.002 ms): sleep/10976 brk(brk: 0x55bdc2d25000) = 0x55bdc2d25000 0.274 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d25000 0.278 ( 0.007 ms): sleep/10976 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3 0.288 ( 0.001 ms): sleep/10976 fstat(fd: 3</usr/lib/locale/locale-archive>, statbuf: 0x7feec8408aa0) = 0 0.290 ( 0.003 ms): sleep/10976 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7feec1488000 0.297 ( 0.001 ms): sleep/10976 close(fd: 3</usr/lib/locale/locale-archive>) = 0 0.325 (1000.193 ms): sleep/10976 nanosleep(rqtp: 0x7fffac22c0b0) = 0 1000.560 ( 0.006 ms): sleep/10976 close(fd: 1) = 0 1000.573 ( 0.005 ms): sleep/10976 close(fd: 2) = 0 1000.596 ( ): sleep/10976 exit_group() # And can be done systemwide, etc, with backtraces: # perf trace --max-stack=16 --failure sleep 1 0.048 ( 0.015 ms): sleep/11092 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory __access (inlined) dl_main (/usr/lib64/ld-2.26.so) # Or for some specific syscalls: # perf trace --max-stack=16 -e openat --failure cat /tmp/rien cat: /tmp/rien: No such file or directory 0.251 ( 0.012 ms): cat/11106 openat(dfd: CWD, filename: /tmp/rien) = -1 ENOENT No such file or directory __libc_open64 (inlined) main (/usr/bin/cat) __libc_start_main (/usr/lib64/libc-2.26.so) _start (/usr/bin/cat) # Look for inotify* syscalls that fail, system wide, for 2 seconds, with backtraces: # perf trace -a --max-stack=16 --failure -e inotify* sleep 2 819.165 ( 0.058 ms): gmain/1724 inotify_add_watch(fd: 8<anon_inode:inotify>, pathname: /home/acme/~, mask: 16789454) = -1 ENOENT No such file or directory __GI_inotify_add_watch (inlined) _ik_watch (/usr/lib64/libgio-2.0.so.0.5400.3) _ip_start_watching (/usr/lib64/libgio-2.0.so.0.5400.3) im_scan_missing (/usr/lib64/libgio-2.0.so.0.5400.3) g_timeout_dispatch (/usr/lib64/libglib-2.0.so.0.5400.3) g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0.5400.3) g_main_context_iterate.isra.23 (/usr/lib64/libglib-2.0.so.0.5400.3) g_main_context_iteration (/usr/lib64/libglib-2.0.so.0.5400.3) glib_worker_main (/usr/lib64/libglib-2.0.so.0.5400.3) g_thread_proxy (/usr/lib64/libglib-2.0.so.0.5400.3) start_thread (/usr/lib64/libpthread-2.26.so) __GI___clone (inlined) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-8f7d3mngaxvi7tlzloz3n7cs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-02 07:57:37 -03:00
Kan Liang	b9bae2c841	perf mmap: Simplify perf_mmap__read_init() It isn't necessary to pass the 'start', 'end' and 'overwrite' arguments to perf_mmap__read_init(). The data is stored in the struct perf_mmap. Discard the parameters. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1520350567-80082-8-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-03-08 10:05:53 -03:00
Kan Liang	0019dc87b9	perf mmap: Simplify perf_mmap__read_event() It isn't necessary to pass the 'overwrite', 'start' and 'end' argument to perf_mmap__read_event(). Discard them. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1520350567-80082-7-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-03-08 10:05:53 -03:00
Kan Liang	d6ace3df43	perf mmap: Simplify perf_mmap__consume() It isn't necessary to pass the 'overwrite' argument to perf_mmap__consume(). Discard it. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1520350567-80082-6-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-03-08 10:05:52 -03:00
Arnaldo Carvalho de Melo	9ea42ba441	perf trace: Support setting cgroups as targets One can set a cgroup as a default cgroup to be used by all events or set cgroups with the 'perf stat' and 'perf record' behaviour, i.e. '-G A' will be the cgroup for events defined so far in the command line. Here in my main machine, with a kvm instance running a rhel6 guinea pig I have: # ls -la /sys/fs/cgroup/perf_event/ \| grep drw drwxr-xr-x. 14 root root 360 Mar 6 12:04 .. drwxr-xr-x. 3 root root 0 Mar 6 15:05 machine.slice # So I can go ahead and use that cgroup hierarchy, say lets see what syscalls are being emitted by threads in that 'machine.slice' hierarchy that are taking more than 100ms: # perf trace --duration 100 -G machine.slice 0.188 (249.850 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0 250.274 (249.743 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0 500.224 (249.755 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0 750.097 (249.934 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0 1000.244 (249.780 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0 1250.197 (249.796 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0 1500.124 (249.859 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0 1750.076 (172.900 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0 902.570 (1021.116 ms): qemu-system-x8/23667 ppoll(ufds: 0x558151e03180, nfds: 74, tsp: 0x7ffc00cd0900, sigsetsize: 8) = 1 1923.825 (305.133 ms): qemu-system-x8/23667 ppoll(ufds: 0x558151e03180, nfds: 74, tsp: 0x7ffc00cd0900, sigsetsize: 8) = 1 2000.172 (229.002 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0 ^C # If we look inside that cgroup hierarchy we get: # ls -la /sys/fs/cgroup/perf_event/machine.slice/ \| grep drw drwxr-xr-x. 3 root root 0 Mar 6 15:05 . drwxr-xr-x. 2 root root 0 Mar 6 16:16 machine-qemu\x2d2\x2drhel6.sandy.scope # There is just one, but lets say there were more and we would want to see 5 seconds worth of syscall summary for the threads in that cgroup: # perf trace --summary -G machine.slice/machine-qemu\\x2d2\\x2drhel6.sandy.scope/ -a sleep 5 Summary of events: qemu-system-x86 (23667), 143858 events, 24.2% syscall calls total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- --------- --------- --------- --------- ------ ppoll 28492 4348.631 0.000 0.153 11.616 1.05% futex 19661 140.801 0.001 0.007 2.993 3.20% read 18440 68.084 0.001 0.004 1.653 4.33% ioctl 5387 24.768 0.002 0.005 0.134 1.62% CPU 0/KVM (23744), 449455 events, 75.8% syscall calls total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- --------- --------- --------- --------- ------ ioctl 148364 3401.812 0.000 0.023 11.801 1.15% futex 36131 404.127 0.001 0.011 7.377 2.63% writev 29452 339.688 0.003 0.012 1.740 1.36% write 11315 45.992 0.001 0.004 0.105 1.10% # See the documentation about how to set more than one cgroup for different events in the same command line. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-t126jh4occqvu0xdqlcjygex@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-03-07 10:22:26 -03:00
Kan Liang	d7f55c62e6	perf trace: Switch to new perf_mmap__read_event() interface The 'perf trace' utility still use the legacy interface. Switch to the new perf_mmap__read_event() interface. No functional change. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1519945751-37786-2-git-send-email-kan.liang@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-03-05 10:41:59 -03:00
Ravi Bangoria	3a9e9a4709	perf trace: Fix call-graph output Recently, Arnaldo fixed global vs event specific --max-stack usage with commit `bd3dda9ab0` ("perf trace: Allow overriding global --max-stack per event"). This commit is having a regression when we don't use --max-stack at all with perf trace. Ex, $ ./perf trace record -g ls $ ./perf trace -i perf.data 0.076 ( 0.002 ms): ls/9109 brk( 0.196 ( 0.008 ms): ls/9109 access(filename: 0x9f998b70, mode: R 0.209 ( 0.031 ms): ls/9109 open(filename: 0x9f998978, flags: CLOEXEC This is missing call-traces. After patch: $ ./perf trace -i perf.data 0.076 ( 0.002 ms): ls/9109 brk( do_syscall_trace_leave ([kernel.kallsyms]) [0] ([unknown]) syscall_exit_work ([kernel.kallsyms]) brk (/usr/lib64/ld-2.17.so) _dl_sysdep_start (/usr/lib64/ld-2.17.so) _dl_start_final (/usr/lib64/ld-2.17.so) _dl_start (/usr/lib64/ld-2.17.so) _start (/usr/lib64/ld-2.17.so) 0.196 ( 0.008 ms): ls/9109 access(filename: 0x9f998b70, mode: R do_syscall_trace_leave ([kernel.kallsyms]) [0] ([unknown]) Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: `bd3dda9ab0` ("perf trace: Allow overriding global --max-stack per event") Link: http://lkml.kernel.org/r/20180130053053.13214-3-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-02-05 13:53:45 -03:00
Arnaldo Carvalho de Melo	c19d0847b2	perf trace beauty flock: Move to separate object file To resolve some header conflicts that were preventing the build to succeed in the Alpine Linux distribution. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-bvud0dvzvip3kibeplupdbmc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-01-25 06:37:31 -03:00
Arnaldo Carvalho de Melo	bafae98e7a	perf evlist: Remove fcntl.h from evlist.h Not needed there, fixup the places where it is needed and was getting only by luck via evlist.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-yxjpetn64z8vjuguu84gr6x6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-01-25 06:37:30 -03:00
Arnaldo Carvalho de Melo	3258abe099	perf trace beauty futex: Beautify FUTEX_BITSET_MATCH_ANY E.g.: # strace -e futex -p 14437 strace: Process 14437 attached futex(0x7f46f4808d70, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7f46f24e68b0, FUTEX_WAIT_BITSET_PRIVATE\|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1516636744, tv_nsec=221969000}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out) <detached ...> # Should pretty print that 0xffffffff value, like: # trace -e futex --tid 14437 0.028 ( 0.005 ms): futex(uaddr: 0x7f46f4808d70, op: WAKE\|PRIV, val: 1 ) = 0 0.037 (1000.092 ms): futex(uaddr: 0x7f46f24e68b0, op: WAIT_BITSET\|PRIV\|CLKRT, utime: 0x7f46f23fedf0, val3: MATCH_ANY) = -1 ETIMEDOUT Connection timed out ^C# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-raef6e352la90600yksthao1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-01-25 06:37:30 -03:00
Arnaldo Carvalho de Melo	522283fec7	perf trace: Do not print from time delta for interrupted syscall lines We were calculating the delta from a in-flight syscall that got its output interrupted by another syscall, which doesn't seem like useful information, we will print the syscall duration (sys_exit - sys_enter) when the raw_syscalls:sys_exit event happens. The problem here is how we're consuming the multiple ring buffers, without using the ordered_events code used by perf_session, which may cause some reordering of syscalls for diferent CPUs, so just stop printing that delta, to avoid things like: # trace --print-sample -p 9626 -e futex raw_syscalls:sys_enter 411967179.269 Timer 9609/9626 [2] raw_syscalls:sys_enter 411967179.213 file:// Content 9609/9609 [3] 328.038 (18446744073709.496 ms): Timer/9626 futex(uaddr: 0x7fc0d4027044, op: WAIT\|PRIV, utime: 0x7fc0b0ffdb50 ) ... raw_syscalls:sys_exit 411967179.225 file:// Content 9609/9609 [3] 327.982 ( 0.012 ms): file:// Conten/9609 futex(uaddr: 0x7fc0d4027040, op: WAKE\|PRIV, val: 1 ) = 1 This is a bandaid, we should better try and use the ordered_events code, possibly with some refactoring prep work, but for now at least we don't show those false long deltas for the lines ending in '...'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-q6xgsqrju1sr6ltud9kjjhmb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-01-25 06:37:29 -03:00
Arnaldo Carvalho de Melo	591421e151	perf trace: Add --print-sample To help with debugging, like the interrupted out of order issue that will be dealt with in the next patch in this series, changing the code to deal with: raw_syscalls:sys_enter 411967179.269 Timer 9609/9626 [2] raw_syscalls:sys_enter 411967179.213 file:// Content 9609/9609 [3] 328.038 (18446744073709.496 ms): Timer/9626 futex(uaddr: 0x7fc0d4027044, op: WAIT\|PRIV, utime: 0x7fc0b0ffdb50 ) ... raw_syscalls:sys_exit 411967179.225 file:// Content 9609/9609 [3] 327.982 ( 0.012 ms): file:// Conten/9609 futex(uaddr: 0x7fc0d4027040, op: WAKE\|PRIV, val: 1 ) = 1 That long duration is the bug. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-fljqiibjn7wet24jd1ed7abc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-01-25 06:37:29 -03:00

1 2 3 4 5 ...

484 Commits (579e5ff629b17744f62473f314768de8b6f1e66c)