Commit graph

3709 commits

Author SHA1 Message Date
Linus Torvalds 2074006dac The new features of this release:
- Added TRACE_DEFINE_SIZEOF() which allows trace events that use
     sizeof() it the TP_printk() to be converted to the actual size such
     that trace-cmd and perf can parse them correctly.
 
   - Some rework of the TRACE_DEFINE_ENUM() such that the above
     TRACE_DEFINE_SIZEOF() could reuse the same code.
 
   - Recording of tgid (Thread Group ID). This is similar to how
     task COMMs are recorded (cached at sched_switch), where it is
     in a table and used on output of the trace and trace_pipe files.
 
   - Have ":mod:<module>" be cached when written into set_ftrace_filter.
     Then the functions of the module will be traced at module load.
 
   - Some random clean ups and small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZXjYuFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 fsgIAKUvhpn2igoYCR9tWqu+DovEmwxCIumbCzmCFQcRKlLttRte94yY5+W9hnV0
 JPzd9T9zBDVqq1fI7iIop1SuTwEfKW6lJom0usZ8AFpK+YKm6FHnQ28POlvHzre2
 lzO41tpRWiehLQsITZ47eByhsvEfhx86mYT/oM1JSR6Pii1OpjyNYmDMw6BaMNBT
 kSCQFgIhzAhVuHjwAnB/S++E/ou7M5bCwCb5CNh7MubKubV5upHpoJcgYGO+WWa6
 56H/iEhff4EECTGJVefd8e78MtJPL8EsuM0nAcMPlnl8AaiOpP7XCdlgTwdefLvP
 b3o+nP15voSHkARGXC6eM6gH0po=
 =rvGB
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The new features of this release:

   - Added TRACE_DEFINE_SIZEOF() which allows trace events that use
     sizeof() it the TP_printk() to be converted to the actual size such
     that trace-cmd and perf can parse them correctly.

   - Some rework of the TRACE_DEFINE_ENUM() such that the above
     TRACE_DEFINE_SIZEOF() could reuse the same code.

   - Recording of tgid (Thread Group ID). This is similar to how task
     COMMs are recorded (cached at sched_switch), where it is in a table
     and used on output of the trace and trace_pipe files.

   - Have ":mod:<module>" be cached when written into set_ftrace_filter.
     Then the functions of the module will be traced at module load.

   - Some random clean ups and small fixes"

* tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (26 commits)
  ftrace: Test for NULL iter->tr in regex for stack_trace_filter changes
  ftrace: Decrement count for dyn_ftrace_total_info for init functions
  ftrace: Unlock hash mutex on failed allocation in process_mod_list()
  tracing: Add support for display of tgid in trace output
  tracing: Add support for recording tgid of tasks
  ftrace: Decrement count for dyn_ftrace_total_info file
  ftrace: Remove unused function ftrace_arch_read_dyn_info()
  sh/ftrace: Remove only user of ftrace_arch_read_dyn_info()
  ftrace: Have cached module filters be an active filter
  ftrace: Implement cached modules tracing on module load
  ftrace: Have the cached module list show in set_ftrace_filter
  ftrace: Add :mod: caching infrastructure to trace_array
  tracing: Show address when function names are not found
  ftrace: Add missing comment for FTRACE_OPS_FL_RCU
  tracing: Rename update the enum_map file
  tracing: Add TRACE_DEFINE_SIZEOF() macros
  tracing: define TRACE_DEFINE_SIZEOF() macro to map sizeof's to their values
  tracing: Rename enum_replace to eval_replace
  trace: rename enum_map functions
  trace: rename trace.c enum functions
  ...
2017-07-06 19:45:45 -07:00
Michael Sartain 99c621d704 tracing: Add saved_tgids file to show cached pid to tgid mappings
Export the cached pid / tgid mappings in debugfs tracing saved_tgids file.
This allows user apps to translate the pids from a trace to their respective
thread group.

Example saved_tgids file with pid / tgid values separated by ' ':

  # cat saved_tgids
  1048 1048
  1047 1047
  7 7
  1049 1047
  1054 1047
  1053 1047

Link: http://lkml.kernel.org/r/20170630004023.064965233@goodmis.org
Link: http://lkml.kernel.org/r/20170706040713.unwkumbta5menygi@mikesart-cos

Reviewed-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Michael Sartain <mikesart@fastmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-06 10:11:53 -04:00
Linus Torvalds 5518b69b76 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Reasonably busy this cycle, but perhaps not as busy as in the 4.12
  merge window:

   1) Several optimizations for UDP processing under high load from
      Paolo Abeni.

   2) Support pacing internally in TCP when using the sch_fq packet
      scheduler for this is not practical. From Eric Dumazet.

   3) Support mutliple filter chains per qdisc, from Jiri Pirko.

   4) Move to 1ms TCP timestamp clock, from Eric Dumazet.

   5) Add batch dequeueing to vhost_net, from Jason Wang.

   6) Flesh out more completely SCTP checksum offload support, from
      Davide Caratti.

   7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
      Neira Ayuso, and Matthias Schiffer.

   8) Add devlink support to nfp driver, from Simon Horman.

   9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
      Prabhu.

  10) Add stack depth tracking to BPF verifier and use this information
      in the various eBPF JITs. From Alexei Starovoitov.

  11) Support XDP on qed device VFs, from Yuval Mintz.

  12) Introduce BPF PROG ID for better introspection of installed BPF
      programs. From Martin KaFai Lau.

  13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.

  14) For loads, allow narrower accesses in bpf verifier checking, from
      Yonghong Song.

  15) Support MIPS in the BPF selftests and samples infrastructure, the
      MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
      Daney.

  16) Support kernel based TLS, from Dave Watson and others.

  17) Remove completely DST garbage collection, from Wei Wang.

  18) Allow installing TCP MD5 rules using prefixes, from Ivan
      Delalande.

  19) Add XDP support to Intel i40e driver, from Björn Töpel

  20) Add support for TC flower offload in nfp driver, from Simon
      Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
      Kicinski, and Bert van Leeuwen.

  21) IPSEC offloading support in mlx5, from Ilan Tayari.

  22) Add HW PTP support to macb driver, from Rafal Ozieblo.

  23) Networking refcount_t conversions, From Elena Reshetova.

  24) Add sock_ops support to BPF, from Lawrence Brako. This is useful
      for tuning the TCP sockopt settings of a group of applications,
      currently via CGROUPs"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
  net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
  dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
  cxgb4: Support for get_ts_info ethtool method
  cxgb4: Add PTP Hardware Clock (PHC) support
  cxgb4: time stamping interface for PTP
  nfp: default to chained metadata prepend format
  nfp: remove legacy MAC address lookup
  nfp: improve order of interfaces in breakout mode
  net: macb: remove extraneous return when MACB_EXT_DESC is defined
  bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
  bpf: fix return in load_bpf_file
  mpls: fix rtm policy in mpls_getroute
  net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
  net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
  net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
  net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
  ...
2017-07-05 12:31:59 -07:00
Steven Rostedt (VMware) 69d71879d2 ftrace: Test for NULL iter->tr in regex for stack_trace_filter changes
As writing into stack_trace_filter, the iter-tr is not set and is NULL.
Check if it is NULL before dereferencing it in ftrace_regex_release().

Fixes: 8c08f0d5c6 ("ftrace: Have cached module filters be an active filter")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-05 09:52:18 -04:00
Steven Rostedt (VMware) 4dce17b26b Merge commit '0f17976568b3f72e676450af0c0db6f8752253d6' into trace/ftrace/core
Need to get the changes from 0f17976568 ("ftrace: Fix regression with
module command in stack_trace_filter") as it is required to fix some other
changes with stack_trace_filter and the new development code.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-05 09:51:24 -04:00
Linus Torvalds c6b1e36c8f Merge branch 'for-4.13/block' of git://git.kernel.dk/linux-block
Pull core block/IO updates from Jens Axboe:
 "This is the main pull request for the block layer for 4.13. Not a huge
  round in terms of features, but there's a lot of churn related to some
  core cleanups.

  Note this depends on the UUID tree pull request, that Christoph
  already sent out.

  This pull request contains:

   - A series from Christoph, unifying the error/stats codes in the
     block layer. We now use blk_status_t everywhere, instead of using
     different schemes for different places.

   - Also from Christoph, some cleanups around request allocation and IO
     scheduler interactions in blk-mq.

   - And yet another series from Christoph, cleaning up how we handle
     and do bounce buffering in the block layer.

   - A blk-mq debugfs series from Bart, further improving on the support
     we have for exporting internal information to aid debugging IO
     hangs or stalls.

   - Also from Bart, a series that cleans up the request initialization
     differences across types of devices.

   - A series from Goldwyn Rodrigues, allowing the block layer to return
     failure if we will block and the user asked for non-blocking.

   - Patch from Hannes for supporting setting loop devices block size to
     that of the underlying device.

   - Two series of patches from Javier, fixing various issues with
     lightnvm, particular around pblk.

   - A series from me, adding support for write hints. This comes with
     NVMe support as well, so applications can help guide data placement
     on flash to improve performance, latencies, and write
     amplification.

   - A series from Ming, improving and hardening blk-mq support for
     stopping/starting and quiescing hardware queues.

   - Two pull requests for NVMe updates. Nothing major on the feature
     side, but lots of cleanups and bug fixes. From the usual crew.

   - A series from Neil Brown, greatly improving the bio rescue set
     support. Most notably, this kills the bio rescue work queues, if we
     don't really need them.

   - Lots of other little bug fixes that are all over the place"

* 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits)
  lightnvm: pblk: set line bitmap check under debug
  lightnvm: pblk: verify that cache read is still valid
  lightnvm: pblk: add initialization check
  lightnvm: pblk: remove target using async. I/Os
  lightnvm: pblk: use vmalloc for GC data buffer
  lightnvm: pblk: use right metadata buffer for recovery
  lightnvm: pblk: schedule if data is not ready
  lightnvm: pblk: remove unused return variable
  lightnvm: pblk: fix double-free on pblk init
  lightnvm: pblk: fix bad le64 assignations
  nvme: Makefile: remove dead build rule
  blk-mq: map all HWQ also in hyperthreaded system
  nvmet-rdma: register ib_client to not deadlock in device removal
  nvme_fc: fix error recovery on link down.
  nvmet_fc: fix crashes on bad opcodes
  nvme_fc: Fix crash when nvme controller connection fails.
  nvme_fc: replace ioabort msleep loop with completion
  nvme_fc: fix double calls to nvme_cleanup_cmd()
  nvme-fabrics: verify that a controller returns the correct NQN
  nvme: simplify nvme_dev_attrs_are_visible
  ...
2017-07-03 10:34:51 -07:00
John Fastabend 7bda4b40c5 bpf: extend bpf_trace_printk to support %i
Currently, bpf_trace_printk does not support common formatting
symbol '%i' however vsprintf does and is what eventually gets
called by bpf helper. If users are used to '%i' and currently
make use of it, then bpf_trace_printk will just return with
error without dumping anything to the trace pipe, so just add
support for '%i' to the helper.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Daniel Borkmann f96da09473 bpf: simplify narrower ctx access
This work tries to make the semantics and code around the
narrower ctx access a bit easier to follow. Right now
everything is done inside the .is_valid_access(). Offset
matching is done differently for read/write types, meaning
writes don't support narrower access and thus matching only
on offsetof(struct foo, bar) is enough whereas for read
case that supports narrower access we must check for
offsetof(struct foo, bar) + offsetof(struct foo, bar) +
sizeof(<bar>) - 1 for each of the cases. For read cases of
individual members that don't support narrower access (like
packet pointers or skb->cb[] case which has its own narrow
access logic), we check as usual only offsetof(struct foo,
bar) like in write case. Then, for the case where narrower
access is allowed, we also need to set the aux info for the
access. Meaning, ctx_field_size and converted_op_size have
to be set. First is the original field size e.g. sizeof(<bar>)
as in above example from the user facing ctx, and latter
one is the target size after actual rewrite happened, thus
for the kernel facing ctx. Also here we need the range match
and we need to keep track changing convert_ctx_access() and
converted_op_size from is_valid_access() as both are not at
the same location.

We can simplify the code a bit: check_ctx_access() becomes
simpler in that we only store ctx_field_size as a meta data
and later in convert_ctx_accesses() we fetch the target_size
right from the location where we do convert. Should the verifier
be misconfigured we do reject for BPF_WRITE cases or target_size
that are not provided. For the subsystems, we always work on
ranges in is_valid_access() and add small helpers for ranges
and narrow access, convert_ctx_accesses() sets target_size
for the relevant instruction.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Sabrina Dubroca 9e52b32567 tracing/kprobes: Allow to create probe with a module name starting with a digit
Always try to parse an address, since kstrtoul() will safely fail when
given a symbol as input. If that fails (which will be the case for a
symbol), try to parse a symbol instead.

This allows creating a probe such as:

    p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0

Which is necessary for this command to work:

    perf probe -m 8021q -a vlan_gro_receive

Link: http://lkml.kernel.org/r/fd72d666f45b114e2c5b9cf7e27b91de1ec966f1.1498122881.git.sd@queasysnail.net

Cc: stable@vger.kernel.org
Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-29 23:13:23 -04:00
Steven Rostedt (VMware) 0f17976568 ftrace: Fix regression with module command in stack_trace_filter
When doing the following command:

 # echo ":mod:kvm_intel" > /sys/kernel/tracing/stack_trace_filter

it triggered a crash.

This happened with the clean up of probes. It required all callers to the
regex function (doing ftrace filtering) to have ops->private be a pointer to
a trace_array. But for the stack tracer, that is not the case.

Allow for the ops->private to be NULL, and change the function command
callbacks to handle the trace_array pointer being NULL as well.

Fixes: d2afd57a4b ("tracing/ftrace: Allow instances to have their own function probes")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-29 10:05:45 -04:00
Steven Rostedt (VMware) 4ec7846785 ftrace: Decrement count for dyn_ftrace_total_info for init functions
Init boot up functions may be traced, but they are also freed when the
kernel finishes booting. These are removed from the ftrace tables, and the
debug variable for dyn_ftrace_total_info needs to reflect that as well.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-28 11:57:03 -04:00
Steven Rostedt (VMware) 3b58a3c72f ftrace: Unlock hash mutex on failed allocation in process_mod_list()
If the new_hash fails to allocate, then unlock the hash mutex on error.

Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-28 09:09:38 -04:00
Joel Fernandes 441dae8f2f tracing: Add support for display of tgid in trace output
Earlier patches introduced ability to record the tgid using the 'record-tgid'
option. Here we read the tgid and output it if the option is enabled.

Link: http://lkml.kernel.org/r/20170626053844.5746-3-joelaf@google.com

Cc: kernel-team@android.com
Cc: Ingo Molnar <mingo@redhat.com>
Tested-by: Michael Sartain <mikesart@gmail.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-27 13:30:28 -04:00
Joel Fernandes d914ba37d7 tracing: Add support for recording tgid of tasks
Inorder to support recording of tgid, the following changes are made:

* Introduce a new API (tracing_record_taskinfo) to additionally record the tgid
  along with the task's comm at the same time. This has has the benefit of not
  setting trace_cmdline_save before all the information for a task is saved.
* Add a new API tracing_record_taskinfo_sched_switch to record task information
  for 2 tasks at a time (previous and next) and use it from sched_switch probe.
* Preserve the old API (tracing_record_cmdline) and create it as a wrapper
  around the new one so that existing callers aren't affected.
* Reuse the existing sched_switch and sched_wakeup probes to record tgid
  information and add a new option 'record-tgid' to enable recording of tgid

When record-tgid option isn't enabled to being with, we take care to make sure
that there's isn't memory or runtime overhead.

Link: http://lkml.kernel.org/r/20170627020155.5139-1-joelaf@google.com

Cc: kernel-team@android.com
Cc: Ingo Molnar <mingo@redhat.com>
Tested-by: Michael Sartain <mikesart@gmail.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-27 13:30:28 -04:00
Steven Rostedt (VMware) 83dd14933e ftrace: Decrement count for dyn_ftrace_total_info file
The dyn_ftrace_total_info file is used to show how many functions have been
converted into nops and can be used by ftrace. The problem is that it does
not get decremented when functions are removed (init boot code being freed,
and modules being freed). That means the number is very inaccurate everytime
functions are removed from the ftrace tables. Decrement it when functions
are removed.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-27 13:30:27 -04:00
Steven Rostedt (VMware) 6a9c981b1e ftrace: Remove unused function ftrace_arch_read_dyn_info()
ftrace_arch_read_dyn_info() was used so that archs could add its own debug
information into the dyn_ftrace_total_info in the tracefs file system. That
file is for debugging usage of dynamic ftrace. No arch uses that function
anymore, so just get rid of it.

This also allows for tracing_read_dyn_info() to be cleaned up a bit.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-27 13:30:22 -04:00
Steven Rostedt (VMware) 8c08f0d5c6 ftrace: Have cached module filters be an active filter
When a module filter is added to set_ftrace_filter, if the module is not
loaded, it is cached. This should be considered an active filter, and
function tracing should be filtered by this. That is, if a cached module
filter is the only filter set, then no function tracing should be happening,
as all the functions available will be filtered out.

This makes sense, as the reason to add a cached module filter, is to trace
the module when you load it. There shouldn't be any other tracing happening
until then.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-26 11:53:04 -04:00
Steven Rostedt (VMware) d7fbf8df7c ftrace: Implement cached modules tracing on module load
If a module is cached in the set_ftrace_filter, and that module is loaded,
then enable tracing on that module as if the cached module text was written
into set_ftrace_filter just as the module is loaded.

  # echo ":mod:kvm_intel" >
  # cat /sys/kernel/tracing/set_ftrace_filter
 #### all functions enabled ####
 :mod:kvm_intel
  # modprobe kvm_intel
  # cat /sys/kernel/tracing/set_ftrace_filter
 vmx_get_rflags [kvm_intel]
 vmx_get_pkru [kvm_intel]
 vmx_get_interrupt_shadow [kvm_intel]
 vmx_rdtscp_supported [kvm_intel]
 vmx_invpcid_supported [kvm_intel]
 [..]

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-26 11:53:03 -04:00
Steven Rostedt (VMware) 5985ea8bd5 ftrace: Have the cached module list show in set_ftrace_filter
When writing in a module filter into set_ftrace_filter for a module that is
not yet loaded, it it cached, and will be executed when the module is loaded
(although that is not implemented yet at this commit). Display the list of
cached modules to be traced.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-26 11:53:02 -04:00
Steven Rostedt (VMware) 673feb9d76 ftrace: Add :mod: caching infrastructure to trace_array
This is the start of the infrastructure work to allow for tracing module
functions before it is loaded.

Currently the following command:

  # echo :mod:some-mod > set_ftrace_filter

will enable tracing of all functions within the module "some-mod" if it is
loaded. What we want, is if the module is not loaded, that line will be
saved. When the module is loaded, then the "some-mod" will have that line
executed on it, so that the functions within it starts being traced.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-26 11:53:02 -04:00
Yonghong Song 239946314e bpf: possibly avoid extra masking for narrower load in verifier
Commit 31fd85816d ("bpf: permits narrower load from bpf program
context fields") permits narrower load for certain ctx fields.
The commit however will already generate a masking even if
the prog-specific ctx conversion produces the result with
narrower size.

For example, for __sk_buff->protocol, the ctx conversion
loads the data into register with 2-byte load.
A narrower 2-byte load should not generate masking.
For __sk_buff->vlan_present, the conversion function
set the result as either 0 or 1, essentially a byte.
The narrower 2-byte or 1-byte load should not generate masking.

To avoid unnecessary masking, prog-specific *_is_valid_access
now passes converted_op_size back to verifier, which indicates
the valid data width after perceived future conversion.
Based on this information, verifier is able to avoid
unnecessary marking.

Since we want more information back from prog-specific
*_is_valid_access checking, all of them are packed into
one data structure for more clarity.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23 14:04:11 -04:00
Steven Rostedt (VMware) feaf1283d1 tracing: Show address when function names are not found
Currently, when a function is not found in kallsyms, instead of simply
showing the function address, it shows nothing at all:

 # echo ':mod:kvm_intel' > /sys/kernel/tracing/set_ftrace_filter
 # echo function > /sys/kernel/tracing/set_ftrace_filter
 # qemu -enable-kvm /home/my-qemu-image
   <Ctrl-C>
 # rmmod kvm_intel
 # cat /sys/kernel/tracing/trace
 qemu-system-x86-2408  [001] d..2   135.013238:  <-kvm_arch_hardware_enable
 qemu-system-x86-2408  [001] ....   135.014574:  <-kvm_arch_vm_ioctl
 qemu-system-x86-2408  [001] ....   135.015420:  <-kvm_vm_ioctl_check_extension
 qemu-system-x86-2408  [001] ....   135.045411:  <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ....   135.045412:  <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ....   135.045412:  <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ....   135.045412:  <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ...1   135.045413:  <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ....   135.045413:  <-__do_cpuid_ent

When it should show:

 qemu-system-x86-2408  [001] d..2   135.013238: 0xffffffffa02a39f0 <-kvm_arch_hardware_enable
 qemu-system-x86-2408  [001] ....   135.014574: 0xffffffffa02a2ba0 <-kvm_arch_vm_ioctl
 qemu-system-x86-2408  [001] ....   135.015420: 0xffffffffa029e4e0 <-kvm_vm_ioctl_check_extension
 qemu-system-x86-2408  [001] ....   135.045411: 0xffffffffa02a1380 <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ....   135.045412: 0xffffffffa029e160 <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ....   135.045412: 0xffffffffa029e180 <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ....   135.045412: 0xffffffffa029e520 <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ...1   135.045413: 0xffffffffa02a13b0 <-__do_cpuid_ent
 qemu-system-x86-2408  [001] ....   135.045413: 0xffffffffa02a1380 <-__do_cpuid_ent

instead.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-22 17:10:06 -04:00
Yonghong Song 31fd85816d bpf: permits narrower load from bpf program context fields
Currently, verifier will reject a program if it contains an
narrower load from the bpf context structure. For example,
        __u8 h = __sk_buff->hash, or
        __u16 p = __sk_buff->protocol
        __u32 sample_period = bpf_perf_event_data->sample_period
which are narrower loads of 4-byte or 8-byte field.

This patch solves the issue by:
  . Introduce a new parameter ctx_field_size to carry the
    field size of narrower load from prog type
    specific *__is_valid_access validator back to verifier.
  . The non-zero ctx_field_size for a memory access indicates
    (1). underlying prog type specific convert_ctx_accesses
         supporting non-whole-field access
    (2). the current insn is a narrower or whole field access.
  . In verifier, for such loads where load memory size is
    less than ctx_field_size, verifier transforms it
    to a full field load followed by proper masking.
  . Currently, __sk_buff and bpf_perf_event_data->sample_period
    are supporting narrowing loads.
  . Narrower stores are still not allowed as typical ctx stores
    are just normal stores.

Because of this change, some tests in verifier will fail and
these tests are removed. As a bonus, rename some out of bound
__sk_buff->cb access to proper field name and remove two
redundant "skb cb oob" tests.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14 14:56:25 -04:00
Jeremy Linton 681bec0367 tracing: Rename update the enum_map file
The enum_map file is used to display a list of symbol
to name conversions. As its now used to resolve sizeof
lets update the name and description.

Link: http://lkml.kernel.org/r/20170531215653.3240-13-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 17:13:06 -04:00
Jeremy Linton 67ec0d8595 tracing: Rename enum_replace to eval_replace
The enum_replace stanza works as is for sizeof()
calls as well as enums. Rename it as well.

Link: http://lkml.kernel.org/r/20170531215653.3240-9-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 17:10:34 -04:00
Jeremy Linton f57a41434f trace: rename enum_map functions
Rename the core trace enum routines to use eval, to
reflect their use by more than just enum to value mapping.

Link: http://lkml.kernel.org/r/20170531215653.3240-8-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 17:10:23 -04:00
Jeremy Linton 5f60b351a7 trace: rename trace.c enum functions
Rename the init and trace_enum_jmp_to_tail() routines
to reflect their use by more than enumerated types.

Link: http://lkml.kernel.org/r/20170531215653.3240-7-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 17:10:12 -04:00
Jeremy Linton 1793ed939b trace: rename trace_enum_mutex to trace_eval_mutex
There is a lock protecting the trace_enum_map, rename
it to reflect the use by more than enums.

Link: http://lkml.kernel.org/r/20170531215653.3240-6-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 17:10:01 -04:00
Jeremy Linton 23bf8cb8dc trace: rename trace enum data structures in trace.c
The enum map entries can be exported to userspace
via a sys enum_map file. Rename those functions
and structures to reflect the fact that we are using
them for more than enums.

Link: http://lkml.kernel.org/r/20170531215653.3240-5-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 17:09:50 -04:00
Jeremy Linton 99be647c58 trace: rename struct module entry for trace enums
Each module has a list of enum's its contributing to the
enum map, rename that entry to reflect its use by more than
enums.

Link: http://lkml.kernel.org/r/20170531215653.3240-4-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 17:09:31 -04:00
Jeremy Linton 00f4b652b6 trace: rename trace_enum_map to trace_eval_map
Each enum is loaded into the trace_enum_map, as we
are now using this for more than enums rename it.

Link: http://lkml.kernel.org/r/20170531215653.3240-3-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 17:08:57 -04:00
Jeremy Linton 02fd7f68f5 trace: rename kernel enum section to eval
The kernel and its modules have sections containing the enum
string to value conversions. Rename this section because we
intend to store more than enums in it.

Link: http://lkml.kernel.org/r/20170531215653.3240-2-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 17:08:46 -04:00
Joel Fernandes 6a5ae63a0c tracing: Remove unused declaration of trace_stop_cmdline_recording
trace_stop_cmdline_recording declaration isn't in use, remove it.

Link: http://lkml.kernel.org/r/20170609025327.9508-2-joelaf@google.com

Cc: kernel-team@android.com
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13 09:40:52 -04:00
Jens Axboe 8f66439eec Linux 4.12-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh
 biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G
 kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU
 4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+
 2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E
 W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4=
 =m2Gx
 -----END PGP SIGNATURE-----

Merge tag 'v4.12-rc5' into for-4.13/block

We've already got a few conflicts and upcoming work depends on some of the
changes that have gone into mainline as regression fixes for this series.

Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream
trees to continue working on 4.13 changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-12 08:30:13 -06:00
Daniel Borkmann 20b9d7ac48 bpf: avoid excessive stack usage for perf_sample_data
perf_sample_data consumes 386 bytes on stack, reduce excessive stack
usage and move it to per cpu buffer. It's allowed due to preemption
being disabled for tracing, xdp and tc programs, thus at all times
only one program can run on a specific CPU and programs cannot run
from interrupt. We similarly also handle bpf_pt_regs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-10 19:05:45 -04:00
Christoph Hellwig 4e4cbee93d block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Masami Hiramatsu c3ca46ef71 ftrace/kprobes: selftests: Check kretprobe maxactive is supported
Check the kretprobe maxactive is supported by kprobe_events
interface. To ensure the kernel feature, this changes ftrace
README to describe it.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-06-07 10:07:22 -06:00
David S. Miller 216fe8f021 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just some simple overlapping changes in marvell PHY driver
and the DSA core code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 22:20:08 -04:00
Alexei Starovoitov f91840a32d perf, bpf: Add BPF support to all perf_event types
Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all
perf_event types, including HW_CACHE, RAW, and dynamic pmu events.
Only tracepoint/kprobe events are treated differently which require
BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly.

Also add support for reading all event counters using
bpf_perf_event_read() helper.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 21:58:01 -04:00
Luis Henriques f9797c2f20 ftrace: Fix memory leak in ftrace_graph_release()
ftrace_hash is being kfree'ed in ftrace_graph_release(), however the
->buckets field is not.  This results in a memory leak that is easily
captured by kmemleak:

unreferenced object 0xffff880038afe000 (size 8192):
  comm "trace-cmd", pid 238, jiffies 4294916898 (age 9.736s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f561e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8113964d>] __kmalloc+0x12d/0x1a0
    [<ffffffff810bf6d1>] alloc_ftrace_hash+0x51/0x80
    [<ffffffff810c0523>] __ftrace_graph_open.isra.39.constprop.46+0xa3/0x100
    [<ffffffff810c05e8>] ftrace_graph_open+0x68/0xa0
    [<ffffffff8114003d>] do_dentry_open.isra.1+0x1bd/0x2d0
    [<ffffffff81140df7>] vfs_open+0x47/0x60
    [<ffffffff81150f95>] path_openat+0x2a5/0x1020
    [<ffffffff81152d6a>] do_filp_open+0x8a/0xf0
    [<ffffffff811411df>] do_sys_open+0x12f/0x200
    [<ffffffff811412ce>] SyS_open+0x1e/0x20
    [<ffffffff815fa6e0>] entry_SYSCALL_64_fastpath+0x13/0x94
    [<ffffffffffffffff>] 0xffffffffffffffff

Link: http://lkml.kernel.org/r/20170525152038.7661-1-lhenriques@suse.com

Cc: stable@vger.kernel.org
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-26 22:35:48 -04:00
Linus Torvalds 56f410cf45 This fixes a bug caused by not cleaning up the new instance unique triggers
when deleting an instance. It also creates a selftest that triggers that bug.
 
 Fix the delayed optimization happening after kprobes boot up self tests
 being removed by freeing of init memory.
 
 Comment kprobes on why the delay optimization is not a problem for removal
 of modules, to keep other developers from searching that riddle.
 
 Fix another rcu isn't watching in stack trace tracing.
 
 Naveen N. Rao (4):
       ftrace: Simplify glob handling in unregister_ftrace_function_probe_func()
       ftrace/instances: Clear function triggers when removing instances
       selftests/ftrace: Fix bashisms
       selftests/ftrace: Add test to remove instance with active event triggers
 
 Steven Rostedt (1):
       tracing: Move postpone selftests to core from early_initcall
 
 Steven Rostedt (VMware) (3):
       ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub
       kprobes: Document how optimized kprobes are removed from module unload
       tracing: Make sure RCU is watching before calling a stack trace
 
 Thomas Gleixner (1):
       tracing/kprobes: Enforce kprobes teardown after testing
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZIQapFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 A6MIAKFLb6mQ4flRBXpWd2tD2B4DQpQ0H7SovseZnlH6Q7grU6POY/qbNl9xXiBA
 3NavxqbIYokH8cxEqGAusL7ASUFPXJj6erMM1uc1WRuAzMpIjvgNacOtW5R+c5S9
 ofR1xtKlBo/854J/IP6M3J0WqrK+B7TsS1WYKohe/tFMBpolbnFloHVfMMZlaL58
 CQhCoAhkjJRsta6dJhbo+HoQy03VGyWsfFHtutBpIwsf81Naq4Stpxp7jdZLWhB8
 Di5QdOji9lDayK6Uk7DDZqHxbjC9z6cCS9nVWIGHkE4AMpR3peYtsyCaAOBjVMLV
 2OuhuREfZgKaYVMjUfdeYCayDAY=
 =1gek
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix a bug caused by not cleaning up the new instance unique triggers
   when deleting an instance. It also creates a selftest that triggers
   that bug.

 - Fix the delayed optimization happening after kprobes boot up self
   tests being removed by freeing of init memory.

 - Comment kprobes on why the delay optimization is not a problem for
   removal of modules, to keep other developers from searching that
   riddle.

 - Fix another case of rcu not watching in stack trace tracing.

* tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Make sure RCU is watching before calling a stack trace
  kprobes: Document how optimized kprobes are removed from module unload
  selftests/ftrace: Add test to remove instance with active event triggers
  selftests/ftrace: Fix bashisms
  ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub
  ftrace/instances: Clear function triggers when removing instances
  ftrace: Simplify glob handling in unregister_ftrace_function_probe_func()
  tracing/kprobes: Enforce kprobes teardown after testing
  tracing: Move postpone selftests to core from early_initcall
2017-05-20 23:39:03 -07:00
Linus Torvalds 894e21642d Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A small collection of fixes that should go into this cycle.

   - a pull request from Christoph for NVMe, which ended up being
     manually applied to avoid pulling in newer bits in master. Mostly
     fibre channel fixes from James, but also a few fixes from Jon and
     Vijay

   - a pull request from Konrad, with just a single fix for xen-blkback
     from Gustavo.

   - a fuseblk bdi fix from Jan, fixing a regression in this series with
     the dynamic backing devices.

   - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull().

   - a request leak fix for drbd from Lars, fixing a regression in the
     last series with the kref changes. This will go to stable as well"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvmet: release the sq ref on rdma read errors
  nvmet-fc: remove target cpu scheduling flag
  nvme-fc: stop queues on error detection
  nvme-fc: require target or discovery role for fc-nvme targets
  nvme-fc: correct port role bits
  nvme: unmap CMB and remove sysfs file in reset path
  blktrace: fix integer parse
  fuseblk: Fix warning in super_setup_bdi_name()
  block: xen-blkback: add null check to avoid null pointer dereference
  drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()
2017-05-20 16:12:30 -07:00
Shaohua Li 5f3394530f blktrace: fix integer parse
sscanf is a very poor way to parse integer. For example, I input
"discard" for act_mask, it gets 0xd and completely messes up. Using
correct API to do integer parse.

This patch also makes attributes accept any base of integer.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-19 09:21:15 -06:00
Steven Rostedt (VMware) a33d7d94ee tracing: Make sure RCU is watching before calling a stack trace
As stack tracing now requires "rcu watching", force RCU to be watching when
recording a stack trace.

Link: http://lkml.kernel.org/r/20170512172449.879684501@goodmis.org

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-18 23:57:56 -04:00
Steven Rostedt (VMware) 8a49f3e03c ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub
No need to add ugly #ifdefs in the code. Having a standard stub file is much
prettier.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-17 21:53:32 -04:00
Naveen N. Rao a0e6369e4b ftrace/instances: Clear function triggers when removing instances
If instance directories are deleted while there are registered function
triggers:

  # cd /sys/kernel/debug/tracing/instances
  # mkdir test
  # echo "schedule:enable_event:sched:sched_switch" > test/set_ftrace_filter
  # rmdir test
  Unable to handle kernel paging request for data at address 0x00000008
  Unable to handle kernel paging request for data at address 0x00000008
  Faulting instruction address: 0xc0000000021edde8
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048
  NUMA
  pSeries
  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm iptable_filter fuse binfmt_misc pseries_rng rng_core vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c multipath virtio_net virtio_blk virtio_pci crc32c_vpmsum virtio_ring virtio
  CPU: 8 PID: 8694 Comm: rmdir Not tainted 4.11.0-nnr+ #113
  task: c0000000bab52800 task.stack: c0000000baba0000
  NIP: c0000000021edde8 LR: c0000000021f0590 CTR: c000000002119620
  REGS: c0000000baba3870 TRAP: 0300   Not tainted  (4.11.0-nnr+)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
    CR: 22002422  XER: 20000000
  CFAR: 00007fffabb725a8 DAR: 0000000000000008 DSISR: 40000000 SOFTE: 0
  GPR00: c00000000220f750 c0000000baba3af0 c000000003157e00 0000000000000000
  GPR04: 0000000000000040 00000000000000eb 0000000000000040 0000000000000000
  GPR08: 0000000000000000 0000000000000113 0000000000000000 c00000000305db98
  GPR12: c000000002119620 c00000000fd42c00 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c0000000bab52e90 0000000000000000
  GPR24: 0000000000000000 00000000000000eb 0000000000000040 c0000000baba3bb0
  GPR28: c00000009cb06eb0 c0000000bab52800 c00000009cb06eb0 c0000000baba3bb0
  NIP [c0000000021edde8] ring_buffer_lock_reserve+0x8/0x4e0
  LR [c0000000021f0590] trace_event_buffer_lock_reserve+0xe0/0x1a0
  Call Trace:
  [c0000000baba3af0] [c0000000021f96c8] trace_event_buffer_commit+0x1b8/0x280 (unreliable)
  [c0000000baba3b60] [c00000000220f750] trace_event_buffer_reserve+0x80/0xd0
  [c0000000baba3b90] [c0000000021196b8] trace_event_raw_event_sched_switch+0x98/0x180
  [c0000000baba3c10] [c0000000029d9980] __schedule+0x6e0/0xab0
  [c0000000baba3ce0] [c000000002122230] do_task_dead+0x70/0xc0
  [c0000000baba3d10] [c0000000020ea9c8] do_exit+0x828/0xd00
  [c0000000baba3dd0] [c0000000020eaf70] do_group_exit+0x60/0x100
  [c0000000baba3e10] [c0000000020eb034] SyS_exit_group+0x24/0x30
  [c0000000baba3e30] [c00000000200bcec] system_call+0x38/0x54
  Instruction dump:
  60000000 60420000 7d244b78 7f63db78 4bffaa09 393efff8 793e0020 39200000
  4bfffecc 60420000 3c4c00f7 3842a020 <81230008> 2f890000 409e02f0 a14d0008
  ---[ end trace b917b8985d0e650b ]---
  Unable to handle kernel paging request for data at address 0x00000008
  Faulting instruction address: 0xc0000000021edde8
  Unable to handle kernel paging request for data at address 0x00000008
  Faulting instruction address: 0xc0000000021edde8
  Faulting instruction address: 0xc0000000021edde8

To address this, let's clear all registered function probes before
deleting the ftrace instance.

Link: http://lkml.kernel.org/r/c5f1ca624043690bd94642bb6bffd3f2fc504035.1494956770.git.naveen.n.rao@linux.vnet.ibm.com

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-17 21:52:22 -04:00
Naveen N. Rao cbab567c3d ftrace: Simplify glob handling in unregister_ftrace_function_probe_func()
Handle a NULL glob properly and simplify the check.

Link: http://lkml.kernel.org/r/5df74d4ffb4721db6d5a22fa08ca031d62ead493.1494956770.git.naveen.n.rao@linux.vnet.ibm.com

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-17 21:51:54 -04:00
Thomas Gleixner 30e7d894c1 tracing/kprobes: Enforce kprobes teardown after testing
Enabling the tracer selftest triggers occasionally the warning in
text_poke(), which warns when the to be modified page is not marked
reserved.

The reason is that the tracer selftest installs kprobes on functions marked
__init for testing. These probes are removed after the tests, but that
removal schedules the delayed kprobes_optimizer work, which will do the
actual text poke. If the work is executed after the init text is freed,
then the warning triggers. The bug can be reproduced reliably when the work
delay is increased.

Flush the optimizer work and wait for the optimizing/unoptimizing lists to
become empty before returning from the kprobes tracer selftest. That
ensures that all operations which were queued due to the probes removal
have completed.

Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.home

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 6274de498 ("kprobes: Support delayed unoptimizing")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-17 21:50:27 -04:00
Steven Rostedt b9ef0326c0 tracing: Move postpone selftests to core from early_initcall
I hit the following lockdep splat when booting with ftrace selftests
enabled, as well as CONFIG_PREEMPT and LOCKDEP.

 Testing dynamic ftrace ops #1:
 (1 0 1 0 0)
 (1 1 2 0 0)
 (2 1 3 0 169)
 (2 2 4 0 50066)
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 13 at kernel/rcu/srcutree.c:202 check_init_srcu_struct+0x60/0x70
 Modules linked in:
 CPU: 0 PID: 13 Comm: rcu_tasks_kthre Not tainted 4.12.0-rc1-test+ #587
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff880119628040 task.stack: ffffc900006a4000
 RIP: 0010:check_init_srcu_struct+0x60/0x70
 RSP: 0000:ffffc900006a7d98 EFLAGS: 00010246
 RAX: 0000000000000246 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: ffff880119628040 RSI: 00000000ffffffff RDI: ffffffff81e5fb40
 RBP: ffffc900006a7e20 R08: 00000023b403c000 R09: 0000000000000001
 R10: ffffc900006a7e40 R11: 0000000000000000 R12: ffffffff81e5fb40
 R13: 0000000000000286 R14: ffff880119628040 R15: ffffc900006a7e98
 FS:  0000000000000000(0000) GS:ffff88011ea00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffff88011edff000 CR3: 0000000001e0f000 CR4: 00000000001406f0
 Call Trace:
  ? __synchronize_srcu+0x6e/0x140
  ? lock_acquire+0xdc/0x1d0
  ? ktime_get_mono_fast_ns+0x5d/0xb0
  synchronize_srcu+0x6f/0x110
  ? synchronize_srcu+0x6f/0x110
  rcu_tasks_kthread+0x20a/0x540
  kthread+0x114/0x150
  ? __rcu_read_unlock+0x70/0x70
  ? kthread_create_on_node+0x40/0x40
  ret_from_fork+0x2e/0x40
 Code: f6 83 70 06 00 00 03 49 89 c5 74 0d be 01 00 00 00 48 89 df e8 42 fa ff ff 4c 89 ee 4c 89 e7 e8 b7 42 75 00 5b 41 5c 41 5d 5d c3 <0f> ff eb aa 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
 ---[ end trace 5c3f4206ce50f6ac ]---

What happens is that the selftests include a creating of a dynamically
allocated ftrace_ops, which requires the use of synchronize_rcu_tasks()
which uses srcu, and triggers the above warning.

It appears that synchronize_rcu_tasks() is not set up at early_initcall(),
but it is at core_initcall(). By moving the tests down to that location
works out properly.

Link: http://lkml.kernel.org/r/20170517111435.7388c033@gandalf.local.home

Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-17 21:39:38 -04:00
Linus Torvalds 29250d301b This is a trivial patch that changes a check for a cpumask from a NULL
pointer to using cpumask_available(), which will do the check. This is
 because cpumasks when not allocated are always set, and clang complains
 about it.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZEcUIFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 zygH/051hNj3aZlSZCD1pXwPkHgeVBn7lSB9k6hcJ5J/OknL/hNXws3Dv4Lb7Dzj
 cZhg62LTwwS6PVCJtOHk+PE/c5FIdY9o1mXJpAst6wbl9Sp1lzPJbFum45UadvWn
 UyU3RP0ncSgfojyrwIu6XyND7/NatdYk9irTMWL9+cDuy9xGvJgRX1sf7tXmxj4C
 AbZzQorDw7XDczDbvFM1XyPU3ApGUDqQ7VhCEBP6ivE+5Ceoo9xi/z7yfKyjLeb+
 H7+/eA8ztaMLgTzLWwkFKdP/knqwPmAb+MHTR0DoLHcVe7fbbxFS7x+cfR8mfIA9
 8tA5SUxc7bymRvDAcN2dMrtL7f8=
 =3hKI
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "This is a trivial patch that changes a check for a cpumask from a NULL
  pointer to using cpumask_available(), which will do the check. This is
  because cpumasks when not allocated are always set, and clang
  complains about it"

* tag 'trace-v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Use cpumask_available() to check if cpumask variable may be used
2017-05-10 11:26:25 -07:00
Linus Torvalds 00d9593335 These are three simple changes.
The first one is just a switch from using strcpy() to strlcpy(). Someone
 thought that it may cause an overflow bug, but since it only copies comms
 into a pre-allocated array of TASK_COMM_LEN, and no comm should ever
 be bigger than that, nor not end with a nul character, this change is more
 of a safety precaution than fixing anything that is actually broken.
 
 The other two changes are simply cleaning and optimizing some code.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZEKYJFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 +NcH/jK6ELGykogqi2FfLNzwJTjVpHdKrrMKyxHcC+jXv3mJUyK+0qKHkCO6zyy1
 EWAbTrSMjHGG6r6AP42QLRRehsijk7xXjJm86T771PNtSgY4xCKobFisk73KR4YB
 2Y1JXkSpKH2kKgdixR9hcg4h5RTv16KeAMu2cLSMxRfDEr1mBvv7LU8ZrobJSx2C
 LGR/241bTxOB6mWCmjqSTVrhHyEAMgJhVwV+ym7qfjqQULGhgFmq3CVTicFU0PWx
 UkzrcwYT2T56jU3Ngu/e1KkEZq0/rG7O86iSxgwnuraW4n48u3rpkl/q9eZ029Hd
 /kxyyXBKQDxx6cQd4hZrYUTW4IU=
 =8/K8
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:
 "These are three simple changes.

  The first one is just a switch from using strcpy() to strlcpy().
  Someone thought that it may cause an overflow bug, but since it only
  copies comms into a pre-allocated array of TASK_COMM_LEN, and no comm
  should ever be bigger than that, nor not end with a nul character,
  this change is more of a safety precaution than fixing anything that
  is actually broken.

  The other two changes are simply cleaning and optimizing some code"

* tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Simplify ftrace_match_record() even more
  ftrace: Remove an unneeded condition
  tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()
2017-05-08 20:36:38 -07:00
Matthias Kaehlcke 4dbbe2d8e9 tracing: Use cpumask_available() to check if cpumask variable may be used
This fixes the following clang warning:

kernel/trace/trace.c:3231:12: warning: address of array 'iter->started'
  will always evaluate to 'true' [-Wpointer-bool-conversion]
        if (iter->started)

Link: http://lkml.kernel.org/r/20170421234110.117075-1-mka@chromium.org

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-08 21:22:04 -04:00
Deepa Dinamani 51aad0aee5 trace: make trace_hwlat timestamp y2038 safe
struct timespec is not y2038 safe on 32 bit machines and needs to be
replaced by struct timespec64 in order to represent times beyond year
2038 on such machines.

Fix all the timestamp representation in struct trace_hwlat and all the
corresponding implementations.

Link: http://lkml.kernel.org/r/1491613030-11599-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Steven Rostedt (VMware) 77c0eddeee ftrace: Simplify ftrace_match_record() even more
Dan Carpenter sent a patch to remove a check in ftrace_match_record()
because the logic of the code made the check redundant. I looked deeper into
the code, and made the following logic table, with the three variables and
the result of the original code.

modname        mod_matches     exclude_mod         result
-------        -----------     -----------         ------
  0                 0               0              return 0
  0                 0               1              func_match
  0                 1               *             < cannot exist >
  1                 0               0              return 0
  1                 0               1              func_match
  1                 1               0              func_match
  1                 1               1              return 0

Notice that when mod_matches == exclude mod, the result is always to
return 0, and when mod_matches != exclude_mod, then the result is to test
the function. This means we only need test if mod_matches is equal to
exclude_mod.

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-03 22:15:13 -04:00
Dan Carpenter 31805c9052 ftrace: Remove an unneeded condition
We know that "mod_matches" is true here so there is no need to check
again.

Link: http://lkml.kernel.org/r/20170331152130.GA4947@mwanda

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-03 22:15:12 -04:00
Amey Telawane e09e28671c tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()
Strcpy is inherently not safe, and strlcpy() should be used instead.
__trace_find_cmdline() uses strcpy() because the comms saved must have a
terminating nul character, but it doesn't hurt to add the extra protection
of using strlcpy() instead of strcpy().

Link: http://lkml.kernel.org/r/1493806274-13936-1-git-send-email-amit.pundir@linaro.org

Signed-off-by: Amey Telawane <ameyt@codeaurora.org>
[AmitP: Cherry-picked this commit from CodeAurora kernel/msm-3.10
https://source.codeaurora.org/quic/la/kernel/msm-3.10/commit/?id=2161ae9a70b12cf18ac8e5952a20161ffbccb477]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[ Updated change log and removed the "- 1" from len parameter ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-03 22:15:11 -04:00
Linus Torvalds 4c174688ee New features for this release:
o Pretty much a full rewrite of the processing of function plugins.
    i.e. echo do_IRQ:stacktrace > set_ftrace_filter
 
  o The rewrite was needed to add plugins to be unique to tracing instances.
    i.e. mkdir instance/foo; cd instances/foo; echo do_IRQ:stacktrace > set_ftrace_filter
    The old way was written very hacky. This removes a lot of those hacks.
 
  o New "function-fork" tracing option. When set, pids in the set_ftrace_pid
    will have their children added when the processes with their pids
    listed in the set_ftrace_pid file forks.
 
  o Exposure of "maxactive" for kretprobe in kprobe_events
 
  o Allow for builtin init functions to be traced by the function tracer
    (via the kernel command line). Module init function tracing will come
    in the next release.
 
  o Added more selftests, and have selftests also test in an instance.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZCRchFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 zuIH/RsLUb8Hj6GmhAvn/tblUDzWyqlXX2h79VVlo/XrWayHYNHnKOmua1WwMZC6
 xESXb/AffAc89VWTkKsrwaK7yfRPG6+w8zTZOcFuXSBpqSGG/oey9Fxj5Wqqpche
 oJ2UY7ngxANAipkP5GxdYTafFSoWhGZGfUUtW+5tAHoFHzqO2lOjO8olbXP69sON
 kVX/b461S20cVvRe5H/F0klXLSc37Tlp5YznXy4H4V4HcJSN1Fb6/uozOXALZ4se
 SBpVMWmVVoGJorzj+ic7gVOeohvC8RnR400HbeMVwaI0Lj50noidDj/5Hv8F7T+D
 h1B8vATNZLFAFUOSHINCBIu6Vj0=
 =t8mg
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New features for this release:

   - Pretty much a full rewrite of the processing of function plugins.
     i.e. echo do_IRQ:stacktrace > set_ftrace_filter

   - The rewrite was needed to add plugins to be unique to tracing
     instances. i.e. mkdir instance/foo; cd instances/foo; echo
     do_IRQ:stacktrace > set_ftrace_filter The old way was written very
     hacky. This removes a lot of those hacks.

   - New "function-fork" tracing option. When set, pids in the
     set_ftrace_pid will have their children added when the processes
     with their pids listed in the set_ftrace_pid file forks.

   - Exposure of "maxactive" for kretprobe in kprobe_events

   - Allow for builtin init functions to be traced by the function
     tracer (via the kernel command line). Module init function tracing
     will come in the next release.

   - Added more selftests, and have selftests also test in an instance"

* tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (60 commits)
  ring-buffer: Return reader page back into existing ring buffer
  selftests: ftrace: Allow some event trigger tests to run in an instance
  selftests: ftrace: Have some basic tests run in a tracing instance too
  selftests: ftrace: Have event tests also run in an tracing instance
  selftests: ftrace: Make func_event_triggers and func_traceonoff_triggers tests do instances
  selftests: ftrace: Allow some tests to be run in a tracing instance
  tracing/ftrace: Allow for instances to trigger their own stacktrace probes
  tracing/ftrace: Allow for the traceonoff probe be unique to instances
  tracing/ftrace: Enable snapshot function trigger to work with instances
  tracing/ftrace: Allow instances to have their own function probes
  tracing/ftrace: Add a better way to pass data via the probe functions
  ftrace: Dynamically create the probe ftrace_ops for the trace_array
  tracing: Pass the trace_array into ftrace_probe_ops functions
  tracing: Have the trace_array hold the list of registered func probes
  ftrace: If the hash for a probe fails to update then free what was initialized
  ftrace: Have the function probes call their own function
  ftrace: Have each function probe use its own ftrace_ops
  ftrace: Have unregister_ftrace_function_probe_func() return a value
  ftrace: Add helper function ftrace_hash_move_and_update_ops()
  ftrace: Remove data field from ftrace_func_probe structure
  ...
2017-05-03 18:41:21 -07:00
Linus Torvalds 8d65b08deb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Millar:
 "Here are some highlights from the 2065 networking commits that
  happened this development cycle:

   1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)

   2) Add a generic XDP driver, so that anyone can test XDP even if they
      lack a networking device whose driver has explicit XDP support
      (me).

   3) Sparc64 now has an eBPF JIT too (me)

   4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
      Starovoitov)

   5) Make netfitler network namespace teardown less expensive (Florian
      Westphal)

   6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)

   7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)

   8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)

   9) Multiqueue support in stmmac driver (Joao Pinto)

  10) Remove TCP timewait recycling, it never really could possibly work
      well in the real world and timestamp randomization really zaps any
      hint of usability this feature had (Soheil Hassas Yeganeh)

  11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
      Aleksandrov)

  12) Add socket busy poll support to epoll (Sridhar Samudrala)

  13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
      and several others)

  14) IPSEC hw offload infrastructure (Steffen Klassert)"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
  tipc: refactor function tipc_sk_recv_stream()
  tipc: refactor function tipc_sk_recvmsg()
  net: thunderx: Optimize page recycling for XDP
  net: thunderx: Support for XDP header adjustment
  net: thunderx: Add support for XDP_TX
  net: thunderx: Add support for XDP_DROP
  net: thunderx: Add basic XDP support
  net: thunderx: Cleanup receive buffer allocation
  net: thunderx: Optimize CQE_TX handling
  net: thunderx: Optimize RBDR descriptor handling
  net: thunderx: Support for page recycling
  ipx: call ipxitf_put() in ioctl error path
  net: sched: add helpers to handle extended actions
  qed*: Fix issues in the ptp filter config implementation.
  qede: Fix concurrency issue in PTP Tx path processing.
  stmmac: Add support for SIMATIC IOT2000 platform
  net: hns: fix ethtool_get_strings overflow in hns driver
  tcp: fix wraparound issue in tcp_lp
  bpf, arm64: fix jit branch offset related to ldimm64
  bpf, arm64: implement jiting of BPF_XADD
  ...
2017-05-02 16:40:27 -07:00
Linus Torvalds da7b66ffb2 Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull splice updates from Al Viro:
 "These actually missed the last cycle; the branch itself is from last
  December"

* 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make nr_pages calculation in default_file_splice_read() a bit less ugly
  splice/tee/vmsplice: validate flags
  splice_pipe_desc: kill ->flags
  remove spd_release_page()
2017-05-02 11:38:06 -07:00
Linus Torvalds 7c8c03bfc7 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle were:

  Kernel side changes:

   - Kprobes and uprobes changes:
      - Make their trampolines read-only while they are used
      - Make UPROBES_EVENTS default-y which is the distro practice
      - Apply misc fixes and robustization to probe point insertion.

   - add support for AMD IOMMU events

   - extend hw events on Intel Goldmont CPUs

   - ... plus misc fixes and updates.

  Tooling side changes:

   - support s390 jump instructions in perf annotate (Christian
     Borntraeger)

   - vendor hardware events updates (Andi Kleen)

   - add argument support for SDT events in powerpc (Ravi Bangoria)

   - beautify the statx syscall arguments in 'perf trace' (Arnaldo
     Carvalho de Melo)

   - handle inline functions in callchains (Jin Yao)

   - enable sorting by srcline as key (Milian Wolff)

   - add 'brstackinsn' field in 'perf script' to reuse the x86
     instruction decoder used in the Intel PT code to study hot paths to
     samples (Andi Kleen)

   - add PERF_RECORD_NAMESPACES so that the kernel can record
     information required to associate samples to namespaces, helping in
     container problem characterization. (Hari Bathini)

   - allow sorting by symbol_size in 'perf report' and 'perf top'
     (Charles Baylis)

   - in perf stat, make system wide (-a) the default option if no target
     was specified and one of following conditions is met:
      - no workload specified (current behaviour)
      - a workload is specified but all requested events are system wide
        ones, like uncore ones. (Jiri Olsa)

   - ... plus lots of other updates, enhancements, cleanups and fixes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (235 commits)
  perf tools: Fix the code to strip command name
  tools arch x86: Sync cpufeatures.h
  tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel
  tools: Update asm-generic/mman-common.h copy from the kernel
  perf tools: Use just forward declarations for struct thread where possible
  perf tools: Add the right header to obtain PERF_ALIGN()
  perf tools: Remove poll.h and wait.h from util.h
  perf tools: Remove string.h, unistd.h and sys/stat.h from util.h
  perf tools: Remove stale prototypes from builtin.h
  perf tools: Remove string.h from util.h
  perf tools: Remove sys/ioctl.h from util.h
  perf tools: Remove a few more needless includes from util.h
  perf tools: Include sys/param.h where needed
  perf callchain: Move callchain specific routines from util.[ch]
  perf tools: Add compress.h for the *_decompress_to_file() headers
  perf mem: Fix display of data source snoop indication
  perf debug: Move dump_stack() and sighandler_dump_stack() to debug.h
  perf kvm: Make function only used by 'perf kvm' static
  perf tools: Move timestamp routines from util.h to time-utils.h
  perf tools: Move units conversion/formatting routines to separate object
  ...
2017-05-01 20:23:17 -07:00
Linus Torvalds 5db6db0d40 Merge branch 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess unification updates from Al Viro:
 "This is the uaccess unification pile. It's _not_ the end of uaccess
  work, but the next batch of that will go into the next cycle. This one
  mostly takes copy_from_user() and friends out of arch/* and gets the
  zero-padding behaviour in sync for all architectures.

  Dealing with the nocache/writethrough mess is for the next cycle;
  fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
  sold on access_ok() in there, BTW; just not in this pile), same for
  reducing __copy_... callsites, strn*... stuff, etc. - there will be a
  pile about as large as this one in the next merge window.

  This one sat in -next for weeks. -3KLoC"

* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
  HAVE_ARCH_HARDENED_USERCOPY is unconditional now
  CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
  m32r: switch to RAW_COPY_USER
  hexagon: switch to RAW_COPY_USER
  microblaze: switch to RAW_COPY_USER
  get rid of padding, switch to RAW_COPY_USER
  ia64: get rid of copy_in_user()
  ia64: sanitize __access_ok()
  ia64: get rid of 'segment' argument of __do_{get,put}_user()
  ia64: get rid of 'segment' argument of __{get,put}_user_check()
  ia64: add extable.h
  powerpc: get rid of zeroing, switch to RAW_COPY_USER
  esas2r: don't open-code memdup_user()
  alpha: fix stack smashing in old_adjtimex(2)
  don't open-code kernel_setsockopt()
  mips: switch to RAW_COPY_USER
  mips: get rid of tail-zeroing in primitives
  mips: make copy_from_user() zero tail explicitly
  mips: clean and reorder the forest of macros...
  mips: consolidate __invoke_... wrappers
  ...
2017-05-01 14:41:04 -07:00
Linus Torvalds 694752922b Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:

 - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
   was initially a fork of CFQ, but subsequently changed to implement
   fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
   to be used on desktop type single drives, providing good fairness.
   From Paolo.

 - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
   using a scalable token based algorithm that throttles IO based on
   live completion IO stats, similary to blk-wbt. From Omar.

 - A series from Jan, moving users to separately allocated backing
   devices. This continues the work of separating backing device life
   times, solving various problems with hot removal.

 - A series of updates for lightnvm, mostly from Javier. Includes a
   'pblk' target that exposes an open channel SSD as a physical block
   device.

 - A series of fixes and improvements for nbd from Josef.

 - A series from Omar, removing queue sharing between devices on mostly
   legacy drivers. This helps us clean up other bits, if we know that a
   queue only has a single device backing. This has been overdue for
   more than a decade.

 - Fixes for the blk-stats, and improvements to unify the stats and user
   windows. This both improves blk-wbt, and enables other users to
   register a need to receive IO stats for a device. From Omar.

 - blk-throttle improvements from Shaohua. This provides a scalable
   framework for implementing scalable priotization - particularly for
   blk-mq, but applicable to any type of block device. The interface is
   marked experimental for now.

 - Bucketized IO stats for IO polling from Stephen Bates. This improves
   efficiency of polled workloads in the presence of mixed block size
   IO.

 - A few fixes for opal, from Scott.

 - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
   From a variety of folks, mostly Sagi and James Smart.

 - A series from Bart, improving our exposed info and capabilities from
   the blk-mq debugfs support.

 - A series from Christoph, cleaning up how handle WRITE_ZEROES.

 - A series from Christoph, cleaning up the block layer handling of how
   we track errors in a request. On top of being a nice cleanup, it also
   shrinks the size of struct request a bit.

 - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
   never used by platforms, and the latter has outlived it's usefulness.

 - Various little bug fixes and cleanups from a wide variety of folks.

* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
  block: hide badblocks attribute by default
  blk-mq: unify hctx delay_work and run_work
  block: add kblock_mod_delayed_work_on()
  blk-mq: unify hctx delayed_run_work and run_work
  nbd: fix use after free on module unload
  MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
  blk-mq-sched: alloate reserved tags out of normal pool
  mtip32xx: use runtime tag to initialize command header
  scsi: Implement blk_mq_ops.show_rq()
  blk-mq: Add blk_mq_ops.show_rq()
  blk-mq: Show operation, cmd_flags and rq_flags names
  blk-mq: Make blk_flags_show() callers append a newline character
  blk-mq: Move the "state" debugfs attribute one level down
  blk-mq: Unregister debugfs attributes earlier
  blk-mq: Only unregister hctxs for which registration succeeded
  blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
  blk-mq: Let blk_mq_debugfs_register() look up the queue name
  blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
  ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
  ide-pm: always pass 0 error to __blk_end_request_all
  ..
2017-05-01 10:39:57 -07:00
Steven Rostedt (VMware) 73a757e631 ring-buffer: Return reader page back into existing ring buffer
When reading the ring buffer for consuming, it is optimized for splice,
where a page is taken out of the ring buffer (zero copy) and sent to the
reading consumer. When the read is finished with the page, it calls
ring_buffer_free_read_page(), which simply frees the page. The next time the
reader needs to get a page from the ring buffer, it must call
ring_buffer_alloc_read_page() which allocates and initializes a reader page
for the ring buffer to be swapped into the ring buffer for a new filled page
for the reader.

The problem is that there's no reason to actually free the page when it is
passed back to the ring buffer. It can hold it off and reuse it for the next
iteration. This completely removes the interaction with the page_alloc
mechanism.

Using the trace-cmd utility to record all events (causing trace-cmd to
require reading lots of pages from the ring buffer, and calling
ring_buffer_alloc/free_read_page() several times), and also assigning a
stack trace trigger to the mm_page_alloc event, we can see how many times
the ring_buffer_alloc_read_page() needed to allocate a page for the ring
buffer.

Before this change:

  # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
  # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
  9968

After this change:

  # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
  # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
  4

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-01 10:26:40 -04:00
David S. Miller fb796707d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Both conflict were simple overlapping changes.

In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21 20:23:53 -07:00
Steven Rostedt (VMware) dcc19d2809 tracing/ftrace: Allow for instances to trigger their own stacktrace probes
Have the stacktrace function trigger probe trigger stack traces within the
instance that they were added to in the set_ftrace_filter.

 ># cd /sys/kernel/debug/tracing
 ># mkdir instances/foo
 ># cd instances/foo
 ># echo schedule:stacktrace:1 > set_ftrace_filter
 ># cat trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 1/1   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
           <idle>-0     [001] .N.2   202.585010: <stack trace>
  =>
  => schedule
  => schedule_preempt_disabled
  => do_idle
  => cpu_startup_entry
  => start_secondary
  => verify_cpu

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:49 -04:00
Steven Rostedt (VMware) 2290f2c589 tracing/ftrace: Allow for the traceonoff probe be unique to instances
Have the traceon/off function probe triggers affect only the instance they
are set in. This required making the trace_on/off accessible for other files
in the tracing directory.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:48 -04:00
Steven Rostedt (VMware) cab5037950 tracing/ftrace: Enable snapshot function trigger to work with instances
Modify the snapshot probe trigger to work with instances. This way the
snapshot function trigger will only affect the instance that it is added to
in the set_ftrace_filter file.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:48 -04:00
Steven Rostedt (VMware) d2afd57a4b tracing/ftrace: Allow instances to have their own function probes
Pass around the local trace_array that is the descriptor for tracing
instances, when enabling and disabling probes. This by default sets the
enable/disable of event probe triggers to work with instances.

The other probes will need some more work to get them working with
instances.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:47 -04:00
Steven Rostedt (VMware) 6e4443199e tracing/ftrace: Add a better way to pass data via the probe functions
With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although, all probes
still only affect the toplevel trace array, this change will allow for
instances to have their own probes separated from other instances and the
top array.

That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implement yet, but
this change sets the ground work for the change.

When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.

Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the rip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocade during the first init call.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:46 -04:00
Steven Rostedt (VMware) 7b60f3d876 ftrace: Dynamically create the probe ftrace_ops for the trace_array
In order to eventually have each trace_array instance have its own unique
set of function probes (triggers), the trace array needs to hold the ops and
the filters for the probes.

This is the first step to accomplish this. Instead of having the private
data of the probe ops point to the trace_array, create a separate list that
the trace_array holds. There's only one private_data for a probe, we need
one per trace_array. The probe ftrace_ops will be dynamically created for
each instance, instead of being static.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:46 -04:00
Steven Rostedt (VMware) b5f081b563 tracing: Pass the trace_array into ftrace_probe_ops functions
Pass the trace_array associated to a ftrace_probe_ops into the probe_ops
func(), init() and free() functions. The trace_array is the descriptor that
describes a tracing instance. This will help create the infrastructure that
will allow having function probes unique to tracing instances.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:45 -04:00
Steven Rostedt (VMware) 04ec7bb642 tracing: Have the trace_array hold the list of registered func probes
Add a link list to the trace_array to hold func probes that are registered.
Currently, all function probes are the same for all instances as it was
before, that is, only the top level trace_array holds the function probes.
But this lays the ground work to have function probes be attached to
individual instances, and having the event trigger only affect events in the
given instance. But that work is still to be done.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:45 -04:00
Steven Rostedt (VMware) 8d70725e45 ftrace: If the hash for a probe fails to update then free what was initialized
If the ftrace_hash_move_and_update_ops() fails, and an ops->free() function
exists, then it needs to be called on all the ops that were added by this
registration.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:44 -04:00
Steven Rostedt (VMware) eee8ded131 ftrace: Have the function probes call their own function
Now that the function probes have their own ftrace_ops, there's no reason to
continue using the ftrace_func_hash to find which probe to call in the
function callback. The ops that is passed in to the function callback is
part of the probe_ops to call.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:43 -04:00
Steven Rostedt (VMware) 1ec3a81a0c ftrace: Have each function probe use its own ftrace_ops
Have the function probes have their own ftrace_ops, and remove the
trace_probe_ops. This simplifies some of the ftrace infrastructure code.

Individual entries for each function is still allocated for the use of the
output for set_ftrace_filter, but they will be removed soon too.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:43 -04:00
Steven Rostedt (VMware) d3d532d798 ftrace: Have unregister_ftrace_function_probe_func() return a value
Currently unregister_ftrace_function_probe_func() is a void function. It
does not give any feedback if an error occurred or no item was found to
remove and nothing was done.

Change it to return status and success if it removed something. Also update
the callers to return that feedback to the user.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:42 -04:00
Steven Rostedt (VMware) e16b35ddb8 ftrace: Add helper function ftrace_hash_move_and_update_ops()
The processes of updating a ops filter_hash is a bit complex, and requires
setting up an old hash to perform the update. This is done exactly the same
in two locations for the same reasons. Create a helper function that does it
in one place.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:42 -04:00
Steven Rostedt (VMware) 1a48df0041 ftrace: Remove data field from ftrace_func_probe structure
No users of the function probes uses the data field anymore. Remove it, and
change the init function to take a void *data parameter instead of a
void **data, because the init will just get the data that the registering
function was received, and there's no state after it is called.

The other functions for ftrace_probe_ops still take the data parameter, but
it will currently only be passed NULL. It will stay as a parameter for
future data to be passed to these functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:41 -04:00
Steven Rostedt (VMware) 02b77e2afb ftrace: Remove printing of data in showing of a function probe
None of the probe users uses the data field anymore of the entry. They all
have their own print() function. Remove showing the data field in the
generic function as the data field will be going away.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:40 -04:00
Steven Rostedt (VMware) 78f78e07d5 ftrace: Remove unused unregister_ftrace_function_probe_all() function
There are no users of unregister_ftrace_function_probe_all(). The only probe
function that is used is unregister_ftrace_function_probe_func(). Rename the
internal static function __unregister_ftrace_function_probe() to
unregister_ftrace_function_probe_func() and make it global.

Also remove the PROBE_TEST_FUNC as it would be always set.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:40 -04:00
Steven Rostedt (VMware) 0fe7e7e3f8 ftrace: Remove unused unregister_ftrace_function_probe() function
Nothing calls unregister_ftrace_function_probe(). Remove it as well as the
flag PROBE_TEST_DATA, as this function was the only one to set it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:39 -04:00
Steven Rostedt (VMware) fe014e24b6 ftrace: Convert the rest of the function trigger over to the mapping functions
As the data pointer for individual ips will soon be removed and no longer
passed to the callback function probe handlers, convert the rest of the function
trigger counters over to the new ftrace_func_mapper helper functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:39 -04:00
Steven Rostedt (VMware) 1a93f8bd19 tracing: Have the snapshot trigger use the mapping helper functions
As the data pointer for individual ips will soon be removed and no longer
passed to the callback function probe handlers, convert the snapshot
trigger counter over to the new ftrace_func_mapper helper functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:38 -04:00
Steven Rostedt (VMware) 41794f1907 ftrace: Added ftrace_func_mapper for function probe triggers
In order to move the ops to the function probes directly, they need a way to
map function ips to their own data without depending on the infrastructure
of the function probes, as the data field will be going away.

New helper functions are added that are based on the ftrace_hash code.
ftrace_func_mapper functions are there to let the probes map ips to their
data. These can be allocated by the probe ops, and referenced in the
function callbacks.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:37 -04:00
Steven Rostedt (VMware) bca6c8d048 ftrace: Pass probe ops to probe function
In preparation to cleaning up the probe function registration code, the
"data" parameter will eventually be removed from the probe->func() call.
Instead it will receive its own "ops" function, in which it can set up its
own data that it needs to map.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:37 -04:00
Steven Rostedt (VMware) e51a989679 ftrace: Remove unused "flags" field from struct ftrace_func_probe
Nothing uses "flags" in the ftrace_func_probe descriptor. Remove it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:36 -04:00
Steven Rostedt (VMware) 92a68fa047 ftrace: Move the function commands into the tracing directory
As nothing outside the tracing directory uses the function command mechanism,
I'm moving the prototypes out of the include/linux/ftrace.h and into the
local kernel/trace/trace.h header. I plan on making them hook to the
trace_array structure which is local to kernel/trace, and I do not want to
expose it to the rest of the kernel. This requires that the command functions
must also be local to tracing. But luckily nothing else uses them.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:33 -04:00
Christoph Hellwig caf7df1227 block: remove the errors field from struct request
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:16:10 -06:00
Christoph Hellwig cee4b7ce3f blktrace: remove the unused block_rq_abort tracepoint
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:16:10 -06:00
David S. Miller 7b9f6da175 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
A function in kernel/bpf/syscall.c which got a bug fix in 'net'
was moved to kernel/bpf/verifier.c in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 10:35:33 -04:00
Steven Rostedt (VMware) 78f7a45dac ring-buffer: Have ring_buffer_iter_empty() return true when empty
I noticed that reading the snapshot file when it is empty no longer gives a
status. It suppose to show the status of the snapshot buffer as well as how
to allocate and use it. For example:

 ># cat snapshot
 # tracer: nop
 #
 #
 # * Snapshot is allocated *
 #
 # Snapshot commands:
 # echo 0 > snapshot : Clears and frees snapshot buffer
 # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
 #                      Takes a snapshot of the main buffer.
 # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
 #                      (Doesn't have to be '2' works with any number that
 #                       is not a '0' or '1')

But instead it just showed an empty buffer:

 ># cat snapshot
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 0/0   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |

What happened was that it was using the ring_buffer_iter_empty() function to
see if it was empty, and if it was, it showed the status. But that function
was returning false when it was empty. The reason was that the iter header
page was on the reader page, and the reader page was empty, but so was the
buffer itself. The check only tested to see if the iter was on the commit
page, but the commit page was no longer pointing to the reader page, but as
all pages were empty, the buffer is also.

Cc: stable@vger.kernel.org
Fixes: 651e22f270 ("ring-buffer: Always reset iterator to reader page")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-19 21:23:47 -04:00
Steven Rostedt (VMware) df62db5be2 tracing: Allocate the snapshot buffer before enabling probe
Currently the snapshot trigger enables the probe and then allocates the
snapshot. If the probe triggers before the allocation, it could cause the
snapshot to fail and turn tracing off. It's best to allocate the snapshot
buffer first, and then enable the trigger. If something goes wrong in the
enabling of the trigger, the snapshot buffer is still allocated, but it can
also be freed by the user by writting zero into the snapshot buffer file.

Also add a check of the return status of alloc_snapshot().

Cc: stable@vger.kernel.org
Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-19 14:19:08 -04:00
Steven Rostedt (VMware) ec19b85913 ftrace: Move the probe function into the tracing directory
As nothing outside the tracing directory uses the function probes mechanism,
I'm moving the prototypes out of the include/linux/ftrace.h and into the
local kernel/trace/trace.h header. I plan on making them hook to the
trace_array structure which is local to kernel/trace, and I do not want to
expose it to the rest of the kernel. This requires that the probe functions
must also be local to tracing. But luckily nothing else uses them.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18 13:49:59 -04:00
Namhyung Kim 1e10486ffe ftrace: Add 'function-fork' trace option
The function-fork option is same as event-fork that it tracks task
fork/exit and set the pid filter properly.  This can be useful if user
wants to trace selected tasks including their children only.

Link: http://lkml.kernel.org/r/20170417024430.21194-3-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17 17:13:00 -04:00
Namhyung Kim d879d0b8c1 ftrace: Fix function pid filter on instances
When function tracer has a pid filter, it adds a probe to sched_switch
to track if current task can be ignored.  The probe checks the
ftrace_ignore_pid from current tr to filter tasks.  But it misses to
delete the probe when removing an instance so that it can cause a crash
due to the invalid tr pointer (use-after-free).

This is easily reproducible with the following:

  # cd /sys/kernel/debug/tracing
  # mkdir instances/buggy
  # echo $$ > instances/buggy/set_ftrace_pid
  # rmdir instances/buggy

  ============================================================================
  BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90
  Read of size 8 by task kworker/0:1/17
  CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G    B           4.11.0-rc3  #198
  Call Trace:
   dump_stack+0x68/0x9f
   kasan_object_err+0x21/0x70
   kasan_report.part.1+0x22b/0x500
   ? ftrace_filter_pid_sched_switch_probe+0x3d/0x90
   kasan_report+0x25/0x30
   __asan_load8+0x5e/0x70
   ftrace_filter_pid_sched_switch_probe+0x3d/0x90
   ? fpid_start+0x130/0x130
   __schedule+0x571/0xce0
   ...

To fix it, use ftrace_clear_pids() to unregister the probe.  As
instance_rmdir() already updated ftrace codes, it can just free the
filter safely.

Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org

Fixes: 0c8916c342 ("tracing: Add rmdir to remove multibuffer instances")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17 16:44:23 -04:00
Steven Rostedt (VMware) b980b117c9 tracing: Have the trace_event benchmark thread call cond_resched_rcu_qs()
The trace_event benchmark thread runs in kernel space in an infinite loop
while also calling cond_resched() in case anything else wants to schedule
in. Unfortunately, on a PREEMPT kernel, that makes it a nop, in which case,
this will never voluntarily schedule. That will cause synchronize_rcu_tasks()
to forever block on this thread, while it is running.

This is exactly what cond_resched_rcu_qs() is for. Use that instead.

Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17 15:21:19 -04:00
Steven Rostedt (VMware) fcdc712579 ftrace: Fix indexing of t_hash_start() from t_next()
t_hash_start() does not increment *pos, where as t_next() must. But when
t_next() does increment *pos, it must still pass in the original *pos to
t_hash_start() otherwise it will skip the first instance:

 # cd /sys/kernel/debug/tracing
 # echo schedule:traceoff > set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter
 # echo call_rcu > set_ftrace_filter
 # cat set_ftrace_filter
call_rcu
schedule:traceoff:unlimited
do_IRQ:traceoff:unlimited

The above called t_hash_start() from t_start() as there was only one
function (call_rcu), but if we add another function:

 # echo xfrm_policy_destroy_rcu >> set_ftrace_filter
 # cat set_ftrace_filter
call_rcu
xfrm_policy_destroy_rcu
do_IRQ:traceoff:unlimited

The "schedule:traceoff" disappears.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17 10:22:29 -04:00
David S. Miller 6b6cbc1471 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were simply overlapping changes.  In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.

In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15 21:16:30 -04:00
Steven Rostedt (VMware) acceb72e90 ftrace: Fix removing of second function probe
When two function probes are added to set_ftrace_filter, and then one of
them is removed, the update to the function locations is not performed, and
the record keeping of the function states are corrupted, and causes an
ftrace_bug() to occur.

This is easily reproducable by adding two probes, removing one, and then
adding it back again.

 # cd /sys/kernel/debug/tracing
 # echo schedule:traceoff > set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter
 # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter

Causes:
 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
 Modules linked in: [...]
 CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 Call Trace:
  dump_stack+0x68/0x9f
  __warn+0x111/0x130
  ? trace_irq_work_interrupt+0xa0/0xa0
  warn_slowpath_null+0x1d/0x20
  ftrace_get_addr_curr+0x143/0x220
  ? __fentry__+0x10/0x10
  ftrace_replace_code+0xe3/0x4f0
  ? ftrace_int3_handler+0x90/0x90
  ? printk+0x99/0xb5
  ? 0xffffffff81000000
  ftrace_modify_all_code+0x97/0x110
  arch_ftrace_update_code+0x10/0x20
  ftrace_run_update_code+0x1c/0x60
  ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
  register_ftrace_function_probe+0x4b6/0x590
  ? ftrace_startup+0x310/0x310
  ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
  ? update_stack_state+0x88/0x110
  ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
  ? preempt_count_sub+0x18/0xd0
  ? mutex_lock_nested+0x104/0x800
  ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
  ? __unwind_start+0x1c0/0x1c0
  ? _mutex_lock_nest_lock+0x800/0x800
  ftrace_trace_probe_callback.isra.3+0xc0/0x130
  ? func_set_flag+0xe0/0xe0
  ? __lock_acquire+0x642/0x1790
  ? __might_fault+0x1e/0x20
  ? trace_get_user+0x398/0x470
  ? strcmp+0x35/0x60
  ftrace_trace_onoff_callback+0x48/0x70
  ftrace_regex_write.isra.43.part.44+0x251/0x320
  ? match_records+0x420/0x420
  ftrace_filter_write+0x2b/0x30
  __vfs_write+0xd7/0x330
  ? do_loop_readv_writev+0x120/0x120
  ? locks_remove_posix+0x90/0x2f0
  ? do_lock_file_wait+0x160/0x160
  ? __lock_is_held+0x93/0x100
  ? rcu_read_lock_sched_held+0x5c/0xb0
  ? preempt_count_sub+0x18/0xd0
  ? __sb_start_write+0x10a/0x230
  ? vfs_write+0x222/0x240
  vfs_write+0xef/0x240
  SyS_write+0xab/0x130
  ? SyS_read+0x130/0x130
  ? trace_hardirqs_on_caller+0x182/0x280
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL_64_fastpath+0x18/0xad
 RIP: 0033:0x7fe61c157c30
 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
  ? trace_hardirqs_off_caller+0xc0/0x110
 ---[ end trace 99fa09b3d9869c2c ]---
 Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)

Cc: stable@vger.kernel.org
Fixes: 59df055f19 ("ftrace: trace different functions with a different tracer")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-15 17:04:37 -04:00
Steven Rostedt (VMware) 82cc4fc2e7 ftrace: Fix removing of second function probe
When two function probes are added to set_ftrace_filter, and then one of
them is removed, the update to the function locations is not performed, and
the record keeping of the function states are corrupted, and causes an
ftrace_bug() to occur.

This is easily reproducable by adding two probes, removing one, and then
adding it back again.

 # cd /sys/kernel/debug/tracing
 # echo schedule:traceoff > set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter
 # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter

Causes:
 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
 Modules linked in: [...]
 CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 Call Trace:
  dump_stack+0x68/0x9f
  __warn+0x111/0x130
  ? trace_irq_work_interrupt+0xa0/0xa0
  warn_slowpath_null+0x1d/0x20
  ftrace_get_addr_curr+0x143/0x220
  ? __fentry__+0x10/0x10
  ftrace_replace_code+0xe3/0x4f0
  ? ftrace_int3_handler+0x90/0x90
  ? printk+0x99/0xb5
  ? 0xffffffff81000000
  ftrace_modify_all_code+0x97/0x110
  arch_ftrace_update_code+0x10/0x20
  ftrace_run_update_code+0x1c/0x60
  ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
  register_ftrace_function_probe+0x4b6/0x590
  ? ftrace_startup+0x310/0x310
  ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
  ? update_stack_state+0x88/0x110
  ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
  ? preempt_count_sub+0x18/0xd0
  ? mutex_lock_nested+0x104/0x800
  ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
  ? __unwind_start+0x1c0/0x1c0
  ? _mutex_lock_nest_lock+0x800/0x800
  ftrace_trace_probe_callback.isra.3+0xc0/0x130
  ? func_set_flag+0xe0/0xe0
  ? __lock_acquire+0x642/0x1790
  ? __might_fault+0x1e/0x20
  ? trace_get_user+0x398/0x470
  ? strcmp+0x35/0x60
  ftrace_trace_onoff_callback+0x48/0x70
  ftrace_regex_write.isra.43.part.44+0x251/0x320
  ? match_records+0x420/0x420
  ftrace_filter_write+0x2b/0x30
  __vfs_write+0xd7/0x330
  ? do_loop_readv_writev+0x120/0x120
  ? locks_remove_posix+0x90/0x2f0
  ? do_lock_file_wait+0x160/0x160
  ? __lock_is_held+0x93/0x100
  ? rcu_read_lock_sched_held+0x5c/0xb0
  ? preempt_count_sub+0x18/0xd0
  ? __sb_start_write+0x10a/0x230
  ? vfs_write+0x222/0x240
  vfs_write+0xef/0x240
  SyS_write+0xab/0x130
  ? SyS_read+0x130/0x130
  ? trace_hardirqs_on_caller+0x182/0x280
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL_64_fastpath+0x18/0xad
 RIP: 0033:0x7fe61c157c30
 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
  ? trace_hardirqs_off_caller+0xc0/0x110
 ---[ end trace 99fa09b3d9869c2c ]---
 Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)

Cc: stable@vger.kernel.org
Fixes: 59df055f19 ("ftrace: trace different functions with a different tracer")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-14 17:54:22 -04:00
Johannes Berg be9370a7d8 bpf: remove struct bpf_prog_type_list
There's no need to have struct bpf_prog_type_list since
it just contains a list_head, the type, and the ops
pointer. Since the types are densely packed and not
actually dynamically registered, it's much easier and
smaller to have an array of type->ops pointer. Also
initialize this array statically to remove code needed
to initialize it.

In order to save duplicating the list, move it to a new
header file and include it in the places needing it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 14:38:43 -04:00
Ingo Molnar 84b1e36a6a Linux 4.11-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJY6mY1AAoJEHm+PkMAQRiGB14IAImsH28JPjxJVDasMIRPBxVc
 euPPlZgoBieu7sNt+kEsEqdkXuu0MLk6gln0IGxWLeoB2S+u3Tz5LMa2YArVqV9Z
 tWzOnI9auE73P2Pz/tUMOdyMs5tO0PolQxX3uljbULBozOHjHRh13fsXchX2yQvl
 mFeFCDqpPV0KhWRH/ciA8uIHdvYPhMpkKgRtmR8jXL0yzqLp6+2J+Bs8nHG4NNng
 HMVxZPC8jOE/TgWq6k/GmXgxh3H/AideFdHFbLKYnIFJW41ZGOI8a262zq3NmjPd
 lywpVU7O7RMhSITY5PnuR3LpNV8ftw1hz2y6t35unyFK1P02adOSj5GJ3hGdhaQ=
 =Xz5O
 -----END PGP SIGNATURE-----

Merge tag 'v4.11-rc6' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:42:47 +02:00
Steven Rostedt (VMware) 03ecd3f48e rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work
Tracing uses rcu_irq_enter() as a way to make sure that RCU is watching when
it needs to use rcu_read_lock() and friends. This is because tracing can
happen as RCU is about to enter user space, or about to go idle, and RCU
does not watch for RCU read side critical sections as it makes the
transition.

There is a small location within the RCU infrastructure that rcu_irq_enter()
itself will not work. If tracing were to occur in that section it will break
if it tries to use rcu_irq_enter().

Originally, this happens with the stack_tracer, because it will call
save_stack_trace when it encounters stack usage that is greater than any
stack usage it had encountered previously. There was a case where that
happened in the RCU section where rcu_irq_enter() did not work, and lockdep
complained loudly about it. To fix it, stack tracing added a call to be
disabled and RCU would disable stack tracing during the critical section
that rcu_irq_enter() was inoperable. This solution worked, but there are
other cases that use rcu_irq_enter() and it would be a good idea to let RCU
give a way to let others know that rcu_irq_enter() will not work. For
example, in trace events.

Another helpful aspect of this change is that it also moves the per cpu
variable called in the RCU critical section into a cache locale along with
other RCU per cpu variables used in that same location.

I'm keeping the stack_trace_disable() code, as that still could be used in
the future by places that really need to disable it. And since it's only a
static inline, it wont take up any kernel text if it is not used.

Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 15:22:03 -04:00
Steven Rostedt (VMware) 8aaf1ee70e tracing: Rename trace_active to disable_stack_tracer and inline its modification
In order to eliminate a function call, make "trace_active" into
"disable_stack_tracer" and convert stack_tracer_disable() and friends into
static inline functions.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 15:21:47 -04:00
Steven Rostedt (VMware) 5367278cb7 tracing: Add stack_tracer_disable/enable() functions
There are certain parts of the kernel that cannot let stack tracing
proceed (namely in RCU), because the stack tracer uses RCU, and parts of RCU
internals cannot handle having RCU read side locks taken.

Add stack_tracer_disable() and stack_tracer_enable() functions to let RCU
stop stack tracing on the current CPU when it is in those critical sections.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 14:34:10 -04:00
Steven Rostedt (VMware) 252babcd52 tracing: Replace the per_cpu() with __this_cpu*() in trace_stack.c
The updates to the trace_active per cpu variable can be updated with the
__this_cpu_*() functions as it only gets updated on the CPU that the variable
is on.

Thanks to Paul McKenney for suggesting __this_cpu_* instead of this_cpu_*.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 14:33:54 -04:00
Steven Rostedt (VMware) 0598e4f08e ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines
The function tracer needs to be more careful than other subsystems when it
comes to freeing data. Especially if that data is actually executable code.
When a single function is traced, a trampoline can be dynamically allocated
which is called to jump to the function trace callback. When the callback is
no longer needed, the dynamic allocated trampoline needs to be freed. This
is where the issues arise. The dynamically allocated trampoline must not be
used again. As function tracing can trace all subsystems, including
subsystems that are used to serialize aspects of freeing (namely RCU), it
must take extra care when doing the freeing.

Before synchronize_rcu_tasks() was around, there was no way for the function
tracer to know that nothing was using the dynamically allocated trampoline
when CONFIG_PREEMPT was enabled. That's because a task could be indefinitely
preempted while sitting on the trampoline. Now with synchronize_rcu_tasks(),
it will wait till all tasks have either voluntarily scheduled (not on the
trampoline) or goes into userspace (not on the trampoline). Then it is safe
to free the trampoline even with CONFIG_PREEMPT set.

Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-07 09:41:51 -04:00
Wei Yongjun 62277de758 ring-buffer: Fix return value check in test_ringbuffer()
In case of error, the function kthread_run() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().

Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com

Cc: stable@vger.kernel.org
Fixes: 6c43e554a ("ring-buffer: Add ring buffer startup selftest")
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-05 09:36:52 -04:00
Alban Crequy 696ced4fb1 tracing/kprobes: expose maxactive for kretprobe in kprobe_events
When a kretprobe is installed on a kernel function, there is a maximum
limit of how many calls in parallel it can catch (aka "maxactive"). A
kernel module could call register_kretprobe() and initialize maxactive
(see example in samples/kprobes/kretprobe_example.c).

But that is not exposed to userspace and it is currently not possible to
choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events

The default maxactive can be as low as 1 on single-core with a
non-preemptive kernel. This is too low and we need to increase it not
only for recursive functions, but for functions that sleep or resched.

This patch updates the format of the command that can be written to
kprobe_events so that maxactive can be optionally specified.

I need this for a bpf program attached to the kretprobe of
inet_csk_accept, which can sleep for a long time.

This patch includes a basic selftest:

> # ./ftracetest -v  test.d/kprobe/
> === Ftrace unit tests ===
> [1] Kprobe dynamic event - adding and removing	[PASS]
> [2] Kprobe dynamic event - busy event check	[PASS]
> [3] Kprobe dynamic event with arguments	[PASS]
> [4] Kprobes event arguments with types	[PASS]
> [5] Kprobe dynamic event with function tracer	[PASS]
> [6] Kretprobe dynamic event with arguments	[PASS]
> [7] Kretprobe dynamic event with maxactive	[PASS]
>
> # of passed:  7
> # of failed:  0
> # of unresolved:  0
> # of untested:  0
> # of unsupported:  0
> # of xfailed:  0
> # of undefined(test bug):  0

BugLink: https://github.com/iovisor/bcc/issues/1072
Link: http://lkml.kernel.org/r/1491215782-15490-1-git-send-email-alban@kinvolk.io

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-04 10:32:03 -04:00
Steven Rostedt (VMware) b80f0f6c9e ftrace: Have init/main.c call ftrace directly to free init memory
Relying on free_reserved_area() to call ftrace to free init memory proved to
not be sufficient. The issue is that on x86, when debug_pagealloc is
enabled, the init memory is not freed, but simply set as not present. Since
ftrace was uninformed of this, starting function tracing still tries to
update pages that are not present according to the page tables, causing
ftrace to bug, as well as killing the kernel itself.

Instead of relying on free_reserved_area(), have init/main.c call ftrace
directly just before it frees the init memory. Then it needs to use
__init_begin and __init_end to know where the init memory location is.
Looking at all archs (and testing what I can), it appears that this should
work for each of them.

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-03 14:04:00 -04:00
Al Viro bee3f412d6 Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc 2017-04-02 10:33:48 -04:00
Steven Rostedt (VMware) 5bd84629a7 ftrace: Create separate t_func_next() to simplify the function / hash logic
I noticed that if I use dd to read the set_ftrace_filter file that the first
hash command is repeated.

 # cd /sys/kernel/debug/tracing
 # echo schedule > set_ftrace_filter
 # echo do_IRQ >> set_ftrace_filter
 # echo schedule:traceoff >> set_ftrace_filter
 # echo do_IRQ:traceoff >> set_ftrace_filter

 # cat set_ftrace_filter
 schedule
 do_IRQ
 schedule:traceoff:unlimited
 do_IRQ:traceoff:unlimited

 # dd if=set_ftrace_filter bs=1
 schedule
 do_IRQ
 schedule:traceoff:unlimited
 schedule:traceoff:unlimited
 do_IRQ:traceoff:unlimited
 98+0 records in
 98+0 records out
 98 bytes copied, 0.00265011 s, 37.0 kB/s

This is due to the way t_start() calls t_next() as well as the seq_file
calls t_next() and the state is slightly different between the two. Namely,
t_start() will call t_next() with a local "pos" variable.

By separating out the function listing from t_next() into its own function,
we can have better control of outputting the functions and the hash of
triggers. This simplifies the code.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31 18:00:45 -04:00
Steven Rostedt (VMware) 43ff926a0c ftrace: Update func_pos in t_start() when all functions are enabled
If all functions are enabled, there's a comment displayed in the file to
denote that:

  # cd /sys/kernel/debug/tracing
  # cat set_ftrace_filter
 #### all functions enabled ####

If a function trigger is set, those are displayed as well:

  # echo schedule:traceoff >> /debug/tracing/set_ftrace_filter
  # cat set_ftrace_filter
 #### all functions enabled ####
 schedule:traceoff:unlimited

But if you read that file with dd, the output can change:

  # dd if=/debug/tracing/set_ftrace_filter bs=1
 #### all functions enabled ####
 32+0 records in
 32+0 records out
 32 bytes copied, 7.0237e-05 s, 456 kB/s

This is because the "pos" variable is updated for the comment, but func_pos
is not. "func_pos" is used by the triggers (or hashes) to know how many
functions were printed and it bases its index from the pos - func_pos.
func_pos should be 1 to count for the comment printed. But since it is not,
t_hash_start() thinks that one trigger was already printed.

The cat gets to t_hash_start() via t_next() and not t_start() which updates
both pos and func_pos.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31 18:00:37 -04:00
Steven Rostedt (VMware) 2d71d98900 ftrace: Return NULL at end of t_start() instead of calling t_hash_start()
The loop in t_start() of calling t_next() will call t_hash_start() if the
pos is beyond the functions and enters the hash items. There's no reason to
check if p is NULL and call t_hash_start(), as that would be redundant.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31 18:00:37 -04:00
Steven Rostedt (VMware) c20489dad1 ftrace: Assign iter->hash to filter or notrace hashes on seq read
Instead of testing if the hash to use is the filter_hash or the notrace_hash
at each iteration, do the test at open, and set the iter->hash to point to
the corresponding filter or notrace hash. Then use that directly instead of
testing which hash needs to be used each iteration.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31 18:00:36 -04:00
Steven Rostedt (VMware) c1bc5919f6 ftrace: Clean up __seq_open_private() return check
The return status check of __seq_open_private() is rather strange:

	iter = __seq_open_private();
	if (iter) {
		/* do stuff */
	}

	return iter ? 0 : -ENOMEM;

It makes much more sense to do the return of failure right away:

	iter = __seq_open_private();
	if (!iter)
		return -ENOMEM;

	/* do stuff */

	return 0;

This clean up will make updates to this code a bit nicer.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31 18:00:35 -04:00
Al Viro db68ce10c4 new helper: uaccess_kernel()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28 16:43:25 -04:00
Steven Rostedt (VMware) af0009fc16 tracing: Move trace_handle_return() out of line
Currently trace_handle_return() looks like this:

 static inline enum print_line_t trace_handle_return(struct trace_seq *s)
 {
        return trace_seq_has_overflowed(s) ?
                TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
 }

Where trace_seq_overflowed(s) is:

 static inline bool trace_seq_has_overflowed(struct trace_seq *s)
 {
	return s->full || seq_buf_has_overflowed(&s->seq);
 }

And seq_buf_has_overflowed(&s->seq) is:

 static inline bool
 seq_buf_has_overflowed(struct seq_buf *s)
 {
	return s->len > s->size;
 }

Making trace_handle_return() into:

 return (s->full || (s->seq->len > s->seq->size)) ?
           TRACE_TYPE_PARTIAL_LINE :
           TRACE_TYPE_HANDLED;

One would think this is not an issue to keep as an inline. But because this
is used in the TRACE_EVENT() macro, it is extended for every tracepoint in
the system. Taking a look at a single tracepoint x86_irq_vector (was the
first one I randomly chosen). As trace_handle_return is used in the
TRACE_EVENT() macro of trace_raw_output_##call() we disassemble
trace_raw_output_x86_irq_vector and do a diff:

- is the original
+ is the out-of-line code

I removed identical lines that were different just due to different
addresses.

--- /tmp/irq-vec-orig	2017-03-16 09:12:48.569384851 -0400
+++ /tmp/irq-vec-ool	2017-03-16 09:13:39.378153385 -0400
@@ -6,27 +6,23 @@
        53                      push   %rbx
        48 89 fb                mov    %rdi,%rbx
        4c 8b a7 c0 20 00 00    mov    0x20c0(%rdi),%r12
        e8 f7 72 13 00          callq  ffffffff81155c80 <trace_raw_output_prep>
        83 f8 01                cmp    $0x1,%eax
        74 05                   je     ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
        5b                      pop    %rbx
        41 5c                   pop    %r12
        5d                      pop    %rbp
        c3                      retq
        41 8b 54 24 08          mov    0x8(%r12),%edx
-       48 8d bb 98 10 00 00    lea    0x1098(%rbx),%rdi
+       48 81 c3 98 10 00 00    add    $0x1098,%rbx
-       48 c7 c6 7b 8a a0 81    mov    $0xffffffff81a08a7b,%rsi
+       48 c7 c6 ab 8a a0 81    mov    $0xffffffff81a08aab,%rsi
-       e8 c5 85 13 00          callq  ffffffff81156f70 <trace_seq_printf>

 === here's the start of the main difference ===

+       48 89 df                mov    %rbx,%rdi
+       e8 62 7e 13 00          callq  ffffffff81156810 <trace_seq_printf>
-       8b 93 b8 20 00 00       mov    0x20b8(%rbx),%edx
-       31 c0                   xor    %eax,%eax
-       85 d2                   test   %edx,%edx
-       75 11                   jne    ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
-       48 8b 83 a8 20 00 00    mov    0x20a8(%rbx),%rax
-       48 39 83 a0 20 00 00    cmp    %rax,0x20a0(%rbx)
-       0f 93 c0                setae  %al
+       48 89 df                mov    %rbx,%rdi
+       e8 4a c5 12 00          callq  ffffffff8114af00 <trace_handle_return>
        5b                      pop    %rbx
-       0f b6 c0                movzbl %al,%eax

 === end ===

        41 5c                   pop    %r12
        5d                      pop    %rbp
        c3                      retq

If you notice, the original has 22 bytes of text more than the out of line
version. As this is for every TRACE_EVENT() defined in the system, this can
become quite large.

   text	   data	    bss	    dec	    hex	filename
8690305	5450490	1298432	15439227	 eb957b	vmlinux-orig
8681725	5450490	1298432	15430647	 eb73f7	vmlinux-handle

This change has a total of 8580 bytes in savings.

 $ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
324

That's 324 tracepoints. But this does not include modules (which contain
many more tracepoints). For an allyesconfig build:

 $ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
1401

That's 1401 tracepoints giving us:

   text    data     bss     dec     hex filename
137920629       140221067       53264384        331406080       13c0db00 vmlinux-allyes-orig
137827709       140221067       53264384        331313160       13bf7008 vmlinux-allyes-handle

92920 bytes in savings!!!

Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24 20:51:50 -04:00
Steven Rostedt (VMware) 42c269c88d ftrace: Allow for function tracing to record init functions on boot up
Adding a hook into free_reserve_area() that informs ftrace that boot up init
text is being free, lets ftrace safely remove those init functions from its
records, which keeps ftrace from trying to modify text that no longer
exists.

Note, this still does not allow for tracing .init text of modules, as
modules require different work for freeing its init code.

Link: http://lkml.kernel.org/r/1488502497.7212.24.camel@linux.intel.com

Cc: linux-mm@kvack.org
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Requested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24 20:51:49 -04:00
Steven Rostedt (VMware) dbeafd0d61 ftrace: Have function tracing start in early boot up
Register the function tracer right after the tracing buffers are initialized
in early boot up. This will allow function tracing to begin early if it is
enabled via the kernel command line.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24 20:51:48 -04:00
Steven Rostedt (VMware) 9afecfbb95 tracing: Postpone tracer start-up tests till the system is more robust
As tracing can now be enabled very early in boot up, even before some
critical system services (like scheduling), do not run the tracer selftests
until after early_initcall() is performed. If a tracer is registered before
such time, it is saved off in a list and the test is run when the system is
able to handle more diverse functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24 20:51:46 -04:00
Steven Rostedt (VMware) e725c731e3 tracing: Split tracing initialization into two for early initialization
Create an early_trace_init() function that will initialize the buffers and
allow for ealier use of trace_printk(). This will also allow for future work
to have function tracing start earlier at boot up.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24 13:08:43 -04:00
Ingo Molnar 61f63e3837 perf/core improvements and fixes:
New features:
 
 - Add 'brstackinsn' field in 'perf script' to reuse the x86 instruction
   decoder used in the Intel PT code to study hot paths to samples (Andi Kleen)
 
 Kernel:
 
 - Default UPROBES_EVENTS to Y (Alexei Starovoitov)
 
 - Fix check for kretprobe offset within function entry (Naveen N. Rao)
 
 Infrastructure:
 
 - Introduce util func is_sdt_event() (Ravi Bangoria)
 
 - Make perf_event__synthesize_mmap_events() scale on older kernels where
   reading /proc/pid/maps is way slower than reading /proc/pid/task/pid/maps (Stephane Eranian)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYyrdSAAoJENZQFvNTUqpAe+4P/3c4ilBSOxLCCxGO7jDYo9oq
 /KqlvsCIg7+vo5eqrOUJAb4qXFnvpYxwjMMkL5rx7gdsBCRfRXIINGWUMrq5mNyk
 MgxuqYnp+yRuxLYml2wn+tdwLzcHWSN2EO9mqQ14N4I+HvgdLmVPQ44ACQXs6KfL
 dk/Ix8YtnFWl2sDZjvyr7ZBqwCPzzklZgHM6erxNUr/WJspzUiixAWqUmewodOUl
 P3PitlHXkITOK3AxSqOjJ4g1k933215nGih7hr0XdjEm4pIYaYksShQ6k9DASCrv
 dn2o1pF1LTu7KCtAo70aaSB7GXydwoA//o2gRbDkSwJJ25DIImZxJXQz9PAYDOo1
 vXSIhmlQ72c4/Yv/XzVOrIoMMMpmWKS3lGZxMVGR/Ie9Gw4kbotkaoEqEpNQsaDZ
 iIaU5v/EcvvToT7T7VHrGg0+vmHgYxm5gSlyASi2IrO2/wJAs0v2pYfuL6gYhXGp
 mhv/pHUv4l9OW+Ubm+zJEEcg337c2RQU5wT/bk4PihxY6nQyEH2Pn5VzdNbZLuMR
 eWnqTH/md+8/bkhmuZJp71wm60oPHoPvbDjvtfVmXAa52AzO+NWSc9Veke3C/QRm
 XgNkrXlzeKopEso3j4gw2iAolqw9t8FHFLGgbTkS+6UCKjAM7vNLiIV02LQqhM50
 qCnKEusMDCRgzeOXxYt+
 =Bg5M
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-4.12-20170316' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New features:

 - Add 'brstackinsn' field in 'perf script' to reuse the x86 instruction
   decoder used in the Intel PT code to study hot paths to samples (Andi Kleen)

Kernel changes:

 - Default UPROBES_EVENTS to Y (Alexei Starovoitov)

 - Fix check for kretprobe offset within function entry (Naveen N. Rao)

Infrastructure changes:

 - Introduce util func is_sdt_event() (Ravi Bangoria)

 - Make perf_event__synthesize_mmap_events() scale on older kernels where
   reading /proc/pid/maps is way slower than reading /proc/pid/task/pid/maps (Stephane Eranian)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16 17:29:23 +01:00
Arnaldo Carvalho de Melo 61f35d7506 uprobes: Default UPROBES_EVENTS to Y
As it is already turned on by most distros, so just flip the default to
Y.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: David Miller <davem@davemloft.net>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170316005817.GA6805@ast-mbp.thefacebook.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-16 12:42:02 -03:00
Ingo Molnar 2b95bd7d58 Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16 09:50:50 +01:00
Naveen N. Rao 1d585e7090 trace/kprobes: Fix check for kretprobe offset within function entry
perf specifies an offset from _text and since this offset is fed
directly into the arch-specific helper, kprobes tracer rejects
installation of kretprobes through perf. Fix this by looking up the
actual offset from a function for the specified sym+offset.

Refactor and reuse existing routines to limit code duplication -- we
repurpose kprobe_addr() for determining final kprobe address and we
split out the function entry offset determination into a separate
generic helper.

Before patch:

  naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -v do_open%return
  probe-definition(0): do_open%return
  symbol:do_open file:(null) line:0 offset:0 return:1 lazy:(null)
  0 arguments
  Looking at the vmlinux_path (8 entries long)
  Using /boot/vmlinux for symbols
  Open Debuginfo file: /boot/vmlinux
  Try to find probe point from debuginfo.
  Matched function: do_open [2d0c7ff]
  Probe point found: do_open+0
  Matched function: do_open [35d76dc]
  found inline addr: 0xc0000000004ba9c4
  Failed to find "do_open%return",
   because do_open is an inlined function and has no return point.
  An error occurred in debuginfo analysis (-22).
  Trying to use symbols.
  Opening /sys/kernel/debug/tracing//README write=0
  Opening /sys/kernel/debug/tracing//kprobe_events write=1
  Writing event: r:probe/do_open _text+4469776
  Failed to write event: Invalid argument
    Error: Failed to add events. Reason: Invalid argument (Code: -22)
  naveen@ubuntu:~/linux/tools/perf$ dmesg | tail
  <snip>
  [   33.568656] Given offset is not valid for return probe.

After patch:

  naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -v do_open%return
  probe-definition(0): do_open%return
  symbol:do_open file:(null) line:0 offset:0 return:1 lazy:(null)
  0 arguments
  Looking at the vmlinux_path (8 entries long)
  Using /boot/vmlinux for symbols
  Open Debuginfo file: /boot/vmlinux
  Try to find probe point from debuginfo.
  Matched function: do_open [2d0c7d6]
  Probe point found: do_open+0
  Matched function: do_open [35d76b3]
  found inline addr: 0xc0000000004ba9e4
  Failed to find "do_open%return",
   because do_open is an inlined function and has no return point.
  An error occurred in debuginfo analysis (-22).
  Trying to use symbols.
  Opening /sys/kernel/debug/tracing//README write=0
  Opening /sys/kernel/debug/tracing//kprobe_events write=1
  Writing event: r:probe/do_open _text+4469808
  Writing event: r:probe/do_open_1 _text+4956344
  Added new events:
    probe:do_open        (on do_open%return)
    probe:do_open_1      (on do_open%return)

  You can now use it in all perf tools, such as:

	  perf record -e probe:do_open_1 -aR sleep 1

  naveen@ubuntu:~/linux/tools/perf$ sudo cat /sys/kernel/debug/kprobes/list
  c000000000041370  k  kretprobe_trampoline+0x0    [OPTIMIZED]
  c0000000004ba0b8  r  do_open+0x8    [DISABLED]
  c000000000443430  r  do_open+0x0    [DISABLED]

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/d8cd1ef420ec22e3643ac332fdabcffc77319a42.1488961018.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-15 17:48:37 -03:00
Masahiro Yamada 505d3085d7 scripts/spelling.txt: add "overide" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  overide||override

While we are here, fix the doubled "address" in the touched line
Documentation/devicetree/bindings/regulator/ti-abb-regulator.txt.

Also, fix the comment block style in the touched hunks in
drivers/media/dvb-frontends/drx39xyj/drx_driver.h.

Link: http://lkml.kernel.org/r/1481573103-11329-21-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09 17:01:09 -08:00
Linus Torvalds c3abcabe81 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This includes a fix for a crash if certain special addresses are
  kprobed, plus does a rename of two Kconfig variables that were a minor
  misnomer"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS
  kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed
2017-03-07 14:38:16 -08:00
Linus Torvalds f26db9649a There was some breakage with the changes for jump labels in the 4.11 merge
window. Namely powerpc broke as jump labels uses the two LSB bits as flags
 in initialization. A check was added to make sure that all jump label
 entries were 4 bytes aligned, but powerpc didn't work that way for modules.
 Adding an alignment in the module linker script appeared to be the best
 solution.
 
 Jump labels also added an anonymous union to access those LSB bits as a
 normal long. But because this structure had static initialization, it broke
 older compilers that could not statically initialize anonymous unions
 without brackets.
 
 The command line parameter for setting function graph filter broke the
 "EMPTY_HASH" descriptor by modifying it instead of creating a new hash to
 hold the entries.
 
 The command line parameter ftrace_graph_max_depth was added to allow its
 setting at boot time. It uses existing code and only the command line hook
 was added. This is not really a fix, but as it uses existing code without
 affecting anything else, I added it to this release. It was ready before the
 merge window closed, but I wanted to let it sit in linux-next for a couple
 of days first.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJYvNrAFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 JGQIAMkayeZ0OCyYHRPR4EcCrdE3fATmt1huJWHrMPnT4/fLabL8XQqrOpnOBMq1
 GFZb1SMkBmvGtAHF4GbvCxnIUfDQko6BTQAd8EMea1WM8+Kb66/BLgJawjWIU9I0
 dNYre9ONgR2NOzkz6nfKRXnmy0lRcOweBb09YYGSzY11Md7d8T3T4TUrPNZdYrO9
 8ZMbF4qRd9KLMRHcsWqvhWhBISxWnmtUSlthfweukKgDMy8OKpb7pR0ckjtYwsWX
 RF41jqLqzSUqtd/nE2Sj/aT8XOP4pfrKEUuNM4SBj8q5jmNcZuqi8Q9wItu3LWR2
 jqM/9UKTzaCr9cchwuvUC0i+jWc=
 =kDql
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "There was some breakage with the changes for jump labels in the 4.11
  merge window:

   - powerpc broke as jump labels uses the two LSB bits as flags in
     initialization.

     A check was added to make sure that all jump label entries were 4
     bytes aligned, but powerpc didn't work that way for modules. Adding
     an alignment in the module linker script appeared to be the best
     solution.

   - Jump labels also added an anonymous union to access those LSB bits
     as a normal long. But because this structure had static
     initialization, it broke older compilers that could not statically
     initialize anonymous unions without brackets.

   - The command line parameter for setting function graph filter broke
     the "EMPTY_HASH" descriptor by modifying it instead of creating a
     new hash to hold the entries.

   - The command line parameter ftrace_graph_max_depth was added to
     allow its setting at boot time. It uses existing code and only the
     command line hook was added.

     This is not really a fix, but as it uses existing code without
     affecting anything else, I added it to this release. It was ready
     before the merge window closed, but I wanted to let it sit in
     linux-next for a couple of days first"

* tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/graph: Add ftrace_graph_max_depth kernel parameter
  tracing: Add #undef to fix compile error
  jump_label: Add comment about initialization order for anonymous unions
  jump_label: Fix anonymous union initialization
  module: set __jump_table alignment to 8
  ftrace/graph: Do not modify the EMPTY_HASH for the function_graph filter
  tracing: Fix code comment for ftrace_ops_get_func()
2017-03-07 09:37:28 -08:00
Ingo Molnar 84e5b54921 perf/core improvements and fixes:
New features:
 
 - Allow sorting by symbol_size in 'perf report' and 'perf top' (Charles Baylis)
 
   E.g.:
 
   # perf report -s symbol_size,symbol
 
   Samples: 9K of event 'cycles:k', Event count (approx.): 2870461623
   Overhead  Symbol size  Symbol
     14.55%          326  [k] flush_tlb_mm_range
      7.20%         1045  [k] filemap_map_pages
      5.82%          124  [k] vma_interval_tree_insert
      5.18%         2430  [k] unmap_page_range
      2.57%          571  [k] vma_interval_tree_remove
      1.94%          494  [k] page_add_file_rmap
      1.82%          740  [k] page_remove_rmap
      1.66%         1017  [k] release_pages
      1.57%         1636  [k] update_blocked_averages
      1.57%           76  [k] unlock_page
 
 - Add support for -p/--pid, -a/--all-cpus and -C/--cpu in 'perf ftrace' (Namhyung Kim)
 
 Change in behaviour:
 
 - Make system wide (-a) the default option if no target was specified and one
   of following conditions is met:
 
   - No workload specified (current behaviour)
 
   - A workload is specified but all requested events are system wide ones,
     like uncore ones. (Jiri Olsa)
 
 Fixes:
 
 - Add missing initialization to the instruction decoder used in the
   intel PT/BTS code, which was causing lots of failures in 'perf test',
   looking for a value when there was none (Adrian Hunter)
 
 Infrastructure:
 
 - Add arch code needed to adopt the kernel's refcount_t to aid in
   catching bugs when using atomic_t as a reference counter, basically
   cmpxchg related functions (Arnaldo Carvalho de Melo)
 
 - Convert the code using atomic_t as reference counts to refcount_t
   (Elena Rashetova)
 
 - Add feature test for sched_getcpu() to more easily check for its
   presence in the many libc implementations and accross different
   versions of such C libraries (Arnaldo Carvalho de Melo)
 
 - Issue a HW watchdog disable hint in 'perf stat' for when some of the
   requested events can't get counted because a PMU counter is taken by that
   watchdog (Borislav Petkov).
 
 - Add mapping for Intel's KnightsMill PMU events (Karol Wachowski)
 
 Documentation:
 
 - Clarify the term 'convergence' in:
 
    perf bench numa numa-mem -h --show_convergence (Jiri Olsa)
 
 Kernel code:
 
 - Ensure probe location is at function entry in kretprobes (Naveen N. Rao)
 
 - Allow return probes with offsets and absolute addresses (Naveen N. Rao)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYvbmTAAoJENZQFvNTUqpAN80QAJ2ETcTosR9fo06VrT2HqRT4
 +iGe55wSu261TOekIkXOEW+ww321eNPfy4rIZeLCEFcCd9p03n5JceVbFnOjBuAz
 Lk6jrKpaH+Ajp56nCLyWH4r3LYLJXdoIydNay4PZ08rl0GgagGqvevD8ZZCEO0sx
 vjD1TFH2uSOq3UTxKapO++FHwhy+XqZ5S5I+rMuLxg6Qi+rLubXDztzIlcCfQPGx
 g+zFkaJ/ms9TAtWK25xoj34QXsaqpBsF8qkCE1P8Zdjtnkp6zM2Rx3HvvbRDmgVx
 /h0b1iua5IVElgnai/84ttJG3Bi6ovRbf/PFy+IceM4Qfx0eQeWmA3CAtcGOh9Gv
 GTDCcJ7xWZBpM0g1wCk3ks2oApFTA6GkcnIt5alhTse5U3gNmImv3uvuN8d265KL
 2oGKps7MH1nWMgpL4G4BNuZg2oqmM/uX9ERiuNjtCqj6WoHy2QSDyEMJN5Od3lYj
 ar2PPGofHmiacsW3NNMT+LwQ/wL/d2dVfZTopMafeaxRDGTxdqkhLkB/sZT/wexQ
 ySVijQPO+x0eLSIK/BWdEmD8K6JiYGdpIRDWVW+D043I7iiXFvPgui1bFABJEqn5
 mZFa1qT4EQMuuSaLkkxtoOrdoF6YzJJA57sIx2IrouGDapJ2BDegiUOfE0PSp5l0
 oeRuFcJYfpITC/TzntE3
 =h2yo
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-4.11-20170306' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New features:

- Allow sorting by symbol_size in 'perf report' and 'perf top' (Charles Baylis)

  E.g.:

  # perf report -s symbol_size,symbol

  Samples: 9K of event 'cycles:k', Event count (approx.): 2870461623
  Overhead  Symbol size  Symbol
    14.55%          326  [k] flush_tlb_mm_range
     7.20%         1045  [k] filemap_map_pages
     5.82%          124  [k] vma_interval_tree_insert
     5.18%         2430  [k] unmap_page_range
     2.57%          571  [k] vma_interval_tree_remove
     1.94%          494  [k] page_add_file_rmap
     1.82%          740  [k] page_remove_rmap
     1.66%         1017  [k] release_pages
     1.57%         1636  [k] update_blocked_averages
     1.57%           76  [k] unlock_page

- Add support for -p/--pid, -a/--all-cpus and -C/--cpu in 'perf ftrace' (Namhyung Kim)

Change in behaviour:

- Make system wide (-a) the default option if no target was specified and one
  of following conditions is met:

  - No workload specified (current behaviour)

  - A workload is specified but all requested events are system wide ones,
    like uncore ones. (Jiri Olsa)

Fixes:

- Add missing initialization to the instruction decoder used in the
  intel PT/BTS code, which was causing lots of failures in 'perf test',
  looking for a value when there was none (Adrian Hunter)

Infrastructure changes:

- Add arch code needed to adopt the kernel's refcount_t to aid in
  catching bugs when using atomic_t as a reference counter, basically
  cmpxchg related functions (Arnaldo Carvalho de Melo)

- Convert the code using atomic_t as reference counts to refcount_t
  (Elena Rashetova)

- Add feature test for sched_getcpu() to more easily check for its
  presence in the many libc implementations and accross different
  versions of such C libraries (Arnaldo Carvalho de Melo)

- Issue a HW watchdog disable hint in 'perf stat' for when some of the
  requested events can't get counted because a PMU counter is taken by that
  watchdog (Borislav Petkov).

- Add mapping for Intel's KnightsMill PMU events (Karol Wachowski)

Documentation changes:

- Clarify the term 'convergence' in:

   perf bench numa numa-mem -h --show_convergence (Jiri Olsa)

Kernel code changes:

- Ensure probe location is at function entry in kretprobes (Naveen N. Rao)

- Allow return probes with offsets and absolute addresses (Naveen N. Rao)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-07 08:14:14 +01:00
Steven Rostedt (VMware) d0e02579c2 trace/kprobes: Add back warning about offset in return probes
Let's not remove the warning about offsets and return probes when the
offset is invalid.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170227115204.00f92846@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-03 19:07:19 -03:00
Naveen N. Rao 35b6f55aa9 trace/kprobes: Allow return probes with offsets and absolute addresses
Since the kernel includes many non-global functions with same names, we
will need to use offsets from other symbols (typically _text/_stext) or
absolute addresses to place return probes on specific functions. Also,
the core register_kretprobe() API never forbid use of offsets or
absolute addresses with kretprobes.

Allow its use with the trace infrastructure. To distinguish kernels that
support this, update ftrace README to explicitly call this out.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/183e7ce2921a08c9c755ee9a5da3134febc6695b.1487770934.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-03 19:07:18 -03:00
Todd Brandt 65a50c6562 ftrace/graph: Add ftrace_graph_max_depth kernel parameter
Early trace callgraphs can be extremely large on systems with
several seconds of boot time. The max_depth parameter limits how
deep the graph trace goes and reduces the output size. This
parameter is the same as the max_graph_depth file in tracefs.

Link: http://lkml.kernel.org/r/1488499935-23216-1-git-send-email-todd.e.brandt@linux.intel.com

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[ changed comments about debugfs to tracefs ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-03 09:45:01 -05:00
Steven Rostedt (VMware) 92ad18ec26 ftrace/graph: Do not modify the EMPTY_HASH for the function_graph filter
On boot up, if the kernel command line sets a graph funtion with the kernel
command line options "ftrace_graph_filter" or "ftrace_graph_notrace" then it
updates the corresponding function graph hash, ftrace_graph_hash or
ftrace_graph_notrace_hash respectively. Unfortunately, at boot up, these
variables are pointers to the "EMPTY_HASH" which is a constant used as a
placeholder when a hash has no entities. The problem was that the comand
line version to set the hashes updated the actual EMPTY_HASH instead of
creating a new hash for the function graph. This broke the EMPTY_HASH
because not only did it modify a constant (not sure how that was allowed to
happen, except maybe because it was done at early boot, const variables were
still mutable), but it made the filters have functions listed in them when
they were actually empty.

The kernel command line function needs to allocate a new hash for the
function graph filters and assign the necessary variables to that new hash
instead.

Link: http://lkml.kernel.org/r/1488420091.7212.17.camel@linux.intel.com

Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-03 09:44:17 -05:00
Ingo Molnar b2d0910310 sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h>
We don't actually need the full rculist.h header in sched.h anymore,
we will be able to include the smaller rcupdate.h header instead.

But first update code that relied on the implicit header inclusion.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:38 +01:00
Ingo Molnar 68db0cf106 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task_stack.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:36 +01:00
Ingo Molnar 299300258d sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:35 +01:00
Ingo Molnar 6e84f31522 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Ingo Molnar ae7e81c077 sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h>
We are going to move scheduler ABI details to <uapi/linux/sched/types.h>,
which will be used from a number of .c files.

Create empty placeholder header that maps to <linux/types.h>.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:27 +01:00
Ingo Molnar e601757102 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>
We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.

Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:27 +01:00
Anton Blanchard 6b0b755142 perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS
We have uses of CONFIG_UPROBE_EVENT and CONFIG_KPROBE_EVENT as
well as CONFIG_UPROBE_EVENTS and CONFIG_KPROBE_EVENTS.

Consistently use the plurals.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: davem@davemloft.net
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/20170216060050.20866-1-anton@ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 10:26:39 +01:00
Linus Torvalds 45554b2357 Commit 79c6f448c8 ("tracing: Fix hwlat kthread migration") fixed a
bug that was caused by a race condition in initializing the hwlat
 thread. When fixing this code, I realized that it should have been done
 differently. Instead of doing the rewrite and sending that to stable,
 I just sent the above commit to fix the bug that should be back ported.
 
 This commit is on top of the quick fix commit to rewrite the code the
 way it should have been written in the first place.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJYtDsNFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 +CQH/0BhSwjiCnlHAkHNFKn47O0yDtxBLj8ar4bUQRacDXeQyAGDP13hn3q3LvG9
 CRzDXaYrusA3fjGcgmtyU33am6LK84dPn5u2HSyEalDZNBel8l6oYLUZVWLgef02
 x43949nOeBy+KO02Y118zGyxFEPtYBnCVpguMa4vdVgnr04gECo2VH5FjnLMslKM
 W1j/WrbVaO8WObh7X01JZzozWwp3McW4x6H8PUWaHjnhN/Iv6+YGtNN/Sa4cq4V/
 CyCfxrZvN/Y/uMSGzVlhuXxeRc2PRsjjmAqN+8P4KZIGW5BstdiWbrVj+KaBLR9z
 6QERD3atiEIYI/QGEep7ZH795PI=
 =Bdwk
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull another tracing update from Steven Rostedt:
 "Commit 79c6f448c8 ("tracing: Fix hwlat kthread migration") fixed a
  bug that was caused by a race condition in initializing the hwlat
  thread. When fixing this code, I realized that it should have been
  done differently. Instead of doing the rewrite and sending that to
  stable, I just sent the above commit to fix the bug that should be
  back ported.

  This commit is on top of the quick fix commit to rewrite the code the
  way it should have been written in the first place"

* tag 'trace-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Clean up the hwlat binding code
2017-02-27 13:36:19 -08:00
Linus Torvalds 79b17ea740 This release has no new tracing features, just clean ups, minor fixes
and small optimizations.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJYtDiAFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 KygH/3sxuM9MCeJ29JsjmV49fHcNqryNZdvSadmnysPm+dFPiI6IgIIbh5R8H89b
 2V2gfQSmOTKHu3/wvJr/MprkGP275sWlZPORYFLDl/+NE/3q7g0NKOMWunLcv6dH
 QQRJIFjSMeGawA3KYBEcwBYMlgNd2VgtTxqLqSBhWth5omV6UevJNHhe3xzZ4nEE
 YbRX2mxwOuRHOyFp0Hem+Bqro4z1VXJ6YDxOvae2PP8krrIhIHYw9EI22GK68a2g
 EyKqKPPaEzfU8IjHIQCqIZta5RufnCrDbfHU0CComPANBRGO7g+ZhLO11a/Z316N
 lyV7JqtF680iem7NKcQlwEwhlLE=
 =HJnl
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This release has no new tracing features, just clean ups, minor fixes
  and small optimizations"

* tag 'trace-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (25 commits)
  tracing: Remove outdated ring buffer comment
  tracing/probes: Fix a warning message to show correct maximum length
  tracing: Fix return value check in trace_benchmark_reg()
  tracing: Use modern function declaration
  jump_label: Reduce the size of struct static_key
  tracing/probe: Show subsystem name in messages
  tracing/hwlat: Update old comment about migration
  timers: Make flags output in the timer_start tracepoint useful
  tracing: Have traceprobe_probes_write() not access userspace unnecessarily
  tracing: Have COMM event filter key be treated as a string
  ftrace: Have set_graph_function handle multiple functions in one write
  ftrace: Do not hold references of ftrace_graph_{notrace_}hash out of graph_lock
  tracing: Reset parser->buffer to allow multiple "puts"
  ftrace: Have set_graph_functions handle write with RDWR
  ftrace: Reset fgd->hash in ftrace_graph_write()
  ftrace: Replace (void *)1 with a meaningful macro name FTRACE_GRAPH_EMPTY
  ftrace: Create a slight optimization on searching the ftrace_hash
  tracing: Add ftrace_hash_key() helper function
  ftrace: Convert graph filter to use hash tables
  ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip
  ...
2017-02-27 13:26:17 -08:00
Chunyu Hu 3a150df945 tracing: Fix code comment for ftrace_ops_get_func()
There is no function 'ftrace_ops_recurs_func' existing in the current code,
it was renamed to ftrace_ops_assist_func() in commit c68c0fa293
("ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too").
Update the comment to the correct function name.

Link: http://lkml.kernel.org/r/1487723366-14463-1-git-send-email-chuhu@redhat.com

Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-27 11:11:26 -05:00
Linus Torvalds f1ef09fde1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "There is a lot here. A lot of these changes result in subtle user
  visible differences in kernel behavior. I don't expect anything will
  care but I will revert/fix things immediately if any regressions show
  up.

  From Seth Forshee there is a continuation of the work to make the vfs
  ready for unpriviled mounts. We had thought the previous changes
  prevented the creation of files outside of s_user_ns of a filesystem,
  but it turns we missed the O_CREAT path. Ooops.

  Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
  standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
  children that are forked after the prctl are considered and not
  children forked before the prctl. The only known user of this prctl
  systemd forks all children after the prctl. So no userspace
  regressions will occur. Holding earlier forked children to the same
  rules as later forked children creates a semantic that is sane enough
  to allow checkpoing of processes that use this feature.

  There is a long delayed change by Nikolay Borisov to limit inotify
  instances inside a user namespace.

  Michael Kerrisk extends the API for files used to maniuplate
  namespaces with two new trivial ioctls to allow discovery of the
  hierachy and properties of namespaces.

  Konstantin Khlebnikov with the help of Al Viro adds code that when a
  network namespace exits purges it's sysctl entries from the dcache. As
  in some circumstances this could use a lot of memory.

  Vivek Goyal fixed a bug with stacked filesystems where the permissions
  on the wrong inode were being checked.

  I continue previous work on ptracing across exec. Allowing a file to
  be setuid across exec while being ptraced if the tracer has enough
  credentials in the user namespace, and if the process has CAP_SETUID
  in it's own namespace. Proc files for setuid or otherwise undumpable
  executables are now owned by the root in the user namespace of their
  mm. Allowing debugging of setuid applications in containers to work
  better.

  A bug I introduced with permission checking and automount is now
  fixed. The big change is to mark the mounts that the kernel initiates
  as a result of an automount. This allows the permission checks in sget
  to be safely suppressed for this kind of mount. As the permission
  check happened when the original filesystem was mounted.

  Finally a special case in the mount namespace is removed preventing
  unbounded chains in the mount hash table, and making the semantics
  simpler which benefits CRIU.

  The vfs fix along with related work in ima and evm I believe makes us
  ready to finish developing and merge fully unprivileged mounts of the
  fuse filesystem. The cleanups of the mount namespace makes discussing
  how to fix the worst case complexity of umount. The stacked filesystem
  fixes pave the way for adding multiple mappings for the filesystem
  uids so that efficient and safer containers can be implemented"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc/sysctl: Don't grab i_lock under sysctl_lock.
  vfs: Use upper filesystem inode in bprm_fill_uid()
  proc/sysctl: prune stale dentries during unregistering
  mnt: Tuck mounts under others instead of creating shadow/side mounts.
  prctl: propagate has_child_subreaper flag to every descendant
  introduce the walk_process_tree() helper
  nsfs: Add an ioctl() to return owner UID of a userns
  fs: Better permission checking for submounts
  exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
  vfs: open() with O_CREAT should not create inodes with unknown ids
  nsfs: Add an ioctl() to return the namespace type
  proc: Better ownership of files for non-dumpable tasks in user namespaces
  exec: Remove LSM_UNSAFE_PTRACE_CAP
  exec: Test the ptracer's saved cred to see if the tracee can gain caps
  exec: Don't reset euid and egid when the tracee has CAP_SETUID
  inotify: Convert to using per-namespace limits
2017-02-23 20:33:51 -08:00
Ross Zwisler d3213e8fd4 tracing: add __print_flags_u64()
Patch series "DAX tracepoints, mm argument simplification", v4.

This contains both my DAX tracepoint code and Dave Jiang's MM argument
simplifications.  Dave's code was written with my tracepoint code as a
baseline, so it seemed simplest to keep them together in a single series.

This patch (of 7):

Add __print_flags_u64() and the helper trace_print_flags_seq_u64() in the
same spirit as __print_symbolic_u64() and trace_print_symbols_seq_u64().
These functions allow us to print symbols associated with flags that are
64 bits wide even on 32 bit machines.

These will be used by the DAX code so that we can print the flags set in a
pfn_t such as PFN_SG_CHAIN, PFN_SG_LAST, PFN_DEV and PFN_MAP.

Without this new function I was getting errors like the following when
compiling for i386:

  include/linux/pfn_t.h:13:22: warning: large integer implicitly truncated to unsigned type [-Woverflow]
   #define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
    ^

Link: http://lkml.kernel.org/r/1484085142-2297-2-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:26 -08:00
Linus Torvalds 3051bf36c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
      Varadhan.

   2) Simplify classifier state on sk_buff in order to shrink it a bit.
      From Willem de Bruijn.

   3) Introduce SIPHASH and it's usage for secure sequence numbers and
      syncookies. From Jason A. Donenfeld.

   4) Reduce CPU usage for ICMP replies we are going to limit or
      suppress, from Jesper Dangaard Brouer.

   5) Introduce Shared Memory Communications socket layer, from Ursula
      Braun.

   6) Add RACK loss detection and allow it to actually trigger fast
      recovery instead of just assisting after other algorithms have
      triggered it. From Yuchung Cheng.

   7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

   8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

   9) Export MPLS packet stats via netlink, from Robert Shearman.

  10) Significantly improve inet port bind conflict handling, especially
      when an application is restarted and changes it's setting of
      reuseport. From Josef Bacik.

  11) Implement TX batching in vhost_net, from Jason Wang.

  12) Extend the dummy device so that VF (virtual function) features,
      such as configuration, can be more easily tested. From Phil
      Sutter.

  13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
      Dumazet.

  14) Add new bpf MAP, implementing a longest prefix match trie. From
      Daniel Mack.

  15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

  16) Add new aquantia driver, from David VomLehn.

  17) Add bpf tracepoints, from Daniel Borkmann.

  18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
      Florian Fainelli.

  19) Remove custom busy polling in many drivers, it is done in the core
      networking since 4.5 times. From Eric Dumazet.

  20) Support XDP adjust_head in virtio_net, from John Fastabend.

  21) Fix several major holes in neighbour entry confirmation, from
      Julian Anastasov.

  22) Add XDP support to bnxt_en driver, from Michael Chan.

  23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

  24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

  25) Support GRO in IPSEC protocols, from Steffen Klassert"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
  Revert "ath10k: Search SMBIOS for OEM board file extension"
  net: socket: fix recvmmsg not returning error from sock_error
  bnxt_en: use eth_hw_addr_random()
  bpf: fix unlocking of jited image when module ronx not set
  arch: add ARCH_HAS_SET_MEMORY config
  net: napi_watchdog() can use napi_schedule_irqoff()
  tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
  net/hsr: use eth_hw_addr_random()
  net: mvpp2: enable building on 64-bit platforms
  net: mvpp2: switch to build_skb() in the RX path
  net: mvpp2: simplify MVPP2_PRS_RI_* definitions
  net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
  net: mvpp2: remove unused register definitions
  net: mvpp2: simplify mvpp2_bm_bufs_add()
  net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
  net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
  net: mvpp2: release reference to txq_cpu[] entry after unmapping
  net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
  net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
  net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
  ...
2017-02-22 10:15:09 -08:00
Jens Axboe 818551e2b2 Merge branch 'for-4.11/next' into for-4.11/linus-merge
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17 14:08:19 -07:00
Daniel Borkmann c78f8bdfa1 bpf: mark all registered map/prog types as __ro_after_init
All map types and prog types are registered to the BPF core through
bpf_register_map_type() and bpf_register_prog_type() during init and
remain unchanged thereafter. As by design we don't (and never will)
have any pluggable code that can register to that at any later point
in time, lets mark all the existing bpf_{map,prog}_type_list objects
in the tree as __ro_after_init, so they can be moved to read-only
section from then onwards.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 13:40:04 -05:00
Joel Fernandes 67d04bb2bc tracing: Remove outdated ring buffer comment
The comment about ring buffer's organization is outdated and the code sits
elsewhere, remove the comment.
Link: http://lkml.kernel.org/r/20170217041058.23904-1-joelaf@google.com

Cc: Ingo Molnar <mingo@redhat.com>

Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-17 09:56:51 -05:00
Masami Hiramatsu bef5da6059 tracing/probes: Fix a warning message to show correct maximum length
Since tracing/*probe_events will accept a probe definition
up to 4096 - 2 ('\n' and '\0') bytes, it must show 4094 instead
of 4096 in warning message.

Note that there is one possible case of exceed 4094. If user
prepare 4096 bytes null-terminated string and syscall write
it with the count == 4095, then it can be accepted. However,
if user puts a '\n' after that, it must rejected.
So IMHO, the warning message should indicate shorter one,
since it is safer.

Link: http://lkml.kernel.org/r/148673290462.2579.7966778294009665632.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-15 10:32:48 -05:00
Wei Yongjun 8f0994bb8c tracing: Fix return value check in trace_benchmark_reg()
In case of error, the function kthread_run() returns ERR_PTR() and never
returns NULL. The NULL test in the return value check should be replaced
with IS_ERR().

Link: http://lkml.kernel.org/r/20170112135502.28556-1-weiyj.lk@gmail.com

Cc: stable@vger.kernel.org
Fixes: 81dc9f0e ("tracing: Add tracepoint benchmark tracepoint")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-15 09:02:27 -05:00
Arnd Bergmann eb583cd484 tracing: Use modern function declaration
We get a lot of harmless warnings about this header file at W=1 level
because of an unusual function declaration:

kernel/trace/trace.h:766:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration]

This moves the inline statement where it normally belongs, avoiding the
warning.

Link: http://lkml.kernel.org/r/20170123122521.3389010-1-arnd@arndb.de

Fixes: 4046bf023b ("ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-15 09:02:27 -05:00
Masami Hiramatsu 7257634135 tracing/probe: Show subsystem name in messages
Show "trace_probe:", "trace_kprobe:" and "trace_uprobe:"
headers for each warning/error/info message. This will
help people to notice that kprobe/uprobe events caused
those messages.

Link: http://lkml.kernel.org/r/148646647813.24658.16705315294927615333.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-15 09:02:25 -05:00
Luiz Capitulino 8e0f11429a tracing/hwlat: Update old comment about migration
The ftrace hwlat does support a cpumask.

Link: http://lkml.kernel.org/r/20170213122517.6e211955@redhat.com

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-15 09:02:25 -05:00
Steven Rostedt (VMware) 1f9b3546cf tracing: Have traceprobe_probes_write() not access userspace unnecessarily
The code in traceprobe_probes_write() reads up to 4096 bytes from userpace
for each line. If userspace passes in several lines to execute, the code
will do a large read for each line, even though, it is highly likely that
the first read from userspace received all of the lines at once.

I changed the logic to do a single read from userspace, and to only read
from userspace again if not all of the read from userspace made it in.

I tested this by adding printk()s and writing files that would test -1, ==,
and +1 the buffer size, to make sure that there's no overflows and that if a
single line is written with +1 the buffer size, that it fails properly.

Link: http://lkml.kernel.org/r/20170209180458.5c829ab2@gandalf.local.home

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-15 09:00:55 -05:00
Steven Rostedt (VMware) 4c7384131c tracing: Have COMM event filter key be treated as a string
The GLOB operation "~" should be able to work with the COMM filter key in
order to trace programs with a glob. For example

  echo 'COMM ~ "systemd*"' > events/syscalls/filter

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-10 14:19:45 -05:00
David S. Miller 3efa70d78f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 16:29:30 -05:00
Daniel Borkmann 3898fac1f4 trace: rename trace_print_hex_seq arg and add kdoc
Steven suggested to improve trace_print_hex_seq() a bit after commit
2acae0d5b0 ("trace: add variant without spacing in trace_print_hex_seq")
in two ways: i) by adding a kdoc comment for the helper function
itself and ii) by renaming 'spacing' argument into 'concatenate'
to better denote that we don't add spaces between each hex bytes.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 15:50:18 -05:00
Steven Rostedt (VMware) e704eff3ff ftrace: Have set_graph_function handle multiple functions in one write
Currently, only one function can be written to set_graph_function and
set_graph_notrace. The last function in the list will have saved, even
though other functions will be added then removed.

Change the behavior to be the same as set_ftrace_function as to allow
multiple functions to be written. If any one fails, none of them will be
added. The addition of the functions are done at the end when the file is
closed.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-03 10:59:52 -05:00
Steven Rostedt (VMware) 649b988b12 ftrace: Do not hold references of ftrace_graph_{notrace_}hash out of graph_lock
The hashs ftrace_graph_hash and ftrace_graph_notrace_hash are modified
within the graph_lock being held. Holding a pointer to them and passing them
along can lead to a use of a stale pointer (fgd->hash). Move assigning the
pointer and its use to within the holding of the lock. Note, it's an
rcu_sched protected data, and other instances of referencing them are done
with preemption disabled. But the file manipuation code must be protected by
the lock.

The fgd->hash pointer is set to NULL when the lock is being released.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-03 10:59:42 -05:00
Steven Rostedt (VMware) 0e684b6578 tracing: Reset parser->buffer to allow multiple "puts"
trace_parser_put() simply frees the allocated parser buffer. But it does not
reset the pointer that was freed. This means that if trace_parser_put() is
called on the same parser more than once, it will corrupt the allocation
system. Setting parser->buffer to NULL after free allows it to be called
more than once without any ill effect.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-03 10:59:31 -05:00
Steven Rostedt (VMware) ae98d27afc ftrace: Have set_graph_functions handle write with RDWR
Since reading the set_graph_functions uses seq functions, which sets the
file->private_data pointer to a seq_file descriptor. On writes the
ftrace_graph_data descriptor is set to file->private_data. But if the file
is opened for RDWR, the ftrace_graph_write() will incorrectly use the
file->private_data descriptor instead of
((struct seq_file *)file->private_data)->private pointer, and this can crash
the kernel.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-03 10:59:23 -05:00
Steven Rostedt (VMware) d4ad9a1cca ftrace: Reset fgd->hash in ftrace_graph_write()
fgd->hash is saved and then freed, but is never reset to either
ftrace_graph_hash nor ftrace_graph_notrace_hash. But if multiple writes are
performed, then the freed hash could be accessed again.

 # cd /sys/kernel/debug/tracing
 # head -1000 available_filter_functions > /tmp/funcs
 # cat /tmp/funcs > set_graph_function

Causes:

 general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
 Modules linked in:  [...]
 CPU: 2 PID: 1337 Comm: cat Not tainted 4.10.0-rc2-test-00010-g6b052e9 #32
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff880113a12200 task.stack: ffffc90001940000
 RIP: 0010:free_ftrace_hash+0x7c/0x160
 RSP: 0018:ffffc90001943db0 EFLAGS: 00010246
 RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 6b6b6b6b6b6b6b6b
 RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8800ce1e1d40
 RBP: ffff8800ce1e1d50 R08: 0000000000000000 R09: 0000000000006400
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: ffff8800ce1e1d40 R14: 0000000000004000 R15: 0000000000000001
 FS:  00007f9408a07740(0000) GS:ffff88011e500000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000aee1f0 CR3: 0000000116bb4000 CR4: 00000000001406e0
 Call Trace:
  ? ftrace_graph_write+0x150/0x190
  ? __vfs_write+0x1f6/0x210
  ? __audit_syscall_entry+0x17f/0x200
  ? rw_verify_area+0xdb/0x210
  ? _cond_resched+0x2b/0x50
  ? __sb_start_write+0xb4/0x130
  ? vfs_write+0x1c8/0x330
  ? SyS_write+0x62/0xf0
  ? do_syscall_64+0xa3/0x1b0
  ? entry_SYSCALL64_slow_path+0x25/0x25
 Code: 01 48 85 db 0f 84 92 00 00 00 b8 01 00 00 00 d3 e0 85 c0 7e 3f 83 e8 01 48 8d 6f 10 45 31 e4 4c 8d 34 c5 08 00 00 00 49 8b 45 08 <4a> 8b 34 20 48 85 f6 74 13 48 8b 1e 48 89 ef e8 20 fa ff ff 48
 RIP: free_ftrace_hash+0x7c/0x160 RSP: ffffc90001943db0
 ---[ end trace 999b48216bf4b393 ]---

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-03 10:59:06 -05:00
Steven Rostedt (VMware) 555fc7813e ftrace: Replace (void *)1 with a meaningful macro name FTRACE_GRAPH_EMPTY
When the set_graph_function or set_graph_notrace contains no records, a
banner is displayed of either "#### all functions enabled ####" or
"#### all functions disabled ####" respectively. To tell the seq operations
to do this, (void *)1 is passed as a return value. Instead of using a
hardcoded meaningless variable, define it as a macro.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-03 10:58:48 -05:00
Steven Rostedt (VMware) 2b2c279c81 ftrace: Create a slight optimization on searching the ftrace_hash
This is a micro-optimization, but as it has to deal with a fast path of the
function tracer, these optimizations can be noticed.

The ftrace_lookup_ip() returns true if the given ip is found in the hash. If
it's not found or the hash is NULL, it returns false. But there's some cases
that a NULL hash is a true, and the ftrace_hash_empty() is tested before
calling ftrace_lookup_ip() in those cases. But as ftrace_lookup_ip() tests
that first, that adds a few extra unneeded instructions in those cases.

A new static "always_inlined" function is created that does not perform the
hash empty test. This most only be used by callers that do the check first
anyway, as an empty or NULL hash could cause a crash if a lookup is
performed on it.

Also add kernel doc for the ftrace_lookup_ip() main function.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-03 10:58:22 -05:00
Steven Rostedt (VMware) 2b0cce0e19 tracing: Add ftrace_hash_key() helper function
Replace the couple of use cases that has small logic to produce the ftrace
function key id with a helper function. No need for duplicate code.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-03 10:58:05 -05:00
David S. Miller e2160156bf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All merge conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 16:54:00 -05:00
Omar Sandoval 6ac93117ab blktrace: use existing disk debugfs directory
We may already have a directory to put the blktrace stuff in if

1. The disk uses blk-mq
2. CONFIG_BLK_DEBUG_FS is enabled
3. We are tracing the whole disk and not a partition

Instead of hardcoding this very specific case, let's use the new
debugfs_lookup(). If the directory exists, we use it, otherwise we
create one and clean it up later.

Fixes: 07e4fead45 ("blk-mq: create debugfs directory tree")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 10:20:16 -07:00
Omar Sandoval 18fbda91c6 block: use same block debugfs directory for blk-mq and blktrace
When I added the blk-mq debugging information to debugfs, I didn't
notice that blktrace also creates a "block" directory in debugfs. Make
them use the same dentry, now created in the core block code. Based on a
patch from Jens.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 10:20:16 -07:00
Omar Sandoval a428d314eb blktrace: make do_blk_trace_setup() static
This isn't used outside of blktrace.c anymore.

Fixes: 62c2a7d969 ("block: push BKL into blktrace ioctls")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 10:20:16 -07:00
Arnd Bergmann 26a346f23c tracing/kprobes: Fix __init annotation
clang complains about "__init" being attached to a struct name:

kernel/trace/trace_kprobe.c:1375:15: error: '__section__' attribute only applies to functions and global variables

The intention must have been to mark the function as __init instead of
the type, so move the attribute there.

Link: http://lkml.kernel.org/r/20170201165826.2625888-1-arnd@arndb.de

Fixes: f18f97ac43 ("tracing/kprobes: Add a helper method to return number of probe hits")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-02 10:48:35 -05:00
Eric W. Biederman 93faccbbfa fs: Better permission checking for submounts
To support unprivileged users mounting filesystems two permission
checks have to be performed: a test to see if the user allowed to
create a mount in the mount namespace, and a test to see if
the user is allowed to access the specified filesystem.

The automount case is special in that mounting the original filesystem
grants permission to mount the sub-filesystems, to any user who
happens to stumble across the their mountpoint and satisfies the
ordinary filesystem permission checks.

Attempting to handle the automount case by using override_creds
almost works.  It preserves the idea that permission to mount
the original filesystem is permission to mount the sub-filesystem.
Unfortunately using override_creds messes up the filesystems
ordinary permission checks.

Solve this by being explicit that a mount is a submount by introducing
vfs_submount, and using it where appropriate.

vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let
sget and friends know that a mount is a submount so they can take appropriate
action.

sget and sget_userns are modified to not perform any permission checks
on submounts.

follow_automount is modified to stop using override_creds as that
has proven problemantic.

do_mount is modified to always remove the new MS_SUBMOUNT flag so
that we know userspace will never by able to specify it.

autofs4 is modified to stop using current_real_cred that was put in
there to handle the previous version of submount permission checking.

cifs is modified to pass the mountpoint all of the way down to vfs_submount.

debugfs is modified to pass the mountpoint all of the way down to
trace_automount by adding a new parameter.  To make this change easier
a new typedef debugfs_automount_t is introduced to capture the type of
the debugfs automount function.

Cc: stable@vger.kernel.org
Fixes: 069d5ac9ae ("autofs:  Fix automounts by using current_real_cred()->uid")
Fixes: aeaa4a79ff ("fs: Call d_automount with the filesystems creds")
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-02-02 04:36:12 +13:00
Steven Rostedt (VMware) f447c196fe tracing: Clean up the hwlat binding code
Instead of initializing the affinity of the hwlat kthread in the thread
itself, simply set up the initial affinity at thread creation. This
simplifies the code.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-31 16:48:23 -05:00
Christoph Hellwig 57292b58dd block: introduce blk_rq_is_passthrough
This can be used to check for fs vs non-fs requests and basically
removes all knowledge of BLOCK_PC specific from the block layer,
as well as preparing for removing the cmd_type field in struct request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 14:00:34 -07:00
Steven Rostedt (VMware) 79c6f448c8 tracing: Fix hwlat kthread migration
The hwlat tracer creates a kernel thread at start of the tracer. It is
pinned to a single CPU and will move to the next CPU after each period of
running. If the user modifies the migration thread's affinity, it will not
change after that happens.

The original code created the thread at the first instance it was called,
but later was changed to destroy the thread after the tracer was finished,
and would not be created until the next instance of the tracer was
established. The code that initialized the affinity was only called on the
initial instantiation of the tracer. After that, it was not initialized, and
the previous affinity did not match the current newly created one, making
it appear that the user modified the thread's affinity when it did not, and
the thread failed to migrate again.

Cc: stable@vger.kernel.org
Fixes: 0330f7aa8e ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-31 09:13:49 -05:00
Christoph Hellwig 48b77ad608 block: cleanup tracing
A couple tweaks to the tracing code:

 - trace the request size for all requests
 - trace request sector and nr_sectors only for fs requests, enforced by
   helpers
 - drop SCSI CDB tracing - we have SCSI tracing for this and are going
   to me the CDB out of the generic struct request soon.

With this the tracing code stops to know about BLOCK_PC requests entirely,
it's just FS vs passthrough requests now, where the latter includes any
driver-private requests.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27 15:08:35 -07:00
Daniel Borkmann 2acae0d5b0 trace: add variant without spacing in trace_print_hex_seq
For upcoming tracepoint support for BPF, we want to dump the program's
tag. Format should be similar to __print_hex(), but without spacing.
Add a __print_hex_str() variant for exactly that purpose that reuses
trace_print_hex_seq().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 13:17:47 -05:00
Namhyung Kim b9b0c831be ftrace: Convert graph filter to use hash tables
Use ftrace_hash instead of a static array of a fixed size.  This is
useful when a graph filter pattern matches to a large number of
functions.  Now hash lookup is done with preemption disabled to protect
from the hash being changed/freed.

Link: http://lkml.kernel.org/r/20170120024447.26097-3-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-20 14:50:58 -05:00
Namhyung Kim 4046bf023b ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip
It will be used when checking graph filter hashes later.

Link: http://lkml.kernel.org/r/20170120024447.26097-2-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
[ Moved ftrace_hash dec and functions outside of FUNCTION_GRAPH define ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-20 14:50:21 -05:00
Gianluca Borello a5e8c07059 bpf: add bpf_probe_read_str helper
Provide a simple helper with the same semantics of strncpy_from_unsafe():

int bpf_probe_read_str(void *dst, int size, const void *unsafe_addr)

This gives more flexibility to a bpf program. A typical use case is
intercepting a file name during sys_open(). The current approach is:

SEC("kprobe/sys_open")
void bpf_sys_open(struct pt_regs *ctx)
{
	char buf[PATHLEN]; // PATHLEN is defined to 256
	bpf_probe_read(buf, sizeof(buf), ctx->di);

	/* consume buf */
}

This is suboptimal because the size of the string needs to be estimated
at compile time, causing more memory to be copied than often necessary,
and can become more problematic if further processing on buf is done,
for example by pushing it to userspace via bpf_perf_event_output(),
since the real length of the string is unknown and the entire buffer
must be copied (and defining an unrolled strnlen() inside the bpf
program is a very inefficient and unfeasible approach).

With the new helper, the code can easily operate on the actual string
length rather than the buffer size:

SEC("kprobe/sys_open")
void bpf_sys_open(struct pt_regs *ctx)
{
	char buf[PATHLEN]; // PATHLEN is defined to 256
	int res = bpf_probe_read_str(buf, sizeof(buf), ctx->di);

	/* consume buf, for example push it to userspace via
	 * bpf_perf_event_output(), but this time we can use
	 * res (the string length) as event size, after checking
	 * its boundaries.
	 */
}

Another useful use case is when parsing individual process arguments or
individual environment variables navigating current->mm->arg_start and
current->mm->env_start: using this helper and the return value, one can
quickly iterate at the right offset of the memory area.

The code changes simply leverage the already existent
strncpy_from_unsafe() kernel function, which is safe to be called from a
bpf program as it is used in bpf_trace_printk().

Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20 12:08:43 -05:00
Namhyung Kim 3e278c0dc1 ftrace: Factor out __ftrace_hash_move()
The __ftrace_hash_move() is to allocates properly-sized hash and move
entries in the src ftrace_hash.  It will be used to set function graph
filters which has nothing to do with the dyn_ftrace records.

Link: http://lkml.kernel.org/r/20170120024447.26097-1-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-20 11:40:07 -05:00
Steven Rostedt (VMware) 068f530b3f tracing: Add the constant count for branch tracer
The unlikely/likely branch profiler now gets called even if the if statement
is a constant (always goes in one direction without a compare). Add a value
to denote this in the likely/unlikely tracer as well.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-19 08:57:41 -05:00
Steven Rostedt (VMware) 134e6a034c tracing: Show number of constants profiled in likely profiler
Now that constants are traced, it is useful to see the number of constants
that are traced in the likely/unlikely profiler in order to know if they
should be ignored or not.

The likely/unlikely will display a number after the "correct" number if a
"constant" count exists.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-19 08:57:14 -05:00
Steven Rostedt (VMware) d45ae1f704 tracing: Process constants for (un)likely() profiler
When running the likely/unlikely profiler, one of the results did not look
accurate. It noted that the unlikely() in link_path_walk() was 100%
incorrect. When I added a trace_printk() to see what was happening there, it
became 80% correct! Looking deeper into what whas happening, I found that
gcc split that if statement into two paths. One where the if statement
became a constant, the other path a variable. The other path had the if
statement always hit (making the unlikely there, always false), but since
the #define unlikely() has:

  #define unlikely() (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))

Where constants are ignored by the branch profiler, the "constant" path
made by the compiler was ignored, even though it was hit 80% of the time.

By just passing the constant value to the __branch_check__() function and
tracing it out of line (as always correct, as likely/unlikely isn't a factor
for constants), then we get back the accurate readings of branches that were
optimized by gcc causing part of the execution to become constant.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-17 15:13:05 -05:00
Kenny Yu 6496bb72bf uprobe: Find last occurrence of ':' when parsing uprobe PATH:OFFSET
Previously, `create_trace_uprobe` found the *first* occurence
of the ':' character when parsing `PATH:OFFSET` for a uprobe.
However, if the path contains a ':' character, then the function
would parse the path incorrectly. Even worse, if the path does not
exist, the subsequent call to `kern_path()` would set `ret` to
`ENOENT`, leading to very cryptic errno values in user space.

The fix is to find the *last* occurence of ':'.

How to repro:: The write fails with "No such file or directory", suggesting
incorrectly that the `uprobe_events` file does not exist.

  $ mkdir testing && cd testing
  $ cp /bin/bash .
  $ cp /bin/bash ./bash:with:colon
  $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events     # this works
  $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events     # this doesn't
  -bash: echo: write error: No such file or directory

With the patch:

  $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events     # this still works
  $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events     # this works now too!
  $ cat /sys/kernel/debug/tracing/uprobe_events
  p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x0000000000000006
  p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x0000000000000006

Link: http://lkml.kernel.org/r/20170113165834.4081016-1-kennyyu@fb.com

Signed-off-by: Kenny Yu <kennyyu@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-17 12:57:47 -05:00
Daniel Borkmann 2d071c643f bpf, trace: make ctx access checks more robust
Make sure that ctx cannot potentially be accessed oob by asserting
explicitly that ctx access size into pt_regs for BPF_PROG_TYPE_KPROBE
programs must be within limits. In case some 32bit archs have pt_regs
not being a multiple of 8, then BPF_DW access could cause such access.

BPF_PROG_TYPE_KPROBE progs don't have a ctx conversion function since
there's no extra mapping needed. kprobe_prog_is_valid_access() didn't
enforce sizeof(long) as the only allowed access size, since LLVM can
generate non BPF_W/BPF_DW access to regs from time to time.

For BPF_PROG_TYPE_TRACEPOINT we don't have a ctx conversion either, so
add a BUILD_BUG_ON() check to make sure that BPF_DW access will not be
a similar issue in future (ctx works on event buffer as opposed to
pt_regs there).

Fixes: 2541517c32 ("tracing, perf: Implement BPF programs attached to kprobes")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 14:41:42 -05:00
Daniel Borkmann 6b8cc1d11e bpf: pass original insn directly to convert_ctx_access
Currently, when calling convert_ctx_access() callback for the various
program types, we pass in insn->dst_reg, insn->src_reg, insn->off from
the original instruction. This information is needed to rewrite the
instruction that is based on the user ctx structure into a kernel
representation for the ctx. As we'd like to allow access size beyond
just BPF_W, we'd need also insn->code for that in order to decode the
original access size. Given that, lets just pass insn directly to the
convert_ctx_access() callback and work on that to not clutter the
callback with even more arguments we need to pass when everything is
already contained in insn. So lets go through that once, no functional
change.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 10:00:31 -05:00
Alexei Starovoitov 39f19ebbf5 bpf: rename ARG_PTR_TO_STACK
since ARG_PTR_TO_STACK is no longer just pointer to stack
rename it to ARG_PTR_TO_MEM and adjust comment.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:56:27 -05:00
Al Viro f81dc7d7d5 splice_pipe_desc: kill ->flags
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-26 23:53:38 -05:00
Thomas Gleixner a5a1d1c291 clocksource: Use a plain u64 instead of cycle_t
There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
2016-12-25 11:04:12 +01:00
Linus Torvalds 179a7ba680 This release has a few updates:
o STM can hook into the function tracer
  o Function filtering now supports more advance glob matching
  o Ftrace selftests updates and added tests
  o Softirq tag in traces now show only softirqs
  o ARM nop added to non traced locations at compile time
  o New trace_marker_raw file that allows for binary input
  o Optimizations to the ring buffer
  o Removal of kmap in trace_marker
  o Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
  o Other various fixes and clean ups
 
 Note, there are two patches marked for stable. These were discovered
 near the end of the 4.9 rc release cycle. By the time I had them tested
 it was just a matter of days before 4.9 would be released, and I
 figured I would just submit them in the merge window. They are old
 bugs and not critical. Nothing non-root could abuse.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJYUrFHFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 2+AIAIr20kSQV/nA5htGAeCTobVk3WUxY6bvjd9mIJDKPP19akNLyREW0G3KnfCr
 yhx4aFRZG98fRu/6F8qieRosyN36lADDVYHelMFHMpcTOpE2aZGjaaOuNGxOEA9v
 FmMPTX+K3+dzKyFP4l68R3+5JuQ1/AqLTioTWeLW8IDQ2OOVsjD8+0BuXrNKMJDY
 o6U4Hk5U/vn+zHc6BmgBzloAXemBd7iJ1t5V3FRRGvm8yv3HU85Twc5ofGeYTWvB
 J8PboEywRlIzxg0Kd8mxnMI5PgaKZSEc2ub8E7cY/CZ5PYpDE2xDA2hJmJgfYp00
 1VW+DHRpRZfElsCcya6S6P4bs5Y=
 =MGZ/
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This release has a few updates:

   - STM can hook into the function tracer
   - Function filtering now supports more advance glob matching
   - Ftrace selftests updates and added tests
   - Softirq tag in traces now show only softirqs
   - ARM nop added to non traced locations at compile time
   - New trace_marker_raw file that allows for binary input
   - Optimizations to the ring buffer
   - Removal of kmap in trace_marker
   - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
   - Other various fixes and clean ups"

* tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits)
  selftests: ftrace: Shift down default message verbosity
  kprobes/trace: Fix kprobe selftest for newer gcc
  tracing/kprobes: Add a helper method to return number of probe hits
  tracing/rb: Init the CPU mask on allocation
  tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
  tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too
  fgraph: Handle a case where a tracer ignores set_graph_notrace
  tracing: Replace kmap with copy_from_user() in trace_marker writing
  ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
  tracing: Allow benchmark to be enabled at early_initcall()
  tracing: Have system enable return error if one of the events fail
  tracing: Do not start benchmark on boot up
  tracing: Have the reg function allow to fail
  ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline
  ring-buffer: Froce rb_update_write_stamp() to be inlined
  ring-buffer: Force inline of hotpath helper functions
  tracing: Make __buffer_unlock_commit() always_inline
  tracing: Make tracepoint_printk a static_key
  ring-buffer: Always inline rb_event_data()
  ring-buffer: Make rb_reserve_next_event() always inlined
  ...
2016-12-15 13:49:34 -08:00
Linus Torvalds 36869cb93d Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the main block pull request this series. Contrary to previous
  release, I've kept the core and driver changes in the same branch. We
  always ended up having dependencies between the two for obvious
  reasons, so makes more sense to keep them together. That said, I'll
  probably try and keep more topical branches going forward, especially
  for cycles that end up being as busy as this one.

  The major parts of this pull request is:

   - Improved support for O_DIRECT on block devices, with a small
     private implementation instead of using the pig that is
     fs/direct-io.c. From Christoph.

   - Request completion tracking in a scalable fashion. This is utilized
     by two components in this pull, the new hybrid polling and the
     writeback queue throttling code.

   - Improved support for polling with O_DIRECT, adding a hybrid mode
     that combines pure polling with an initial sleep. From me.

   - Support for automatic throttling of writeback queues on the block
     side. This uses feedback from the device completion latencies to
     scale the queue on the block side up or down. From me.

   - Support from SMR drives in the block layer and for SD. From Hannes
     and Shaun.

   - Multi-connection support for nbd. From Josef.

   - Cleanup of request and bio flags, so we have a clear split between
     which are bio (or rq) private, and which ones are shared. From
     Christoph.

   - A set of patches from Bart, that improve how we handle queue
     stopping and starting in blk-mq.

   - Support for WRITE_ZEROES from Chaitanya.

   - Lightnvm updates from Javier/Matias.

   - Supoort for FC for the nvme-over-fabrics code. From James Smart.

   - A bunch of fixes from a whole slew of people, too many to name
     here"

* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
  blk-stat: fix a few cases of missing batch flushing
  blk-flush: run the queue when inserting blk-mq flush
  elevator: make the rqhash helpers exported
  blk-mq: abstract out blk_mq_dispatch_rq_list() helper
  blk-mq: add blk_mq_start_stopped_hw_queue()
  block: improve handling of the magic discard payload
  blk-wbt: don't throttle discard or write zeroes
  nbd: use dev_err_ratelimited in io path
  nbd: reset the setup task for NBD_CLEAR_SOCK
  nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
  nvme-fabrics: Add target support for FC transport
  nvme-fabrics: Add host support for FC transport
  nvme-fabrics: Add FC transport LLDD api definitions
  nvme-fabrics: Add FC transport FC-NVME definitions
  nvme-fabrics: Add FC transport error codes to nvme.h
  Add type 0x28 NVME type code to scsi fc headers
  nvme-fabrics: patch target code in prep for FC transport support
  nvme-fabrics: set sqe.command_id in core not transports
  parser: add u64 number parser
  nvme-rdma: align to generic ib_event logging helper
  ...
2016-12-13 10:19:16 -08:00
Linus Torvalds 52281b38bc Improvements and fixes to pstore subsystem:
- Add additional checks for bad platform data
 
 - Remove bounce buffer in console writer
 
 - Protect read/unlink race with a mutex
 
 - Correctly give up during dump locking failures
 
 - Increase ftrace bandwidth by splitting ftrace buffers per CPU
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJYSJxYAAoJEIly9N/cbcAmYBsQAIAmHDgk3ootLQhyatZ9H2X0
 Nyl24xA7UCPaz13ddF1tUaItI4mYBWfY4gde+3fIVXDitgmFxZZqb8YV68CvFgUt
 Hb8tlTiM0F2z/muGBIgJ5TN5XiB4dO0WgvcKvnQdzyNGPVlAXvowHPkaM9X+iEA1
 y4U2Le7iK9+9fvkH7RM4O3hMiTmpKeUITYTWo1Y8n9LaZo3w5+pqhS+TPu75uyD0
 pLb53EOzZmg1nu9hcac5t4G5W1Lr4ji2EekDXemi/571HAzQnMXxJWc6ZVYLDNfP
 W4D0UGcHAERDzrYwWcGn8HIThYlpbnVw9atSTTodJTiIubtsRt4haycUH1hqMS5o
 4R2myhbAoM0A3zYBqrhwtQHg8apNes2hOR2WycAqgvylZZl1o6zaEs9zc7aafYuy
 N/M0x5tlya3fOgkvkJsmERT5jtqDVMhtBZ2xa8NYfJCHgULaUmjEx25eTr1kF3nW
 ERIX/3IayMvqHwYptP9dOzy2owLpXC8yZlM34AeM+ub93hHj1ELLfG7aN0bklD/+
 wfmIX8HpOA2XGWflOk5fiHLHro6pwRU9zOIIHFJ4Tf60PMoN+rjRfej1fjz+KOhO
 gxUYaCb+/4BlCqLqdFvF54qhQO2qmVuOAg/1BLu+hnZtXSyhVJxePthSs5shyoE8
 owL8rVXDGapjF1xO6WCR
 =UmFL
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:
 "Improvements and fixes to pstore subsystem:

   - add additional checks for bad platform data

   - remove bounce buffer in console writer

   - protect read/unlink race with a mutex

   - correctly give up during dump locking failures

   - increase ftrace bandwidth by splitting ftrace buffers per CPU"

* tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  ramoops: add pdata NULL check to ramoops_probe
  pstore: Convert console write to use ->write_buf
  pstore: Protect unlink with read_mutex
  pstore: Use global ftrace filters for function trace filtering
  ftrace: Provide API to use global filtering for ftrace ops
  pstore: Clarify context field przs as dprzs
  pstore: improve error report for failed setup
  pstore: Merge per-CPU ftrace records into one
  pstore: Add ftrace timestamp counter
  ramoops: Split ftrace buffer space into per-CPU zones
  pstore: Make ramoops_init_przs generic for other prz arrays
  pstore: Allow prz to control need for locking
  pstore: Warn on PSTORE_TYPE_PMSG using deprecated function
  pstore: Make spinlock per zone instead of global
  pstore: Actually give up during locking failure
2016-12-13 09:16:11 -08:00
Linus Torvalds 9465d9cc31 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The time/timekeeping/timer folks deliver with this update:

   - Fix a reintroduced signed/unsigned issue and cleanup the whole
     signed/unsigned mess in the timekeeping core so this wont happen
     accidentaly again.

   - Add a new trace clock based on boot time

   - Prevent injection of random sleep times when PM tracing abuses the
     RTC for storage

   - Make posix timers configurable for real tiny systems

   - Add tracepoints for the alarm timer subsystem so timer based
     suspend wakeups can be instrumented

   - The usual pile of fixes and updates to core and drivers"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  timekeeping: Use mul_u64_u32_shr() instead of open coding it
  timekeeping: Get rid of pointless typecasts
  timekeeping: Make the conversion call chain consistently unsigned
  timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
  alarmtimer: Add tracepoints for alarm timers
  trace: Update documentation for mono, mono_raw and boot clock
  trace: Add an option for boot clock as trace clock
  timekeeping: Add a fast and NMI safe boot clock
  timekeeping/clocksource_cyc2ns: Document intended range limitation
  timekeeping: Ignore the bogus sleep time if pm_trace is enabled
  selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous"
  clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap
  clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map()
  arm64: dts: rockchip: Arch counter doesn't tick in system suspend
  clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend
  posix-timers: Make them configurable
  posix_cpu_timers: Move the add_device_randomness() call to a proper place
  timer: Move sys_alarm from timer.c to itimer.c
  ptp_clock: Allow for it to be optional
  Kconfig: Regenerate *.c_shipped files after previous changes
  ...
2016-12-12 19:56:15 -08:00
Linus Torvalds e71c3978d6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the final round of converting the notifier mess to the state
  machine. The removal of the notifiers and the related infrastructure
  will happen around rc1, as there are conversions outstanding in other
  trees.

  The whole exercise removed about 2000 lines of code in total and in
  course of the conversion several dozen bugs got fixed. The new
  mechanism allows to test almost every hotplug step standalone, so
  usage sites can exercise all transitions extensively.

  There is more room for improvement, like integrating all the
  pointlessly different architecture mechanisms of synchronizing,
  setting cpus online etc into the core code"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  tracing/rb: Init the CPU mask on allocation
  soc/fsl/qbman: Convert to hotplug state machine
  soc/fsl/qbman: Convert to hotplug state machine
  zram: Convert to hotplug state machine
  KVM/PPC/Book3S HV: Convert to hotplug state machine
  arm64/cpuinfo: Convert to hotplug state machine
  arm64/cpuinfo: Make hotplug notifier symmetric
  mm/compaction: Convert to hotplug state machine
  iommu/vt-d: Convert to hotplug state machine
  mm/zswap: Convert pool to hotplug state machine
  mm/zswap: Convert dst-mem to hotplug state machine
  mm/zsmalloc: Convert to hotplug state machine
  mm/vmstat: Convert to hotplug state machine
  mm/vmstat: Avoid on each online CPU loops
  mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
  tracing/rb: Convert to hotplug state machine
  oprofile/nmi timer: Convert to hotplug state machine
  net/iucv: Use explicit clean up labels in iucv_init()
  x86/pci/amd-bus: Convert to hotplug state machine
  x86/oprofile/nmi: Convert to hotplug state machine
  ...
2016-12-12 19:25:04 -08:00
Marcin Nowakowski d4d7ccc834 kprobes/trace: Fix kprobe selftest for newer gcc
Commit 265a5b7ee3 ("kprobes/trace: Fix kprobe selftest for gcc 4.6")
has added __used attribute to kprobe_trace_selftest_target to ensure
that the method is listed in kallsyms table.

However, even though the method remains in the kernel image, the actual
call is optimized away as there are no side effects and the return value
is never checked.

Add a return value check and a 'noinline' attribute to ensure that an
inlined copy of the method is not used by the caller. Also add checks
that verify that the kprobe was really hit, as at the moment the tests
show positive results despite the test method being optimized away.

Finally, add __init annotations to find_trace_probe_file() and
kprobe_trace_selftest_target() as they are only called from within an
__init method.

Link: http://lkml.kernel.org/r/1481293178-3128-2-git-send-email-marcin.nowakowski@imgtec.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12 21:21:43 -05:00
Marcin Nowakowski f18f97ac43 tracing/kprobes: Add a helper method to return number of probe hits
The number of probe hits is stored in a percpu variable and therefore
can't be read directly. Add a helper method trace_kprobe_nhit() that
performs the required calculation.

It will be used in a follow-up commit that changes kprobe selftests to
verify the number of probe hits.

Link: http://lkml.kernel.org/r/1481293178-3128-1-git-send-email-marcin.nowakowski@imgtec.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12 21:17:44 -05:00
Sebastian Andrzej Siewior 99e6f6e813 tracing/rb: Init the CPU mask on allocation
Before commit b32614c034 ("tracing/rb: Convert to hotplug state
machine") the allocated cpumask was initialized to the mask of ONLINE or
POSSIBLE CPUs. After the CPU hotplug changes the buffer initialisation
moved to trace_rb_cpu_prepare() but I forgot to initially set the
cpumask to zero. This is done now.

Link: http://lkml.kernel.org/r/20161207133133.hzkcqfllxcdi3joz@linutronix.de

Fixes: b32614c034 ("tracing/rb: Convert to hotplug state machine")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12 17:57:26 -05:00
Pavankumar Kondeti c59f29cb14 tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
The 's' flag is supposed to indicate that a softirq is running. This
can be detected by testing the preempt_count with SOFTIRQ_OFFSET.

The current code tests the preempt_count with SOFTIRQ_MASK, which
would be true even when softirqs are disabled but not serving a
softirq.

Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.org

Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12 13:51:02 -05:00
Steven Rostedt (Red Hat) 1a41442864 tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too
Currently both the wakeup and irqsoff traces do not handle set_graph_notrace
well. The ftrace infrastructure will ignore the return paths of all
functions leaving them hanging without an end:

  # echo '*spin*' > set_graph_notrace
  # cat trace
  [...]
          _raw_spin_lock() {
            preempt_count_add() {
            do_raw_spin_lock() {
          update_rq_clock();

Where the '*spin*' functions should have looked like this:

          _raw_spin_lock() {
            preempt_count_add();
            do_raw_spin_lock();
          }
          update_rq_clock();

Instead, have the wakeup and irqsoff tracers ignore the functions that are
set by the set_graph_notrace like the function_graph tracer does. Move
the logic in the function_graph tracer into a header to allow wakeup and
irqsoff tracers to use it as well.

Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 09:21:35 -05:00
Steven Rostedt (Red Hat) 794de08a16 fgraph: Handle a case where a tracer ignores set_graph_notrace
Both the wakeup and irqsoff tracers can use the function graph tracer when
the display-graph option is set. The problem is that they ignore the notrace
file, and record the entry of functions that would be ignored by the
function_graph tracer. This causes the trace->depth to be recorded into the
ring buffer. The set_graph_notrace uses a trick by adding a large negative
number to the trace->depth when a graph function is to be ignored.

On trace output, the graph function uses the depth to record a stack of
functions. But since the depth is negative, it accesses the array with a
negative number and causes an out of bounds access that can cause a kernel
oops or corrupt data.

Have the print functions handle cases where a tracer still records functions
even when they are in set_graph_notrace.

Also add warnings if the depth is below zero before accessing the array.

Note, the function graph logic will still prevent the return of these
functions from being recorded, which means that they will be left hanging
without a return. For example:

   # echo '*spin*' > set_graph_notrace
   # echo 1 > options/display-graph
   # echo wakeup > current_tracer
   # cat trace
   [...]
      _raw_spin_lock() {
        preempt_count_add() {
        do_raw_spin_lock() {
      update_rq_clock();

Where it should look like:

      _raw_spin_lock() {
        preempt_count_add();
        do_raw_spin_lock();
      }
      update_rq_clock();

Cc: stable@vger.kernel.org
Cc: Namhyung Kim <namhyung.kim@lge.com>
Fixes: 29ad23b004 ("ftrace: Add set_graph_notrace filter")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 09:19:28 -05:00
Steven Rostedt (Red Hat) 656c7f0d2d tracing: Replace kmap with copy_from_user() in trace_marker writing
Instead of using get_user_pages_fast() and kmap_atomic() when writing
to the trace_marker file, just allocate enough space on the ring buffer
directly, and write into it via copy_from_user().

Writing into the trace_marker file use to allocate a temporary buffer
to perform the copy_from_user(), as we didn't want to write into the
ring buffer if the copy failed. But as a trace_marker write is suppose
to be extremely fast, and allocating memory causes other tracepoints to
trigger, Peter Zijlstra suggested using get_user_pages_fast() and
kmap_atomic() to keep the user space pages in memory and reading it
directly. But Henrik Austad had issues with this because it required taking
the mm->mmap_sem and causing long delays with the write.

Instead, just allocate the space in the ring buffer and use
copy_from_user() directly. If it faults, return -EFAULT and write
"<faulted>" into the ring buffer.

Link: http://lkml.kernel.org/r/20161208124018.72dd0f86@gandalf.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Henrik Austad <henrik@austad.us>
Cc: Peter Zijlstra <peterz@infradead.org>
Updates: d696b58ca2 "tracing: Do not allocate buffer for trace_marker"
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 09:18:14 -05:00
Steven Rostedt (Red Hat) 9c1f6bb8c8 tracing: Allow benchmark to be enabled at early_initcall()
The trace event start up selftests fails when the trace benchmark is
enabled, because it is disabled during boot. It really only needs to be
disabled before scheduling is set up, as it creates a thread.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 09:16:15 -05:00
Steven Rostedt (Red Hat) 989a0a3d24 tracing: Have system enable return error if one of the events fail
If one of the events within a system fails to enable when "1" is written
to the system "enable" file, it should return an error. Note, some events
may still be enabled, but the user should know that something did go wrong.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 09:15:41 -05:00
Steven Rostedt (Red Hat) 1dd349ab74 tracing: Do not start benchmark on boot up
Trace events are enabled very early on boot up via the boot command line
parameter. The benchmark tool creates a new thread to perform the trace
event benchmarking. But at start up, it is called before scheduling is set
up and because it creates a new thread before the init thread is created,
this crashes the kernel.

Have the benchmark fail to register when started via the kernel command
line.

Also, since the registering of a tracepoint now can handle failure cases,
return -ENOMEM instead of warning if the thread cannot be created.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 09:14:00 -05:00
Steven Rostedt (Red Hat) 8cf868affd tracing: Have the reg function allow to fail
Some tracepoints have a registration function that gets enabled when the
tracepoint is enabled. There may be cases that the registraction function
must fail (for example, can't allocate enough memory). In this case, the
tracepoint should also fail to register, otherwise the user would not know
why the tracepoint is not working.

Cc: David Howells <dhowells@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 09:13:30 -05:00
Sebastian Andrzej Siewior b18cc3de00 tracing/rb: Init the CPU mask on allocation
Before commit b32614c034 ("tracing/rb: Convert to hotplug state machine")
the allocated cpumask was initialized to the mask of online or possible
CPUs. After the CPU hotplug changes the buffer initialization moved to
trace_rb_cpu_prepare() but the cpumask is allocated with alloc_cpumask()
and therefor has random content. As a consequence the cpu buffers are not
initialized and a later access dereferences a NULL pointer.

Use zalloc_cpumask() instead so trace_rb_cpu_prepare() initializes the
buffers properly.

Fixes: b32614c034 ("tracing/rb: Convert to hotplug state machine")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20161207133133.hzkcqfllxcdi3joz@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-07 14:36:21 +01:00
Sebastian Andrzej Siewior b32614c034 tracing/rb: Convert to hotplug state machine
Install the callbacks via the state machine. The notifier in struct
ring_buffer is replaced by the multi instance interface.  Upon
__ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke
the trace_rb_cpu_prepare() on each CPU.

This callback may now fail. This means __ring_buffer_alloc() will fail and
cleanup (like previously) and during a CPU up event this failure will not
allow the CPU to come up.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161126231350.10321-7-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:34 +01:00
Joel Fernandes 80ec355210 trace: Add an option for boot clock as trace clock
Unlike monotonic clock, boot clock as a trace clock will account for
time spent in suspend useful for tracing suspend/resume. This uses
earlier introduced infrastructure for using the fast boot clock.

Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Link: http://lkml.kernel.org/r/1480372524-15181-7-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-29 18:02:59 +01:00
Steven Rostedt (Red Hat) 38e11df134 ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline
Both rb_end_commit() and rb_set_commit_to_write() are in the fast path of
the ring buffer recording. Make sure they are always inlined.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 20:42:31 -05:00
Steven Rostedt (Red Hat) babe3fce95 ring-buffer: Froce rb_update_write_stamp() to be inlined
The function rb_update_write_stamp() is in the hotpath of the ring buffer
recording. Make sure that it is inlined as well. There's not many places
that call it.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 20:38:39 -05:00
Steven Rostedt (Red Hat) 2289d5672f ring-buffer: Force inline of hotpath helper functions
There's several small helper functions in ring_buffer.c that are used in the
hot path. For some reason, even though they are marked inline, gcc tends not
to enforce it. Make sure these functions are always inlined.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 20:35:32 -05:00
Steven Rostedt (Red Hat) 52ffabe384 tracing: Make __buffer_unlock_commit() always_inline
The function __buffer_unlock_commit() is called in a few places outside of
trace.c. But for the most part, it should really be inlined, as it is in the
hot path of the trace_events. For the callers outside of trace.c, create a
new function trace_buffer_unlock_commit_nostack(), as the reason it was used
was to avoid the stack tracing that trace_buffer_unlock_commit() could do.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 20:30:51 -05:00
Steven Rostedt (Red Hat) 4239174570 tracing: Make tracepoint_printk a static_key
Currently, when tracepoint_printk is set (enabled by the "tp_printk" kernel
command line), it causes trace events to print via printk(). This is a very
dangerous operation, but is useful for debugging.

The issue is, it's seldom used, but it is always checked even if it's not
enabled by the kernel command line. Instead of having this feature called by
a branch against a variable, turn that variable into a static key, and this
will remove the test and jump.

To simplify things, the functions output_printk() and
trace_event_buffer_commit() were moved from trace_events.c to trace.c.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 15:52:45 -05:00
Steven Rostedt (Red Hat) 929ddbf3ef ring-buffer: Always inline rb_event_data()
The rb_event_data() is the fast path of getting the ring buffer data from an
event. Externally, ring_buffer_event_data() is used to access this function.
But unfortunately, rb_event_data() is not inlined, and calling
ring_buffer_event_data() causes that function to be called again. Force
rb_event_data() to be inlined to lower the number of operations needed when
calling ring_buffer_event_data().

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 11:40:34 -05:00
Steven Rostedt (Red Hat) fa7ffb39ef ring-buffer: Make rb_reserve_next_event() always inlined
The function rb_reserved_next_event() is called by two functions:
ring_buffer_lock_reserve() and ring_buffer_write(). This is in a very hot
path of the tracing code, and it is best that they are not functions. The
two callers are basically wrapers for rb_reserver_next_event(). Removing the
function calls can save execution time in the hotpath of tracing.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 11:36:30 -05:00
Steven Rostedt (Red Hat) 3e9a8aadca tracing: Create a always_inlined __trace_buffer_lock_reserve()
As Andi Kleen pointed out in the Link below, the trace events has quite a
bit of code execution. A lot of that happens to be calling functions, where
some of them should simply be inlined. One of these functions happens to be
trace_buffer_lock_reserve() which is also a global, but it is used
throughout the file it is defined in. Create a __trace_buffer_lock_reserve()
that is always inlined that the file can benefit from.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 11:29:58 -05:00
Steven Rostedt (Red Hat) 7d43640022 tracing: Add error checks to creation of event files
The creation of the set_event_pid file was assigned to a variable "entry"
but that variable was never used. Ideally, it should be used to check if the
file was created and warn if it was not.

The files header_page, header_event should also be checked and a warning if
they fail to be created.

The "enable" file was moved up, as it is a more crucial file to have and a
hard failure (return -ENOMEM) should be returned if it is not created.

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-22 18:32:03 -05:00
Chunyan Zhang 478409dd68 tracing: Add hook to function tracing for other subsystems to use
Currently Function traces can be only exported to the ring buffer. This
adds a trace_export concept which can process traces and export
them to a registered destination as an addition to the current
one that outputs to Ftrace - i.e. ring buffer.

In this way, if we want function traces to be sent to other destinations
rather than only to the ring buffer, we just need to register a new
trace_export and implement its own .write() function for writing traces to
storage.

With this patch, only function tracing (trace type is TRACE_FN)
is supported.

Link: http://lkml.kernel.org/r/1479715043-6534-2-git-send-email-zhang.chunyan@linaro.org

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-22 17:40:00 -05:00
David S. Miller f9aa9dc7d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All conflicts were simple overlapping changes except perhaps
for the Thunder driver.

That driver has a change_mtu method explicitly for sending
a message to the hardware.  If that fails it returns an
error.

Normally a driver doesn't need an ndo_change_mtu method becuase those
are usually just range changes, which are now handled generically.
But since this extra operation is needed in the Thunder driver, it has
to stay.

However, if the message send fails we have to restore the original
MTU before the change because the entire call chain expects that if
an error is thrown by ndo_change_mtu then the MTU did not change.
Therefore code is added to nicvf_change_mtu to remember the original
MTU, and to restore it upon nicvf_update_hw_max_frs() failue.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22 13:27:16 -05:00
Joel Fernandes d032ae8921 ftrace: Provide API to use global filtering for ftrace ops
Currently the global_ops filtering hash is not available to outside users
registering for function tracing. Provide an API for those users to be
able to choose global filtering.

This is in preparation for pstore's ftrace feature to be able to
use the global filters.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:30 -08:00
Steven Rostedt fa32e8557b tracing: Add new trace_marker_raw
A new file is created:

 /sys/kernel/debug/tracing/trace_marker_raw

This allows for appications to create data structures and write the binary
data directly into it, and then read the trace data out from trace_pipe_raw
into the same type of data structure. This saves on converting numbers into
ASCII that would be required by trace_marker.

Suggested-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-15 15:13:59 -05:00
Zhou Chengming 8d414bd2f7 tracing: Allow wakeup_dl tracer to be used by instances
Allow wakeup_dl tracer to be used by instances, like wakeup tracer
and wakeup_rt tracer.

Link: http://lkml.kernel.org/r/1479093553-31264-1-git-send-email-zhouchengming1@huawei.com

Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14 16:43:00 -05:00
Steven Rostedt (Red Hat) 3f303fbccf tracing/filter: Define op as the enum that it is
The trace_events_file.c filter logic can be a bit complex. I copy this into
a userspace program where I can debug it a bit easier. One issue is the op
is defined in most places as an int instead of as an enum, and gdb just
gives the value when debugging. Having the actual op name shown in gdb is
more useful.

This has no functionality change, but helps in debugging when the file is
debugged in user space.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14 16:42:59 -05:00
Steven Rostedt (Red Hat) fdf5b67986 tracing: Optimise comparison filters and fix binary and for 64 bit
Currently the filter logic for comparisons (like greater-than and less-than)
are used, they share the same function and a switch statement is used to
jump to the comparison type to perform. This is done in the extreme hot path
of the tracing code, and it does not take much more space to create a
unique comparison function to perform each type of comparison and remove the
switch statement.

Also, a bug was found where the binary and operation for 64 bits could fail
if the resulting bits were greater than 32 bits, because the result was
passed into a 32 bit variable. This was fixed when adding the separate
binary and function.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14 16:42:58 -05:00
Masami Hiramatsu 60f1d5e3ba ftrace: Support full glob matching
Use glob_match() to support flexible glob wildcards (*,?)
and character classes ([) for ftrace.
Since the full glob matching is slower than the current
partial matching routines(*pat, pat*, *pat*), this leaves
those routines and just add MATCH_GLOB for complex glob
expression.

e.g.
----
[root@localhost tracing]# echo 'sched*group' > set_ftrace_filter
[root@localhost tracing]# cat set_ftrace_filter
sched_free_group
sched_change_group
sched_create_group
sched_online_group
sched_destroy_group
sched_offline_group
[root@localhost tracing]# echo '[Ss]y[Ss]_*' > set_ftrace_filter
[root@localhost tracing]# head set_ftrace_filter
sys_arch_prctl
sys_rt_sigreturn
sys_ioperm
SyS_iopl
sys_modify_ldt
SyS_mmap
SyS_set_thread_area
SyS_get_thread_area
SyS_set_tid_address
sys_fork
----

Link: http://lkml.kernel.org/r/147566869501.29136.6462645009894738056.stgit@devbox

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14 16:42:58 -05:00
Steven Rostedt (Red Hat) 546fece4ea ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip records
When a module is first loaded and its function ip records are added to the
ftrace list of functions to modify, they are set to DISABLED, as their text
is still in a read only state. When the module is fully loaded, and can be
updated, the flag is cleared, and if their's any functions that should be
tracing them, it is updated at that moment.

But there's several locations that do record accounting and should ignore
records that are marked as disabled, or they can cause issues.

Alexei already fixed one location, but others need to be addressed.

Cc: stable@vger.kernel.org
Fixes: b7ffffbb46 "ftrace: Add infrastructure for delayed enabling of module functions"
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14 16:31:49 -05:00
Alexei Starovoitov 977c1f9c8c ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace records
ftrace_shutdown() checks for sanity of ftrace records
and if dyn_ftrace->flags is not zero, it will warn.
It can happen that 'flags' are set to FTRACE_FL_DISABLED at this point,
since some module was loaded, but before ftrace_module_enable()
cleared the flags for this module.

In other words the module.c is doing:
ftrace_module_init(mod); // calls ftrace_update_code() that sets flags=FTRACE_FL_DISABLED
... // here ftrace_shutdown() is called that warns, since
err = prepare_coming_module(mod); // didn't have a chance to clear FTRACE_FL_DISABLED

Fix it by ignoring disabled records.
It's similar to what __ftrace_hash_rec_update() is already doing.

Link: http://lkml.kernel.org/r/1478560460-3818619-1-git-send-email-ast@fb.com

Cc: stable@vger.kernel.org
Fixes: b7ffffbb46 "ftrace: Add infrastructure for delayed enabling of module functions"
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14 16:31:41 -05:00
Christoph Hellwig ef295ecf09 block: better op and flags encoding
Now that we don't need the common flags to overflow outside the range
of a 32-bit type we can encode them the same way for both the bio and
request fields.  This in addition allows us to place the operation
first (and make some room for more ops while we're at it) and to
stop having to shift around the operation values.

In addition this allows passing around only one value in the block layer
instead of two (and eventuall also in the file systems, but we can do
that later) and thus clean up a lot of code.

Last but not least this allows decreasing the size of the cmd_flags
field in struct request to 32-bits.  Various functions passing this
value could also be updated, but I'd like to avoid the churn for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-28 08:48:16 -06:00
Daniel Borkmann 2d0e30c30f bpf: add helper for retrieving current numa node id
Use case is mainly for soreuseport to select sockets for the local
numa node, but since generic, lets also add this for other networking
and tracing program types.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:05:52 -04:00
Linus Torvalds ef6000b4c6 Disable the __builtin_return_address() warning globally after all
This affectively reverts commit 377ccbb483 ("Makefile: Mute warning
for __builtin_return_address(>0) for tracing only") because it turns out
that it really isn't tracing only - it's all over the tree.

We already also had the warning disabled separately for mm/usercopy.c
(which this commit also removes), and it turns out that we will also
want to disable it for get_lock_parent_ip(), that is used for at least
TRACE_IRQFLAGS.  Which (when enabled) ends up being all over the tree.

Steven Rostedt had a patch that tried to limit it to just the config
options that actually triggered this, but quite frankly, the extra
complexity and abstraction just isn't worth it.  We have never actually
had a case where the warning is actually useful, so let's just disable
it globally and not worry about it.

Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-12 10:23:41 -07:00
Linus Torvalds 2ab704a47e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "The usual rocket science from the trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  tracing/syscalls: fix multiline in error message text
  lib/Kconfig.debug: fix DEBUG_SECTION_MISMATCH description
  doc: vfs: fix fadvise() sycall name
  x86/entry: spell EBX register correctly in documentation
  securityfs: fix securityfs_create_dir comment
  irq: Fix typo in tracepoint.xml
2016-10-07 12:24:08 -07:00
Linus Torvalds 95107b30be This release cycle is rather small. Just a few fixes to tracing.
The big change is the addition of the hwlat tracer. It not only detects
 SMIs, but also other latency that's caused by the hardware. I have detected
 some latency from large boxes having bus contention.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJX9a77AAoJEKKk/i67LK/8UPEH/jcqMmOMhQYVQsNaJViA5uJM
 SV96gaLCc9cxXY04Hf7vx8RkVIyIqTCCQZ+RVZt4RSeqpsB2IzZ1u0CNKs2Z0MTv
 MdvQJoazRoDgVuPzKAsdAlDd0ykqHEFA5ayF3XDK4P2J97La+B4rQIqEiJX/aDrz
 i0NQQFg2ZF46mXJXn4oXe6nmr6WnbiEduawVjd7JvgILJO2hojDicOTQlNG41Nys
 68fOV8mLk0OL7sFRjySLGcbdbKhP2YbNhxILXl8geLgS9+CFZXkE8oTRjjy9IMNA
 XrqbFLMWaRVv+Nig7bHIWKE8ZErC5WCYUw4LD2GTLMDx5AkAVLGFFp6TOiO4SG8=
 =ke23
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This release cycle is rather small.  Just a few fixes to tracing.

  The big change is the addition of the hwlat tracer. It not only
  detects SMIs, but also other latency that's caused by the hardware. I
  have detected some latency from large boxes having bus contention"

* tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Call traceoff trigger after event is recorded
  ftrace/scripts: Add helper script to bisect function tracing problem functions
  tracing: Have max_latency be defined for HWLAT_TRACER as well
  tracing: Add NMI tracing in hwlat detector
  tracing: Have hwlat trace migrate across tracing_cpumask CPUs
  tracing: Add documentation for hwlat_detector tracer
  tracing: Added hardware latency tracer
  ftrace: Access ret_stack->subtime only in the function profiler
  function_graph: Handle TRACE_BPUTS in print_graph_comment
  tracing/uprobe: Drop isdigit() check in create_trace_uprobe
2016-10-06 11:48:41 -07:00
Linus Torvalds 687ee0ad4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and
    co. at Google. https://lwn.net/Articles/701165/

 2) Do TCP Small Queues for retransmits, from Eric Dumazet.

 3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei
    Starovoitov.

 4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai.

 5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn.

 6) Support GMAC protocol in iwlwifi mwm, from Ayala Beker.

 7) Support ndo_poll_controller in mlx5, from Calvin Owens.

 8) Move VRF processing to an output hook and allow l3mdev to be
    loopback, from David Ahern.

 9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern.

10) Congestion control in RXRPC, from David Howells.

11) Support geneve RX offload in ixgbe, from Emil Tantilov.

12) When hitting pressure for new incoming TCP data SKBs, perform a
    partial rathern than a full purge of the OFO queue (which could be
    huge). From Eric Dumazet.

13) Convert XFRM state and policy lookups to RCU, from Florian Westphal.

14) Support RX network flow classification to igb, from Gangfeng Huang.

15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski.

16) New skbmod packet action, from Jamal Hadi Salim.

17) Remove some inefficiencies in snmp proc output, from Jia He.

18) Add FIB notifications to properly propagate route changes to
    hardware which is doing forwarding offloading. From Jiri Pirko.

19) New dsa driver for qca8xxx chips, from John Crispin.

20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej
    Żenczykowski.

21) Add L3 mode to ipvlan, from Mahesh Bandewar.

22) Support 802.1ad in mlx4, from Moshe Shemesh.

23) Support hardware LRO in mediatek driver, from Nelson Chang.

24) Add TC offloading to mlx5, from Or Gerlitz.

25) Convert various drivers to ethtool ksettings interfaces, from
    Philippe Reynes.

26) TX max rate limiting for cxgb4, from Rahul Lakkireddy.

27) NAPI support for ath10k, from Rajkumar Manoharan.

28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed.

29) UDP replicast support in TIPC, from Richard Alpe.

30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru.

31) Support BQL in thunderx driver, from Sunil Goutham.

32) TSO support in alx driver, from Tobias Regnery.

33) Add stream parser engine and use it in kcm.

34) Support async DHCP replies in ipconfig module, from Uwe
    Kleine-König.

35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits)
  mlxsw: switchx2: Fix misuse of hard_header_len
  mlxsw: spectrum: Fix misuse of hard_header_len
  net/faraday: Stop NCSI device on shutdown
  net/ncsi: Introduce ncsi_stop_dev()
  net/ncsi: Rework the channel monitoring
  net/ncsi: Allow to extend NCSI request properties
  net/ncsi: Rework request index allocation
  net/ncsi: Don't probe on the reserved channel ID (0x1f)
  net/ncsi: Introduce NCSI_RESERVED_CHANNEL
  net/ncsi: Avoid unused-value build warning from ia64-linux-gcc
  net: Add netdev all_adj_list refcnt propagation to fix panic
  net: phy: Add Edge-rate driver for Microsemi PHYs.
  vmxnet3: Wake queue from reset work
  i40e: avoid NULL pointer dereference and recursive errors on early PCI error
  qed: Add RoCE ll2 & GSI support
  qed: Add support for memory registeration verbs
  qed: Add support for QP verbs
  qed: PD,PKEY and CQ verb support
  qed: Add support for RoCE hw init
  qede: Add qedr framework
  ...
2016-10-05 10:11:24 -07:00
Linus Torvalds 1a4a2bc460 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull low-level x86 updates from Ingo Molnar:
 "In this cycle this topic tree has become one of those 'super topics'
  that accumulated a lot of changes:

   - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on
     x86 - preceded by an array of changes. v4.8 saw preparatory changes
     in this area already - this is the rest of the work. Includes the
     thread stack caching performance optimization. (Andy Lutomirski)

   - switch_to() cleanups and all around enhancements. (Brian Gerst)

   - A large number of dumpstack infrastructure enhancements and an
     unwinder abstraction. The secret long term plan is safe(r) live
     patching plus maybe another attempt at debuginfo based unwinding -
     but all these current bits are standalone enhancements in a frame
     pointer based debug environment as well. (Josh Poimboeuf)

   - More __ro_after_init and const annotations. (Kees Cook)

   - Enable KASLR for the vmemmap memory region. (Thomas Garnier)"

[ The virtually mapped stack changes are pretty fundamental, and not
  x86-specific per se, even if they are only used on x86 right now. ]

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  x86/asm: Get rid of __read_cr4_safe()
  thread_info: Use unsigned long for flags
  x86/alternatives: Add stack frame dependency to alternative_call_2()
  x86/dumpstack: Fix show_stack() task pointer regression
  x86/dumpstack: Remove dump_trace() and related callbacks
  x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder
  oprofile/x86: Convert x86_backtrace() to use the new unwinder
  x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder
  perf/x86: Convert perf_callchain_kernel() to use the new unwinder
  x86/unwind: Add new unwind interface and implementations
  x86/dumpstack: Remove NULL task pointer convention
  fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y
  sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
  lib/syscall: Pin the task stack in collect_syscall()
  x86/process: Pin the target stack in get_wchan()
  x86/dumpstack: Pin the target stack when dumping it
  kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
  sched/core: Add try_get_task_stack() and put_task_stack()
  x86/entry/64: Fix a minor comment rebase error
  iommu/amd: Don't put completion-wait semaphore on stack
  ...
2016-10-03 16:13:28 -07:00
Linus Torvalds 12b7bcb43e Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main kernel side changes were:

   - uprobes enhancements (Masami Hiramatsu)

   - Uncore group events enhancements (David Carrillo-Cisneros)

   - x86 Intel: Add support for Skylake server uncore PMUs (Kan Liang)

   - x86 Intel: LBR cleanups and enhancements, for better branch
     annotation tracking (Peter Zijlstra)

   - x86 Intel: Add support for PTWRITE and power event tracing
     (Alexander Shishkin)

   - ... various fixes, cleanups and smaller enhancements.

  Lots of tooling changes - a couple of highlights:

   - Support event group view with hierarchy mode in 'perf top' and
     'perf report' (Namhyung Kim)

     e.g.:

     $ perf record -e '{cycles,instructions}' make
     $ perf report --hierarchy --stdio
     ...
     #   Overhead  Command / Shared Object / Symbol
     # ......................  ..................................
     ...
     25.74%  27.18%sh
     19.96%  24.14%libc-2.24.so
      9.55%  14.64%[.] __strcmp_sse2
      1.54%   0.00%[.] __tfind
      1.07%   1.13%[.] _int_malloc
      0.95%   0.00%[.] __strchr_sse2
      0.89%   1.39%[.] __tsearch
      0.76%   0.00%[.] strlen

   - Add branch stack / basic block info to 'perf annotate --stdio',
     where for each branch, we add an asm comment after the instruction
     with information on how often it was taken and predicted. See
     example with color output at:

       http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png

     (Peter Zijlstra)

   - Add support for using symbols in address filters with Intel PT and
     ARM CoreSight (hardware assisted tracing facilities) (Adrian
     Hunter, Mathieu Poirier)

   - Add support for interacting with Coresight PMU ETMs/PTMs, that are
     IP blocks to perform hardware assisted tracing on a ARM CPU core
     (Mathieu Poirier)

   - Support generating cross arch probes, i.e. if you specify a vmlinux
     file for different arch than the one in the host machine,

        $ perf probe --definition function_name args

     will generate the probe definition string needed to append to the
     target machine /sys/kernel/debug/tracing/kprobes_events file, using
     scripting (Masami Hiramatsu).

   - Allow configuring the default 'perf report -s' sort order in
     ~/.perfconfig, for instance, "sym,dso" may be more fitting for
     kernel developers. (Arnaldo Carvalho de Melo)

   - ... plus lots of other changes, refactorings, features and fixes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (149 commits)
  perf tests: Add dwarf unwind test for powerpc
  perf probe: Match linkage name with mangled name
  perf probe: Fix to cut off incompatible chars from group name
  perf probe: Skip if the function address is 0
  perf probe: Ignore the error of finding inline instance
  perf intel-pt: Fix decoding when there are address filters
  perf intel-pt: Enable decoder to handle TIP.PGD with missing IP
  perf intel-pt: Read address filter from AUXTRACE_INFO event
  perf intel-pt: Record address filter in AUXTRACE_INFO event
  perf intel-pt: Add a helper function for processing AUXTRACE_INFO
  perf intel-pt: Fix missing error codes processing auxtrace_info
  perf intel-pt: Add support for recording the max non-turbo ratio
  perf intel-pt: Fix snapshot overlap detection decoder errors
  perf probe: Increase debug level of SDT debug messages
  perf record: Add support for using symbols in address filters
  perf symbols: Add dso__last_symbol()
  perf record: Fix error paths
  perf record: Rename label 'out_symbol_exit'
  perf script: Fix vanished idle symbols
  perf evsel: Add support for address filters
  ...
2016-10-03 12:47:28 -07:00
David S. Miller b50afd203a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three sets of overlapping changes.  Nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-02 22:20:41 -04:00
Thomas Gleixner d7e25c66c9 Merge branch 'x86/urgent' into x86/asm
Get the cr4 fixes so we can apply the final cleanup
2016-09-30 12:38:28 +02:00
Colin Ian King d282b9c0ac tracing/syscalls: fix multiline in error message text
pr_info message spans two lines and the literal string is missing
a white space between words. Add the white space.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-09-29 10:25:23 +02:00
Linus Torvalds 4c04b4b534 Al Viro has been looking at the tracefs code, and has pointed out
some issues. This contains one fix by me and one by Al. I'm sure that
 he'll come up with more but for now I tested these patches and they
 don't appear to have any negative impact on tracing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJX6FvrAAoJEKKk/i67LK/8EuIH/Arf6vJidYsmbe57WQp8PU3I
 bldem6ePj6zgZ2ZqPlSGCs1J2DcK4Bh3lPVxdx7rRKVWSd/Zoj+i83hvObusR8M7
 Qs1G92bJTvvVO3aPfiN0GvKGdKfGn45L+j0BcBauiTRKqnj3PkhOhIP2/ks0ewSk
 qeq7R3xxo/FDs26AHS69Hm0PIIw7btyhXNX4GB3Il7IIA5/nUknw3C+bjVj86tYX
 R4iElcHEhplgoSjKuLgNIRZGUnEFtsm/fnohYXpHacLTUKNXnTDY230x/OKc1yyB
 1vOfHS/y5s3XSJ1lcgSjYeNc51lK8NiDASaptZSUnOookKSAooUTFELNzpbc0sg=
 =+Fr3
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracefs fixes from Steven Rostedt:
 "Al Viro has been looking at the tracefs code, and has pointed out some
  issues.  This contains one fix by me and one by Al.  I'm sure that
  he'll come up with more but for now I tested these patches and they
  don't appear to have any negative impact on tracing"

* tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  fix memory leaks in tracing_buffers_splice_read()
  tracing: Move mutex to protect against resetting of seq data
2016-09-25 18:40:13 -07:00
Al Viro 1ae2293dd6 fix memory leaks in tracing_buffers_splice_read()
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-25 13:30:13 -04:00
Steven Rostedt (Red Hat) 1245800c0f tracing: Move mutex to protect against resetting of seq data
The iter->seq can be reset outside the protection of the mutex. So can
reading of user data. Move the mutex up to the beginning of the function.

Fixes: d7350c3f45 ("tracing/core: make the read callbacks reentrants")
Cc: stable@vger.kernel.org # 2.6.30+
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-25 10:27:08 -04:00
Masami Hiramatsu a0d0c6216a tracing: Call traceoff trigger after event is recorded
Call traceoff trigger after the event is recorded.
Since current traceoff trigger is called before recording
the event, we can not know what event stopped tracing.

Typical usecase of traceoff/traceon trigger is tracing
function calls and trace events between a pair of events.
For example, trace function calls between syscall entry/exit.
In that case, it is useful if we can see the return code
of the target syscall.

Link: http://lkml.kernel.org/r/147335074530.12462.4526186083406015005.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-23 09:47:59 -04:00
Ingo Molnar d4b80afbba Merge branch 'linus' into x86/asm, to pick up recent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 08:24:53 +02:00
Steven Rostedt (Red Hat) f971cc9aab tracing: Have max_latency be defined for HWLAT_TRACER as well
The hwlat tracer uses tr->max_latency, and if it's the only tracer enabled
that uses it, the build will fail. Add max_latency and its file when the
hwlat tracer is enabled.

Link: http://lkml.kernel.org/r/d6c3b7eb-ba95-1ffa-0453-464e1e24262a@infradead.org

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-12 09:59:46 -04:00
Daniel Borkmann f3694e0012 bpf: add BPF_CALL_x macros for declaring helpers
This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions
to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros
that are used today. Motivation for this is to hide all the register handling
and all necessary casts from the user, so that it is done automatically in the
background when adding a BPF_CALL_<n>() call.

This makes current helpers easier to review, eases to write future helpers,
avoids getting the casting mess wrong, and allows for extending all helpers at
once (f.e. build time checks, etc). It also helps detecting more easily in
code reviews that unused registers are not instrumented in the code by accident,
breaking compatibility with existing programs.

BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some
fundamental differences, for example, for generating the actual helper function
that carries all u64 regs, we need to fill unused regs, so that we always end up
with 5 u64 regs as an argument.

I reviewed several 0-5 generated BPF_CALL_<n>() variants of the .i results and
they look all as expected. No sparse issue spotted. We let this also sit for a
few days with Fengguang's kbuild test robot, and there were no issues seen. On
s390, it barked on the "uses dynamic stack allocation" notice, which is an old
one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion
to the call wrapper, just telling that the perf raw record/frag sits on stack
(gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests
and they were fine as well. All eBPF helpers are now converted to use these
macros, getting rid of a good chunk of all the raw castings.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Daniel Borkmann f035a51536 bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macros
Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit
which otherwise often result in overly long bytes_to_bpf_size(sizeof())
and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro
helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF())
check in convert_bpf_extensions(), but we should rather make that generic
as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF()
users to detect any rewriter size issues at compile time. Note, there are
currently none, but we want to assert that it stays this way.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Ingo Molnar 2cc538412a Merge branch 'perf/urgent' into perf/core, to pick up fixed and resolve conflicts
Conflicts:
	kernel/events/core.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05 12:09:59 +02:00
Alexei Starovoitov 0515e5999a bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type
Introduce BPF_PROG_TYPE_PERF_EVENT programs that can be attached to
HW and SW perf events (PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE
correspondingly in uapi/linux/perf_event.h)

The program visible context meta structure is
struct bpf_perf_event_data {
    struct pt_regs regs;
     __u64 sample_period;
};
which is accessible directly from the program:
int bpf_prog(struct bpf_perf_event_data *ctx)
{
  ... ctx->sample_period ...
  ... ctx->regs.ip ...
}

The bpf verifier rewrites the accesses into kernel internal
struct bpf_perf_event_data_kern which allows changing
struct perf_sample_data without affecting bpf programs.
New fields can be added to the end of struct bpf_perf_event_data
in the future.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02 10:46:44 -07:00
Steven Rostedt (Red Hat) 7b2c862501 tracing: Add NMI tracing in hwlat detector
As NMIs can also cause latency when interrupts are disabled, the hwlat
detectory has no way to know if the latency it detects is from an NMI or an
SMI or some other hardware glitch.

As ftrace_nmi_enter/exit() funtions are no longer used (except for sh, which
isn't supported anymore), I converted those to "arch_ftrace_nmi_enter/exit"
and use ftrace_nmi_enter/exit() to check if hwlat detector is tracing or
not, and if so, it calls into the hwlat utility.

Since the hwlat detector only has a single kthread that is spinning with
interrupts disabled, it marks what CPU it is on, and if the NMI callback
happens on that CPU, it records the time spent in that NMI. This is added to
the output that is generated by the hwlat detector as:

 #3     inner/outer(us):    9/9     ts:1470836488.206734548
 #4     inner/outer(us):    0/8     ts:1470836497.140808588
 #5     inner/outer(us):    0/6     ts:1470836499.140825168 nmi-total:5 nmi-count:1
 #6     inner/outer(us):    9/9     ts:1470836501.140841748

All time is still tracked in microseconds.

The NMI information is only shown when an NMI occurred during the sample.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-02 12:47:55 -04:00
Steven Rostedt (Red Hat) 0330f7aa8e tracing: Have hwlat trace migrate across tracing_cpumask CPUs
Instead of having the hwlat detector thread stay on one CPU, have it migrate
across all the CPUs specified by tracing_cpumask. If the user modifies the
thread's CPU affinity, the migration will stop until the next instance that
the tracer is instantiated. The migration happens at the end of each window
(period).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-02 12:47:54 -04:00
Steven Rostedt (Red Hat) e7c15cd8a1 tracing: Added hardware latency tracer
The hardware latency tracer has been in the PREEMPT_RT patch for some time.
It is used to detect possible SMIs or any other hardware interruptions that
the kernel is unaware of. Note, NMIs may also be detected, but that may be
good to note as well.

The logic is pretty simple. It simply creates a thread that spins on a
single CPU for a specified amount of time (width) within a periodic window
(window). These numbers may be adjusted by their cooresponding names in

   /sys/kernel/tracing/hwlat_detector/

The defaults are window = 1000000 us (1 second)
                 width  =  500000 us (1/2 second)

The loop consists of:

	t1 = trace_clock_local();
	t2 = trace_clock_local();

Where trace_clock_local() is a variant of sched_clock().

The difference of t2 - t1 is recorded as the "inner" timestamp and also the
timestamp  t1 - prev_t2 is recorded as the "outer" timestamp. If either of
these differences are greater than the time denoted in
/sys/kernel/tracing/tracing_thresh then it records the event.

When this tracer is started, and tracing_thresh is zero, it changes to the
default threshold of 10 us.

The hwlat tracer in the PREEMPT_RT patch was originally written by
Jon Masters. I have modified it quite a bit and turned it into a
tracer.

Based-on-code-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-02 12:47:51 -04:00
Namhyung Kim 8861dd303c ftrace: Access ret_stack->subtime only in the function profiler
The subtime is used only for function profiler with function graph
tracer enabled.  Move the definition of subtime under
CONFIG_FUNCTION_PROFILER to reduce the memory usage.  Also move the
initialization of subtime into the graph entry callback.

Link: http://lkml.kernel.org/r/20160831025529.24018-1-namhyung@kernel.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-01 12:19:40 -04:00
Namhyung Kim 613dccdf68 function_graph: Handle TRACE_BPUTS in print_graph_comment
It missed to handle TRACE_BPUTS so messages recorded by trace_bputs()
will be shown with symbol info unnecessarily.

You can see it with the trace_printk sample code:

  # cd /sys/kernel/tracing/
  # echo sys_sync > set_graph_function
  # echo 1 > options/sym-offset
  # echo function_graph > current_tracer

Note that the sys_sync filter was there to prevent recording other
functions and the sym-offset option was needed since the first message
was called from a module init function so kallsyms doesn't have the
symbol and omitted in the output.

  # cd ~/build/kernel
  # insmod samples/trace_printk/trace-printk.ko

  # cd -
  # head trace

Before:

  # tracer: function_graph
  #
  # CPU  DURATION                  FUNCTION CALLS
  # |     |   |                     |   |   |   |
   1)               |  /* 0xffffffffa0002000: This is a static string that will use trace_bputs */
   1)               |  /* This is a dynamic string that will use trace_puts */
   1)               |  /* trace_printk_irq_work+0x5/0x7b [trace_printk]: (irq) This is a static string that will use trace_bputs */
   1)               |  /* (irq) This is a dynamic string that will use trace_puts */
   1)               |  /* (irq) This is a static string that will use trace_bprintk() */
   1)               |  /* (irq) This is a dynamic string that will use trace_printk */

After:

  # tracer: function_graph
  #
  # CPU  DURATION                  FUNCTION CALLS
  # |     |   |                     |   |   |   |
   1)               |  /* This is a static string that will use trace_bputs */
   1)               |  /* This is a dynamic string that will use trace_puts */
   1)               |  /* (irq) This is a static string that will use trace_bputs */
   1)               |  /* (irq) This is a dynamic string that will use trace_puts */
   1)               |  /* (irq) This is a static string that will use trace_bprintk() */
   1)               |  /* (irq) This is a dynamic string that will use trace_printk */

Link: http://lkml.kernel.org/r/20160901024354.13720-1-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-01 11:19:55 -04:00
Dmitry Safonov 5ba8a4a96f tracing/uprobe: Drop isdigit() check in create_trace_uprobe
It's useless. Before:
  [tracing]# echo 'p:test /a:0x0' >> uprobe_events
  [tracing]# echo 'p:test a:0x0' >> uprobe_events
  -bash: echo: write error: No such file or directory
  [tracing]# echo 'p:test 1:0x0' >> uprobe_events
  -bash: echo: write error: Invalid argument

After:
  [tracing]# echo 'p:test 1:0x0' >> uprobe_events
  -bash: echo: write error: No such file or directory

Link: http://lkml.kernel.org/r/20160825152110.25663-3-dsafonov@virtuozzo.com

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-01 11:18:09 -04:00
David S. Miller 6abdd5f593 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30 00:54:02 -04:00
Josh Poimboeuf 223918e32a ftrace: Add ftrace_graph_ret_addr() stack unwinding helpers
When function graph tracing is enabled for a function, ftrace modifies
the stack by replacing the original return address with the address of a
hook function (return_to_handler).

Stack unwinders need a way to get the original return address.  Add an
arch-independent helper function for that named ftrace_graph_ret_addr().

This adds two variations of the function: one depends on
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, and the other relies on an index state
variable.

The former is recommended because, in some cases, the latter can cause
problems when the unwinder skips stack frames.  It can get out of sync
with the ret_stack index and wrong addresses can be reported for the
stack trace.

Once all arches have been ported to use
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, we can get rid of the distinction.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/36bd90f762fc5e5af3929e3797a68a64906421cf.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:14 +02:00
Josh Poimboeuf 9a7c348ba6 ftrace: Add return address pointer to ftrace_ret_stack
Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack.  Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.

Note that an array of 50 ftrace_ret_stack structs is allocated for every
task.  So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:14 +02:00
Josh Poimboeuf daa460a88c ftrace: Only allocate the ret_stack 'fp' field when needed
This saves some memory when HAVE_FUNCTION_GRAPH_FP_TEST isn't defined.
On x86_64 with newer versions of gcc which have -mfentry, it saves 400
bytes per task.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5c7747d9ea7b5cb47ef0a8ce8a6cea6bf7aa94bf.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:14 +02:00
Josh Poimboeuf e4a744ef2f ftrace: Remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config
Make HAVE_FUNCTION_GRAPH_FP_TEST a normal define, independent from
kconfig.  This removes some config file pollution and simplifies the
checking for the fp test.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2c4e5f05054d6d367f702fd153af7a0109dd5c81.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:13 +02:00
Masami Hiramatsu bdca79c2bf ftrace: kprobe: uprobe: Show u8/u16/u32/u64 types in decimal
Change kprobe/uprobe-tracer to show the arguments type-casted
with u8/u16/u32/u64 in decimal digits instead of hexadecimal.

To minimize compatibility issue, the arguments without type
casting are typed by x64 (or x32 for 32bit arch) by default.

Note: all arguments set by old perf probe without types are
shown in decimal by default.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Naohiro Aota <naohiro.aota@hgst.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/147151076135.12957.14684546093034343894.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-23 17:06:38 -03:00
Masami Hiramatsu 8642562555 ftrace: probe: Add README entries for k/uprobe-events
Add README entries for kprobe-events and uprobe-events.
This allows user to check what options can be acceptable
for running kernel.
E.g. perf tools can choose correct types for the kernel.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Naohiro Aota <naohiro.aota@hgst.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/147151069524.12957.12957179170304055028.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-23 15:39:57 -03:00
Masami Hiramatsu 17ce3dc7e5 ftrace: kprobe: uprobe: Add x8/x16/x32/x64 for hexadecimal types
Add x8/x16/x32/x64 for hexadecimal type casting to kprobe/uprobe event
tracer.

These type casts can be used for integer arguments for explicitly
showing them in hexadecimal digits in formatted text.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Naohiro Aota <naohiro.aota@hgst.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/147151067029.12957.11591314629326414783.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-23 15:38:09 -03:00
David S. Miller 60747ef4d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor overlapping changes for both merge conflicts.

Resolution work done by Stephen Rothwell was used
as a reference.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 01:17:32 -04:00
Adrian Hunter 7afafc8a44 block: Fix secure erase
Commit 288dab8a35 ("block: add a separate operation type for secure
erase") split REQ_OP_SECURE_ERASE from REQ_OP_DISCARD without considering
all the places REQ_OP_DISCARD was being used to mean either. Fix those.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 288dab8a35 ("block: add a separate operation type for secure erase")
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-16 09:16:51 -06:00
Alexei Starovoitov 8937bd80fc bpf: allow bpf_get_prandom_u32() to be used in tracing
bpf_get_prandom_u32() was initially introduced for socket filters
and later requested numberous times to be added to tracing bpf programs
for the same reason as in socket filters: to be able to randomly
select incoming events.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12 21:57:05 -07:00
Sargun Dhillon 60d20f9195 bpf: Add bpf_current_task_under_cgroup helper
This adds a bpf helper that's similar to the skb_in_cgroup helper to check
whether the probe is currently executing in the context of a specific
subset of the cgroupsv2 hierarchy. It does this based on membership test
for a cgroup arraymap. It is invalid to call this in an interrupt, and
it'll return an error. The helper is primarily to be used in debugging
activities for containers, where you may have multiple programs running in
a given top-level "container".

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12 21:49:41 -07:00
Jens Axboe 1eff9d322a block: rename bio bi_rw to bi_opf
Since commit 63a4cc2486, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07 14:41:02 -06:00
Linus Torvalds bf0f500bd0 A few updates and fixes:
. Move the suppressing of the __builtin_return_address >0 warning to the
    tracing directory only.
 
  . metag recordmcount fix for newer glibc's
 
  . Two tracing histogram fixes that were reported by KASAN
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXofc2AAoJEKKk/i67LK/8f7YIAI7YkUnzA7VZ/FmbgD+fu3MI
 XmLLb98dzEOEHKEUrmv/9TSj/W6cTVfgVH2z/U89J6nbPj56GgMf03qL1wn9l/6s
 kwxEt5GopmKwCdtnjGkLYZcg13OWottzmFoyn/koKCXFq7PwfGQdLzhwIQUpsXgG
 MxOk1Iv9TbACzz4k5aG866yhJu6cWDRSdC3cfv7F4xn+Z3GWggzCpW7fknXy66cJ
 iVsdUGZVz5O5jVJAFqzERZHBJQpraozjkKr3lprCdHuXa/EEAYQuuYG5WBxggYaQ
 eJ1my2p5MKkxORz1Nk9cGuFa6DW35spn9+iOOyTt6sRU/8tijGxTPLNWtKfJcVQ=
 =fbRU
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "A few updates and fixes:

   - move the suppressing of the __builtin_return_address >0 warning to
     the tracing directory only.

   - metag recordmcount fix for newer glibc's

   - two tracing histogram fixes that were reported by KASAN"

* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix use-after-free in hist_register_trigger()
  tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
  Makefile: Mute warning for __builtin_return_address(>0) for tracing only
  ftrace/recordmcount: Work around for addition of metag magic but not relocations
2016-08-03 12:50:06 -04:00
Tom Zanussi 7522c03ae3 tracing: Fix use-after-free in hist_register_trigger()
This fixes a use-after-free case flagged by KASAN; make sure the test
happens before the potential free in this case.

Link: http://lkml.kernel.org/r/48fd74ab61bebd7dca9714386bb47d7c5ccd6a7b.1467247517.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-02 15:16:30 -04:00
Steven Rostedt 47c1856971 tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
While running tools/testing/selftests test suite with KASAN, Dmitry
Vyukov hit the following use-after-free report:

  ==================================================================
  BUG: KASAN: use-after-free in hist_unreg_all+0x1a1/0x1d0 at addr
  ffff880031632cc0
  Read of size 8 by task ftracetest/7413
  ==================================================================
  BUG kmalloc-128 (Not tainted): kasan: bad access detected
  ------------------------------------------------------------------

This fixes the problem, along with the same problem in
hist_enable_unreg_all().

Link: http://lkml.kernel.org/r/c3d05b79e42555b6e36a3a99aae0e37315ee5304.1467247517.git.tom.zanussi@linux.intel.com

Cc: Dmitry Vyukov <dvyukov@google.com>
[Copied Steve's hist_enable_unreg_all() fix to hist_unreg_all()]
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-02 15:16:02 -04:00
Steven Rostedt 377ccbb483 Makefile: Mute warning for __builtin_return_address(>0) for tracing only
With the latest gcc compilers, they give a warning if
__builtin_return_address() parameter is greater than 0. That is because if
it is used by a function called by a top level function (or in the case of
the kernel, by assembly), it can try to access stack frames outside the
stack and crash the system.

The tracing system uses __builtin_return_address() of up to 2! But it is
well aware of the dangers that it may have, and has even added precautions
to protect against it (see the thunk code in arch/x86/entry/thunk*.S)

Linus originally added KBUILD_CFLAGS that would suppress the warning for the
entire kernel, as simply adding KBUILD_CFLAGS to the tracing directory
wouldn't work. The tracing directory plays a bit with the CFLAGS and
requires a little more logic.

This adds that special logic to only suppress the warning for the tracing
directory. If it is used anywhere else outside of tracing, the warning will
still be triggered.

Link: http://lkml.kernel.org/r/20160728223043.51996267@grimm.local.home

Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-02 12:57:48 -04:00
Linus Torvalds c624c86615 This is mostly clean ups and small fixes. Some of the more visible
changes are:
 
  . The function pid code uses the event pid filtering logic
  . [ku]probe events have access to current->comm
  . trace_printk now has sample code
  . PCI devices now trace physical addresses
  . stack tracing has less unnessary functions traced
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXl+d2AAoJEKKk/i67LK/83QEH/RDJ0mcfFVsuEeOnZZrZXABm
 4Rxk4FE5UAD+TSrVycwwzcbQab1iPK63mMdYvIBvaOiIC6/OJaEVM7jzZxnNGqmr
 pj0H8bxwOr58pe5pfnP92ow5qTLLzsXraWNl5sRXhSSHON7CXpGVzkErB58GmMYd
 8p6d9ziifQjo8X2O6XC9rGAvYLY5kEkVvyfuE1hI7muNTeOjyOT4EqpkNzxdBk+I
 QkGZGsk3Xhc8II9nu8FPWkaD26TatGJoZtZmVWHOzfsb3HNzG4RXla+WVOQ5u1HV
 noVyB1CJHhkO5CEBPdYIqwBWPQU4B9HfG4gVcUpDDVRxfzMpnEcKi1uwe+uDjfs=
 =XFcv
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This is mostly clean ups and small fixes.  Some of the more visible
  changes are:

   - The function pid code uses the event pid filtering logic
   - [ku]probe events have access to current->comm
   - trace_printk now has sample code
   - PCI devices now trace physical addresses
   - stack tracing has less unnessary functions traced"

* tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  printk, tracing: Avoiding unneeded blank lines
  tracing: Use __get_str() when manipulating strings
  tracing, RAS: Cleanup on __get_str() usage
  tracing: Use outer () on __get_str() definition
  ftrace: Reduce size of function graph entries
  tracing: Have HIST_TRIGGERS select TRACING
  tracing: Using for_each_set_bit() to simplify trace_pid_write()
  ftrace: Move toplevel init out of ftrace_init_tracefs()
  tracing/function_graph: Fix filters for function_graph threshold
  tracing: Skip more functions when doing stack tracing of events
  tracing: Expose CPU physical addresses (resource values) for PCI devices
  tracing: Show the preempt count of when the event was called
  tracing: Add trace_printk sample code
  tracing: Choose static tp_printk buffer by explicit nesting count
  tracing: expose current->comm to [ku]probe events
  ftrace: Have set_ftrace_pid use the bitmap like events do
  tracing: Move pid_list write processing into its own function
  tracing: Move the pid_list seq_file functions to be global
  tracing: Move filtered_pid helper functions into trace.c
  tracing: Make the pid filtering helper functions global
2016-07-28 18:20:09 -07:00
Linus Torvalds 468fc7ed55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

 2) Make DSA binding more sane, from Andrew Lunn.

 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface.  From Brenden Blanco and
    others.

 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

 8) Simplify netlink conntrack entry layout, from Florian Westphal.

 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

11) Support qdisc packet injection in pktgen, from John Fastabend.

12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

13) Add NV congestion control support to TCP, from Lawrence Brakmo.

14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

16) Support MPLS over IPV4, from Simon Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  xgene: Fix build warning with ACPI disabled.
  be2net: perform temperature query in adapter regardless of its interface state
  l2tp: Correctly return -EBADF from pppol2tp_getname.
  net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
  net: ipmr/ip6mr: update lastuse on entry change
  macsec: ensure rx_sa is set when validation is disabled
  tipc: dump monitor attributes
  tipc: add a function to get the bearer name
  tipc: get monitor threshold for the cluster
  tipc: make cluster size threshold for monitoring configurable
  tipc: introduce constants for tipc address validation
  net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
  MAINTAINERS: xgene: Add driver and documentation path
  Documentation: dtb: xgene: Add MDIO node
  dtb: xgene: Add MDIO node
  drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
  drivers: net: xgene: Use exported functions
  drivers: net: xgene: Enable MDIO driver
  drivers: net: xgene: Add backward compatibility
  drivers: net: phy: xgene: Add MDIO driver
  ...
2016-07-27 12:03:20 -07:00
Linus Torvalds 3fc9d69093 Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "This branch also contains core changes.  I've come to the conclusion
  that from 4.9 and forward, I'll be doing just a single branch.  We
  often have dependencies between core and drivers, and it's hard to
  always split them up appropriately without pulling core into drivers
  when that happens.

  That said, this contains:

   - separate secure erase type for the core block layer, from
     Christoph.

   - set of discard fixes, from Christoph.

   - bio shrinking fixes from Christoph, as a followup up to the
     op/flags change in the core branch.

   - map and append request fixes from Christoph.

   - NVMeF (NVMe over Fabrics) code from Christoph.  This is pretty
     exciting!

   - nvme-loop fixes from Arnd.

   - removal of ->driverfs_dev from Dan, after providing a
     device_add_disk() helper.

   - bcache fixes from Bhaktipriya and Yijing.

   - cdrom subchannel read fix from Vchannaiah.

   - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

   - set of drbd updates and fixes from Fabian, Lars, and Philipp.

   - mg_disk error path fix from Bart.

   - user notification for failed device add for loop, from Minfei.

   - NVMe in general:
        + NVMe delay quirk from Guilherme.
        + SR-IOV support and command retry limits from Keith.
        + fix for memory-less NUMA node from Masayoshi.
        + use UINT_MAX for discard sectors, from Minfei.
        + cancel IO fixes from Ming.
        + don't allocate unused major, from Neil.
        + error code fixup from Dan.
        + use constants for PSDT/FUSE from James.
        + variable init fix from Jay.
        + fabrics fixes from Ming, Sagi, and Wei.
        + various fixes"

* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
  nvme/pci: Provide SR-IOV support
  nvme: initialize variable before logical OR'ing it
  block: unexport various bio mapping helpers
  scsi/osd: open code blk_make_request
  target: stop using blk_make_request
  block: simplify and export blk_rq_append_bio
  block: ensure bios return from blk_get_request are properly initialized
  virtio_blk: use blk_rq_map_kern
  memstick: don't allow REQ_TYPE_BLOCK_PC requests
  block: shrink bio size again
  block: simplify and cleanup bvec pool handling
  block: get rid of bio_rw and READA
  block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
  NVMe: don't allocate unused nvme_major
  nvme: avoid crashes when node 0 is memoryless node.
  nvme: Limit command retries
  loop: Make user notify for adding loop device failed
  nvme-loop: fix nvme-loop Kconfig dependencies
  nvmet: fix return value check in nvmet_subsys_alloc()
  ...
2016-07-26 15:37:51 -07:00
Linus Torvalds d05d7f4079 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

   - the big change is the cleanup from Mike Christie, cleaning up our
     uses of command types and modified flags.  This is what will throw
     some merge conflicts

   - regression fix for the above for btrfs, from Vincent

   - following up to the above, better packing of struct request from
     Christoph

   - a 2038 fix for blktrace from Arnd

   - a few trivial/spelling fixes from Bart Van Assche

   - a front merge check fix from Damien, which could cause issues on
     SMR drives

   - Atari partition fix from Gabriel

   - convert cfq to highres timers, since jiffies isn't granular enough
     for some devices these days.  From Jan and Jeff

   - CFQ priority boost fix idle classes, from me

   - cleanup series from Ming, improving our bio/bvec iteration

   - a direct issue fix for blk-mq from Omar

   - fix for plug merging not involving the IO scheduler, like we do for
     other types of merges.  From Tahsin

   - expose DAX type internally and through sysfs.  From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
  block: Fix front merge check
  block: do not merge requests without consulting with io scheduler
  block: Fix spelling in a source code comment
  block: expose QUEUE_FLAG_DAX in sysfs
  block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
  Btrfs: fix comparison in __btrfs_map_block()
  block: atari: Return early for unsupported sector size
  Doc: block: Fix a typo in queue-sysfs.txt
  cfq-iosched: Charge at least 1 jiffie instead of 1 ns
  cfq-iosched: Fix regression in bonnie++ rewrite performance
  cfq-iosched: Convert slice_resid from u64 to s64
  block: Convert fifo_time from ulong to u64
  blktrace: avoid using timespec
  block/blk-cgroup.c: Declare local symbols static
  block/bio-integrity.c: Add #include "blk.h"
  block/partition-generic.c: Remove a set-but-not-used variable
  block: bio: kill BIO_MAX_SIZE
  cfq-iosched: temporarily boost queue priority for idle classes
  block: drbd: avoid to use BIO_MAX_SIZE
  block: bio: remove BIO_MAX_SECTORS
  ...
2016-07-26 15:03:07 -07:00
Sargun Dhillon 96ae522795 bpf: Add bpf_probe_write_user BPF helper to be called in tracers
This allows user memory to be written to during the course of a kprobe.
It shouldn't be used to implement any kind of security mechanism
because of TOC-TOU attacks, but rather to debug, divert, and
manipulate execution of semi-cooperative processes.

Although it uses probe_kernel_write, we limit the address space
the probe can write into by checking the space with access_ok.
We do this as opposed to calling copy_to_user directly, in order
to avoid sleeping. In addition we ensure the threads's current fs
/ segment is USER_DS and the thread isn't exiting nor a kernel thread.

Given this feature is meant for experiments, and it has a risk of
crashing the system, and running programs, we print a warning on
when a proglet that attempts to use this helper is installed,
along with the pid and process name.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 18:07:48 -07:00
Andrew Morton 183fc1537e kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug
kernel/trace/bpf_trace.c: In function 'bpf_event_output':
kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in initializer
kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
kernel/trace/bpf_trace.c:312: warning: (near initialization for 'raw.frag.<anonymous>')

Fixes: 555c8a8623 ("bpf: avoid stack copy and use skb ctx for event output")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 19:27:01 -07:00
Daniel Borkmann 555c8a8623 bpf: avoid stack copy and use skb ctx for event output
This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.

We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:23:56 -07:00
Daniel Borkmann 8e7a3920ac bpf, perf: split bpf_perf_event_output
Split the bpf_perf_event_output() helper as a preparation into
two parts. The new bpf_perf_event_output() will prepare the raw
record itself and test for unknown flags from BPF trace context,
where the __bpf_perf_event_output() does the core work. The
latter will be reused later on from bpf_event_output() directly.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:23:56 -07:00
Daniel Borkmann 7e3f977edd perf, events: add non-linear data support for raw records
This patch adds support for non-linear data on raw records. It
extends raw records to have one or multiple fragments that will
be written linearly into the ring slot, where each fragment can
optionally have a custom callback handler to walk and extract
complex, possibly non-linear data.

If a callback handler is provided for a fragment, then the new
__output_custom() will be used instead of __output_copy() for
the perf_output_sample() part. perf_prepare_sample() does all
the size calculation only once, so perf_output_sample() doesn't
need to redo the same work anymore, meaning real_size and padding
will be cached in the raw record. The raw record becomes 32 bytes
in size without holes; to not increase it further and to avoid
doing unnecessary recalculations in fast-path, we can reuse
next pointer of the last fragment, idea here is borrowed from
ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
path for PERF_SAMPLE_RAW minimal.

This facility is needed for BPF's event output helper as a first
user that will, in a follow-up, add an additional perf_raw_frag
to its perf_raw_record in order to be able to more efficiently
dump skb context after a linear head meta data related to it.
skbs can be non-linear and thus need a custom output function to
dump buffers. Currently, the skb data needs to be copied twice;
with the help of __output_custom() this work only needs to be
done once. Future users could be things like XDP/BPF programs
that work on different context though and would thus also have
a different callback function.

The few users of raw records are adapted to initialize their frag
data from the raw record itself, no change in behavior for them.
The code is based upon a PoC diff provided by Peter Zijlstra [1].

  [1] http://thread.gmane.org/gmane.linux.network/421294

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:23:56 -07:00
Alexei Starovoitov 606274c5ab bpf: introduce bpf_get_current_task() helper
over time there were multiple requests to access different data
structures and fields of task_struct current, so finally add
the helper to access 'current' as-is. Tracing bpf programs will do
the rest of walking the pointers via bpf_probe_read().
Note that current can be null and bpf program has to deal it with,
but even dumb passing null into bpf_probe_read() is still safe.

Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09 00:00:16 -04:00
Namhyung Kim a4a551b8f1 ftrace: Reduce size of function graph entries
Currently ftrace_graph_ent{,_entry} and ftrace_graph_ret{,_entry} struct
can have padding bytes at the end due to alignment in 64-bit data type.
As these data are recorded so frequently, those paddings waste
non-negligible space.  As the ring buffer maintains alignment properly
for each architecture, just to remove the extra padding using 'packed'
attribute.

  ftrace_graph_ent_entry:  24 -> 20
  ftrace_graph_ret_entry:  48 -> 44

Also I moved the 'overrun' field in struct ftrace_graph_ret to minimize
the padding in the middle.

Tested on x86_64 only.

Link: http://lkml.kernel.org/r/1467197808-13578-1-git-send-email-namhyung@kernel.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-05 17:28:30 -04:00
Tom Zanussi 7ad8fb61c4 tracing: Have HIST_TRIGGERS select TRACING
The kbuild test robot reported a compile error if HIST_TRIGGERS was
enabled but nothing else that selected TRACING was configured in.

HIST_TRIGGERS should directly select it and not rely on anything else
to do it.

Link: http://lkml.kernel.org/r/57791866.8080505@linux.intel.com

Reported-by: kbuild test robot <fennguang.wu@intel.com>
Fixes: 7ef224d1d0 ("tracing: Add 'hist' event trigger command")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-05 15:49:01 -04:00
Wei Yongjun 67f20b0845 tracing: Using for_each_set_bit() to simplify trace_pid_write()
Using for_each_set_bit() to simplify the code.

Link: http://lkml.kernel.org/r/1467645004-11169-1-git-send-email-weiyj_lk@163.com

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-05 11:22:40 -04:00
Steven Rostedt (Red Hat) 501c237525 ftrace: Move toplevel init out of ftrace_init_tracefs()
Commit 345ddcc882 ("ftrace: Have set_ftrace_pid use the bitmap like events
do") placed ftrace_init_tracefs into the instance creation, and encapsulated
the top level updating with an if conditional, as the top level only gets
updated at boot up. Unfortunately, this triggers section mismatch errors as
the init functions are called from a function that can be called later, and
the section mismatch logic is unaware of the if conditional that would
prevent it from happening at run time.

To make everyone happy, create a separate ftrace_init_tracefs_toplevel()
routine that only gets called by init functions, and this will be what calls
other init functions for the toplevel directory.

Link: http://lkml.kernel.org/r/20160704102139.19cbc0d9@gandalf.local.home

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 345ddcc882 ("ftrace: Have set_ftrace_pid use the bitmap like events do")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-05 10:47:03 -04:00
Daniel Borkmann 6816a7ffce bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_read
Follow-up commit to 1e33759c78 ("bpf, trace: add BPF_F_CURRENT_CPU
flag for bpf_perf_event_output") to add the same functionality into
bpf_perf_event_read() helper. The split of index into flags and index
component is also safe here, since such large maps are rejected during
map allocation time.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:54:40 -04:00
Daniel Borkmann d793133031 bpf, trace: fetch current cpu only once
We currently have two invocations, which is unnecessary. Fetch it only
once and use the smp_processor_id() variant, so we also get preemption
checks along with it when DEBUG_PREEMPT is set.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:54:40 -04:00
Daniel Borkmann 1ca1cc98bf bpf: minor cleanups on fd maps and helpers
Some minor cleanups: i) Remove the unlikely() from fd array map lookups
and let the CPU branch predictor do its job, scenarios where there is not
always a map entry are very well valid. ii) Move the attribute type check
in the bpf_perf_event_read() helper a bit earlier so it's consistent wrt
checks with bpf_perf_event_output() helper as well. iii) remove some
comments that are self-documenting in kprobe_prog_is_valid_access() and
therefore make it consistent to tp_prog_is_valid_access() as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:54:40 -04:00
David S. Miller ee58b57100 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, except the packet scheduler
conflicts which deal with the addition of the free list parameter
to qdisc_enqueue().

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:03:36 -04:00
Linus Torvalds 32826ac41f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "I've been traveling so this accumulates more than week or so of bug
  fixing.  It perhaps looks a little worse than it really is.

   1) Fix deadlock in ath10k driver, from Ben Greear.

   2) Increase scan timeout in iwlwifi, from Luca Coelho.

   3) Unbreak STP by properly reinjecting STP packets back into the
      stack.  Regression fix from Ido Schimmel.

   4) Mediatek driver fixes (missing malloc failure checks, leaking of
      scratch memory, wrong indexing when mapping TX buffers, etc.) from
      John Crispin.

   5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
      Sowa.

   6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su.

   7) Fix netlink notifications in ovs for tunnels, delete link messages
      are never emitted because of how the device registry state is
      handled.  From Nicolas Dichtel.

   8) Conntrack module leaks kmemcache on unload, from Florian Westphal.

   9) Prevent endless jump loops in nft rules, from Liping Zhang and
      Pablo Neira Ayuso.

  10) Not early enough spinlock initialization in mlx4, from Eric
      Dumazet.

  11) Bind refcount leak in act_ipt, from Cong WANG.

  12) Missing RCU locking in HTB scheduler, from Florian Westphal.

  13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
      barrier, using heap for SG and IV, and erroneous use of async flag
      when allocating AEAD conext.)

  14) RCU handling fix in TIPC, from Ying Xue.

  15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
      SIT driver, from Simon Horman.

  16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.

  17) Fix potential deadlock in team enslave, from Ido Schimmel.

  18) Memory leak in KCM procfs handling, from Jiri Slaby.

  19) ESN generation fix in ipv4 ESP, from Herbert Xu.

  20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
      WANG.

  21) Use after free in netem, from Eric Dumazet.

  22) Uninitialized last assert time in multicast router code, from Tom
      Goff.

  23) Skip raw sockets in sock_diag destruction broadcast, from Willem
      de Bruijn.

  24) Fix link status reporting in thunderx, from Sunil Goutham.

  25) Limit resegmentation of retransmit queue so that we do not
      retransmit too large GSO frames.  From Eric Dumazet.

  26) Delay bpf program release after grace period, from Daniel
      Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
  openvswitch: fix conntrack netlink event delivery
  qed: Protect the doorbell BAR with the write barriers.
  neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
  e1000e: keep VLAN interfaces functional after rxvlan off
  cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
  qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
  bpf, perf: delay release of BPF prog after grace period
  net: bridge: fix vlan stats continue counter
  tcp: do not send too big packets at retransmit time
  ibmvnic: fix to use list_for_each_safe() when delete items
  net: thunderx: Fix TL4 configuration for secondary Qsets
  net: thunderx: Fix link status reporting
  net/mlx5e: Reorganize ethtool statistics
  net/mlx5e: Fix number of PFC counters reported to ethtool
  net/mlx5e: Prevent adding the same vxlan port
  net/mlx5e: Check for BlueFlame capability before allocating SQ uar
  net/mlx5e: Change enum to better reflect usage
  net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
  net/mlx5: Update command strings
  net: marvell: Add separate config ANEG function for Marvell 88E1111
  ...
2016-06-29 11:50:42 -07:00
Joel Fernandes 7fa8b7171a tracing/function_graph: Fix filters for function_graph threshold
Function graph tracer currently ignores filters if tracing_thresh is set.
For example, even if set_ftrace_pid is set, then its ignored if tracing_thresh
set, resulting in all processes being traced.

To fix this, we reuse the same entry function as when tracing_thresh is not
set and do everything as in the regular case except for writing the function entry
to the ring buffer.

Link: http://lkml.kernel.org/r/1466228694-2677-1-git-send-email-agnel.joel@gmail.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Joel Fernandes <agnel.joel@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-27 13:29:24 -04:00
Steven Rostedt (Red Hat) be54f69c26 tracing: Skip more functions when doing stack tracing of events
# echo 1 > options/stacktrace
 # echo 1 > events/sched/sched_switch/enable
 # cat trace
          <idle>-0     [002] d..2  1982.525169: <stack trace>
 => save_stack_trace
 => __ftrace_trace_stack
 => trace_buffer_unlock_commit_regs
 => event_trigger_unlock_commit
 => trace_event_buffer_commit
 => trace_event_raw_event_sched_switch
 => __schedule
 => schedule
 => schedule_preempt_disabled
 => cpu_startup_entry
 => start_secondary

The above shows that we are seeing 6 functions before ever making it to the
caller of the sched_switch event.

 # echo stacktrace > events/sched/sched_switch/trigger
 # cat trace
          <idle>-0     [002] d..3  2146.335208: <stack trace>
 => trace_event_buffer_commit
 => trace_event_raw_event_sched_switch
 => __schedule
 => schedule
 => schedule_preempt_disabled
 => cpu_startup_entry
 => start_secondary

The stacktrace trigger isn't as bad, because it adds its own skip to the
stacktracing, but still has two events extra.

One issue is that if the stacktrace passes its own "regs" then there should
be no addition to the skip, as the regs will not include the functions being
called. This was an issue that was fixed by commit 7717c6be69 ("tracing:
Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()" as adding
the skip number for kprobes made the probes not have any stack at all.

But since this is only an issue when regs is being used, a skip should be
added if regs is NULL. Now we have:

 # echo 1 > options/stacktrace
 # echo 1 > events/sched/sched_switch/enable
 # cat trace
          <idle>-0     [000] d..2  1297.676333: <stack trace>
 => __schedule
 => schedule
 => schedule_preempt_disabled
 => cpu_startup_entry
 => rest_init
 => start_kernel
 => x86_64_start_reservations
 => x86_64_start_kernel

 # echo stacktrace > events/sched/sched_switch/trigger
 # cat trace
          <idle>-0     [002] d..3  1370.759745: <stack trace>
 => __schedule
 => schedule
 => schedule_preempt_disabled
 => cpu_startup_entry
 => start_secondary

And kprobes are not touched.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-23 18:48:56 -04:00
Bjorn Helgaas 9a51933e36 tracing: Expose CPU physical addresses (resource values) for PCI devices
Previously, mmio_print_pcidev() put "user" addresses in the trace buffer.
On most architectures, these are the same as CPU physical addresses, but on
microblaze, mips, powerpc, and sparc, they may be something else, typically
a raw BAR value (a bus address as opposed to a CPU address).

Always expose the CPU physical address to avoid this arch-dependent
behavior.

This change should have no user-visible effect because this file currently
depends on CONFIG_HAVE_MMIOTRACE_SUPPORT, which is only defined for x86,
and pci_resource_to_user() is a no-op on x86.

Link: http://lkml.kernel.org/r/20160511190657.5898.4248.stgit@bhelgaas-glaptop2.roam.corp.google.com

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20 09:54:22 -04:00
Steven Rostedt (Red Hat) e947841c0d tracing: Show the preempt count of when the event was called
Because tracepoint callbacks are done with preemption enabled, the trace
events are always called with preempt disable due to the
rcu_read_lock_sched_notrace() in __DO_TRACE(). This causes the preempt count
shown in the recorded trace event to be inaccurate. It is always one more
that what the preempt_count was when the tracepoint was called.

If CONFIG_PREEMPT is enabled, subtract 1 from the preempt_count before
recording it in the trace buffer.

Link: http://lkml.kernel.org/r/20160525132537.GA10808@linutronix.de

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20 09:54:21 -04:00
Andy Lutomirski e2ace00117 tracing: Choose static tp_printk buffer by explicit nesting count
Currently, the trace_printk code chooses which static buffer to use based
on what type of atomic context (NMI, IRQ, etc) it's in.  Simplify the
code and make it more robust: simply count the nesting depth and choose
a buffer based on the current nesting depth.

The new code will only drop an event if we nest more than 4 deep,
and the old code was guaranteed to malfunction if that happened.

Link: http://lkml.kernel.org/r/07ab03aecfba25fcce8f9a211b14c9c5e2865c58.1464289095.git.luto@kernel.org

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20 09:54:20 -04:00
Omar Sandoval 35abb67de7 tracing: expose current->comm to [ku]probe events
ftrace is very quick to give up on saving the task command line (see
`trace_save_cmdline()`). The workaround for events which really care
about the command line is to explicitly assign it as part of the entry.
However, this doesn't work for kprobe events, as there's no
straightforward way to get access to current->comm. Add a kprobe/uprobe
event variable $comm which provides exactly that.

Link: http://lkml.kernel.org/r/f59b472033b943a370f5f48d0af37698f409108f.1465435894.git.osandov@fb.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20 09:54:19 -04:00
Steven Rostedt (Red Hat) 345ddcc882 ftrace: Have set_ftrace_pid use the bitmap like events do
Convert set_ftrace_pid to use the bitmap like set_event_pid does. This
allows for instances to use the pid filtering as well, and will allow for
function-fork option to set if the children of a traced function should be
traced or not.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20 09:54:19 -04:00
Steven Rostedt (Red Hat) 76c813e266 tracing: Move pid_list write processing into its own function
The addition of PIDs into a pid_list via the write operation of
set_event_pid is a bit complex. The same operation will be needed for
function tracing pids. Move the code into its own generic function in
trace.c, so that we can avoid duplication of this code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20 09:54:18 -04:00