Commit graph

162 commits

Author SHA1 Message Date
Steven Rostedt (VMware) bcea3f96e1 tracing: Add SPDX License format tags to tracing files
Add the SPDX License header to ease license compliance management.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-16 19:08:06 -04:00
Steven Rostedt (VMware) e0a568dcd1 tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()
Now that some trace events can be protected by srcu_read_lock(tracepoint_srcu),
we need to make sure all locations that depend on this are also protected.
There were many places that did a synchronize_sched() thinking that it was
enough to protect againts access to trace events. This use to be the case,
but now that we use SRCU for _rcuidle() trace events, they may not be
protected by synchronize_sched(), as they may be called in paths that RCU is
not watching for preempt disable.

Fixes: e6753f23d9 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-10 15:12:01 -04:00
Steven Rostedt (VMware) f90658725b tracing: Make create_filter() code match the comments
The comment in create_filter() states that the passed in filter pointer
(filterp) will either be NULL or contain an error message stating why the
filter failed. But it also expects the filter pointer to point to NULL when
passed in. If it is not, the function create_filter_start() will warn and
return an error message without updating the filter pointer. This is not
what the comment states.

As we always expect the pointer to point to NULL, if it is not, trigger a
WARN_ON(), set it to NULL, and then continue the path as the rest will work
as the comment states. Also update the comment to state it must point to
NULL.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-03 18:14:40 -04:00
Steven Rostedt (VMware) 70303420b5 tracing: Check for no filter when processing event filters
The syzkaller detected a out-of-bounds issue with the events filter code,
specifically here:

	prog[N].pred = NULL;					/* #13 */
	prog[N].target = 1;		/* TRUE */
	prog[N+1].pred = NULL;
	prog[N+1].target = 0;		/* FALSE */
->	prog[N-1].target = N;
	prog[N-1].when_to_branch = false;

As that's the first reference to a "N-1" index, it appears that the code got
here with N = 0, which means the filter parser found no filter to parse
(which shouldn't ever happen, but apparently it did).

Add a new error to the parsing code that will check to make sure that N is
not zero before going into this part of the code. If N = 0, then -EINVAL is
returned, and a error message is added to the filter.

Cc: stable@vger.kernel.org
Fixes: 80765597bc ("tracing: Rewrite filter logic to be simpler and faster")
Reported-by: air icy <icytxw@gmail.com>
bugzilla url: https://bugzilla.kernel.org/show_bug.cgi?id=200019
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-06-21 15:12:42 -04:00
Kees Cook 6da2ec5605 treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

        kmalloc(a * b, gfp)

with:
        kmalloc_array(a * b, gfp)

as well as handling cases of:

        kmalloc(a * b * c, gfp)

with:

        kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kmalloc(sizeof(THING) * C2, ...)
|
  kmalloc(sizeof(TYPE) * C2, ...)
|
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Steven Rostedt (VMware) 10f20e9f9d tracing: Have zero size length in filter logic be full string
As strings in trace events may not have a nul terminating character, the
filter string compares use the defined string length for the field for the
compares.

The trace_marker records data slightly different than do normal events. It's
size is zero, meaning that the string is the rest of the array, and that the
string also ends with '\0'.

If the size is zero, assume that the string is nul terminated and read the
string in question as is.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-29 08:29:04 -04:00
Steven Rostedt (VMware) dc432c3d7f tracing: Fix regex_match_front() to not over compare the test string
The regex match function regex_match_front() in the tracing filter logic,
was fixed to test just the pattern length from testing the entire test
string. That is, it went from strncmp(str, r->pattern, len) to
strcmp(str, r->pattern, r->len).

The issue is that str is not guaranteed to be nul terminated, and if r->len
is greater than the length of str, it can access more memory than is
allocated.

The solution is to add a simple test if (len < r->len) return 0.

Cc: stable@vger.kernel.org
Fixes: 285caad415 ("tracing/filters: Fix MATCH_FRONT_ONLY filter matching")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-11 10:56:42 -04:00
Ravi Bangoria ba16293dad tracing: Fix kernel crash while using empty filter with perf
Kernel is crashing when user tries to record 'ftrace:function' event
with empty filter:

  # perf record -e ftrace:function --filter="" ls

  # dmesg
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  Oops: 0000 [#1] SMP PTI
  ...
  RIP: 0010:ftrace_profile_set_filter+0x14b/0x2d0
  RSP: 0018:ffffa4a7c0da7d20 EFLAGS: 00010246
  RAX: ffffa4a7c0da7d64 RBX: 0000000000000000 RCX: 0000000000000006
  RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff8c48ffc968f0
  ...
  Call Trace:
   _perf_ioctl+0x54a/0x6b0
   ? rcu_all_qs+0x5/0x30
  ...

After patch:
  # perf record -e ftrace:function --filter="" ls
  failed to set filter "" on event ftrace:function with 22 (Invalid argument)

Also, if user tries to echo "" > filter, it used to throw an error.
This behavior got changed by commit 80765597bc ("tracing: Rewrite
filter logic to be simpler and faster"). This patch restores the
behavior as a side effect:

Before patch:
  # echo "" > filter
  #

After patch:
  # echo "" > filter
  bash: echo: write error: Invalid argument
  #

Link: http://lkml.kernel.org/r/20180420150758.19787-1-ravi.bangoria@linux.ibm.com

Fixes: 80765597bc ("tracing: Rewrite filter logic to be simpler and faster")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-25 10:27:55 -04:00
Steven Rostedt (VMware) 0b3dec05db tracing: Enforce passing in filter=NULL to create_filter()
There's some inconsistency with what to set the output parameter filterp
when passing to create_filter(..., struct event_filter **filterp).

Whatever filterp points to, should be NULL when calling this function. The
create_filter() calls create_filter_start() with a pointer to a local
"filter" variable that is set to NULL. The create_filter_start() has a
WARN_ON() if the passed in pointer isn't pointing to a value set to NULL.

Ideally, create_filter() should pass the filterp variable it received to
create_filter_start() and not hide it as with a local variable, this allowed
create_filter() to fail, and not update the passed in filter, and the caller
of create_filter() then tried to free filter, which was never initialized to
anything, causing memory corruption.

Link: http://lkml.kernel.org/r/00000000000032a0c30569916870@google.com

Fixes: 80765597bc ("tracing: Rewrite filter logic to be simpler and faster")
Reported-by: syzbot+dadcc936587643d7f568@syzkaller.appspotmail.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11 11:31:08 -04:00
Jérémy Lefaure 0a4d0564f0 tracing: Use ARRAY_SIZE() macro instead of open coding it
It is useless to re-invent the ARRAY_SIZE macro so let's use it instead
of DATA_CNT.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)

Link: http://lkml.kernel.org/r/20171016012250.26453-1-jeremy.lefaure@lse.epita.fr

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
[ Removed useless include of kernel.h ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11 11:30:33 -04:00
Steven Rostedt (VMware) 8ec8405f08 tracing: Add rcu dereference annotation for test func that touches filter->prog
A boot up test function update_pred_fn() dereferences filter->prog without
the proper rcu annotation.

To do this, we must also take the event_mutex first. Normally, this isn't
needed because this test function can not race with other use cases that
touch the event filters (it is disabled if any events are enabled).

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 80765597bc ("tracing: Rewrite filter logic to be simpler and faster")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06 08:56:54 -04:00
Steven Rostedt (VMware) 1f3b0faa3e tracing: Add rcu dereference annotation for filter->prog
ftrace_function_set_filter() referenences filter->prog without annotation
and sparse complains about it. It needs a rcu_dereference_protected()
wrapper.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 80765597bc ("tracing: Rewrite filter logic to be simpler and faster")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06 08:56:53 -04:00
Steven Rostedt (VMware) 80765597bc tracing: Rewrite filter logic to be simpler and faster
Al Viro reviewed the filter logic of ftrace trace events and found it to be
very troubling. It creates a binary tree based on the logic operators and
walks it during tracing. He sent myself and Tom Zanussi a long explanation
(and formal proof) of how to do the string parsing better and end up with a
program array that can be simply iterated to come up with the correct
results.

I took his ideas and his pseudo code and rewrote the filter logic based on
them. In doing so, I was able to remove a lot of code, and have a much more
condensed filter logic in the process. I wrote a very long comment
describing the methadology that Al proposed in my own words. For more info
on how this works, read the comment above predicate_parse().

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-14 12:35:39 -04:00
Steven Rostedt (VMware) 478325f188 tracing: Clean up and document pred_funcs_##type creation and use
The pred_funcs_##type arrays consist of five functions that are assigned
based on the ops. The array must be in the same order of the ops each
function represents. The PRED_FUNC_START macro denotes the op enum that
starts the op that maps to the pred_funcs_##type arrays. This is all very
subtle and prone to bugs if the code is changed.

Add comments describing how PRED_FUNC_START and pred_funcs_##type array is
used, and also a PRED_FUNC_MAX that is the maximum number of functions in
the arrays.

Clean up select_comparison_fn() that assigns the predicates to the
pred_funcs_##type array function as well as add protection in case an op is
passed in that does not map correctly to the array.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-14 12:35:20 -04:00
Steven Rostedt (VMware) e9baef0d86 tracing: Combine enum and arrays into single macro in filter code
Instead of having a separate enum that is the index into another array, like
a string array, make a single macro that combines them into a single list,
and then the two can not get out of sync. This makes it easier to add and
remove items.

The macro trick is:

 #define DOGS				\
  C( JACK,     "Jack Russell")		\
  C( ITALIAN,  "Italian Greyhound")	\
  C( GERMAN,   "German Shepherd")

 #undef C
 #define C(a, b) a

 enum { DOGS };

 #undef C
 #define C(a, b) b

 static char dogs[] = { DOGS };

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-14 12:32:18 -04:00
Steven Rostedt (VMware) 567f6989fd tracing: Embed replace_filter_string() helper function
The replace_filter_string() frees the current string and then copies a given
string. But in the two locations that it was used, the allocation happened
right after the filter was allocated (nothing to replace). There's no need
for this to be a helper function. Embedding the allocation in the two places
where it was called will make changing the code in the future easier.

Also make the variable consistent (always use "filter_string" as the name,
as it was used in one instance as "filter_str")

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10 16:06:07 -05:00
Steven Rostedt (VMware) 404a3add43 tracing: Only add filter list when needed
replace_system_preds() creates a filter list to free even when it doesn't
really need to have it. Only save filters that require synchronize_sched()
in the filter list to free. This will allow the code to be updated a bit
easier in the future.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10 16:06:07 -05:00
Steven Rostedt (VMware) c7399708b3 tracing: Remove filter allocator helper
The __alloc_filter() function does nothing more that allocate the filter.
There's no reason to have it as a helper function.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10 16:06:06 -05:00
Steven Rostedt (VMware) 559d421267 tracing: Use trace_seq instead of open code string appending
The filter code does open code string appending to produce an error message.
Instead it can be simplified by using trace_seq function helpers.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10 16:06:06 -05:00
Steven Rostedt (VMware) a0ff08fd4e tracing: Remove BUG_ON() from append_filter_string()
There's no reason to BUG if there's a bug in the filtering code. Simply do a
warning and return.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10 16:06:05 -05:00
Steven Rostedt (VMware) 0723402141 tracing: Fix parsing of globs with a wildcard at the beginning
Al Viro reported:

    For substring - sure, but what about something like "*a*b" and "a*b"?
    AFAICS, filter_parse_regex() ends up with identical results in both
    cases - MATCH_GLOB and *search = "a*b".  And no way for the caller
    to tell one from another.

Testing this with the following:

 # cd /sys/kernel/tracing
 # echo '*raw*lock' > set_ftrace_filter
 bash: echo: write error: Invalid argument

With this patch:

 # echo '*raw*lock' > set_ftrace_filter
 # cat set_ftrace_filter
_raw_read_trylock
_raw_write_trylock
_raw_read_unlock
_raw_spin_unlock
_raw_write_unlock
_raw_spin_trylock
_raw_spin_lock
_raw_write_lock
_raw_read_lock

Al recommended not setting the search buffer to skip the first '*' unless we
know we are not using MATCH_GLOB. This implements his suggested logic.

Link: http://lkml.kernel.org/r/20180127170748.GF13338@ZenIV.linux.org.uk

Cc: stable@vger.kernel.org
Fixes: 60f1d5e3ba ("ftrace: Support full glob matching")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Suggsted-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-02-08 10:11:47 -05:00
Michal Hocko 0ee931c4e3 mm: treewide: remove GFP_TEMPORARY allocation flag
GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 18:53:16 -07:00
Steven Rostedt (VMware) 8b0db1a5bd tracing: Fix freeing of filter in create_filter() when set_str is false
Performing the following task with kmemleak enabled:

 # cd /sys/kernel/tracing/events/irq/irq_handler_entry/
 # echo 'enable_event:kmem:kmalloc:3 if irq >' > trigger
 # echo 'enable_event:kmem:kmalloc:3 if irq > 31' > trigger
 # echo scan > /sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800b9290308 (size 32):
  comm "bash", pid 1114, jiffies 4294848451 (age 141.139s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81cef5aa>] kmemleak_alloc+0x4a/0xa0
    [<ffffffff81357938>] kmem_cache_alloc_trace+0x158/0x290
    [<ffffffff81261c09>] create_filter_start.constprop.28+0x99/0x940
    [<ffffffff812639c9>] create_filter+0xa9/0x160
    [<ffffffff81263bdc>] create_event_filter+0xc/0x10
    [<ffffffff812655e5>] set_trigger_filter+0xe5/0x210
    [<ffffffff812660c4>] event_enable_trigger_func+0x324/0x490
    [<ffffffff812652e2>] event_trigger_write+0x1a2/0x260
    [<ffffffff8138cf87>] __vfs_write+0xd7/0x380
    [<ffffffff8138f421>] vfs_write+0x101/0x260
    [<ffffffff8139187b>] SyS_write+0xab/0x130
    [<ffffffff81cfd501>] entry_SYSCALL_64_fastpath+0x1f/0xbe
    [<ffffffffffffffff>] 0xffffffffffffffff

The function create_filter() is passed a 'filterp' pointer that gets
allocated, and if "set_str" is true, it is up to the caller to free it, even
on error. The problem is that the pointer is not freed by create_filter()
when set_str is false. This is a bug, and it is not up to the caller to free
the filter on error if it doesn't care about the string.

Link: http://lkml.kernel.org/r/1502705898-27571-2-git-send-email-chuhu@redhat.com

Cc: stable@vger.kernel.org
Fixes: 38b78eb85 ("tracing: Factorize filter creation")
Reported-by: Chunyu Hu <chuhu@redhat.com>
Tested-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-24 10:07:38 -04:00
Steven Rostedt (Red Hat) 3f303fbccf tracing/filter: Define op as the enum that it is
The trace_events_file.c filter logic can be a bit complex. I copy this into
a userspace program where I can debug it a bit easier. One issue is the op
is defined in most places as an int instead of as an enum, and gdb just
gives the value when debugging. Having the actual op name shown in gdb is
more useful.

This has no functionality change, but helps in debugging when the file is
debugged in user space.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14 16:42:59 -05:00
Steven Rostedt (Red Hat) fdf5b67986 tracing: Optimise comparison filters and fix binary and for 64 bit
Currently the filter logic for comparisons (like greater-than and less-than)
are used, they share the same function and a switch statement is used to
jump to the comparison type to perform. This is done in the extreme hot path
of the tracing code, and it does not take much more space to create a
unique comparison function to perform each type of comparison and remove the
switch statement.

Also, a bug was found where the binary and operation for 64 bits could fail
if the resulting bits were greater than 32 bits, because the result was
passed into a 32 bit variable. This was fixed when adding the separate
binary and function.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14 16:42:58 -05:00
Masami Hiramatsu 60f1d5e3ba ftrace: Support full glob matching
Use glob_match() to support flexible glob wildcards (*,?)
and character classes ([) for ftrace.
Since the full glob matching is slower than the current
partial matching routines(*pat, pat*, *pat*), this leaves
those routines and just add MATCH_GLOB for complex glob
expression.

e.g.
----
[root@localhost tracing]# echo 'sched*group' > set_ftrace_filter
[root@localhost tracing]# cat set_ftrace_filter
sched_free_group
sched_change_group
sched_create_group
sched_online_group
sched_destroy_group
sched_offline_group
[root@localhost tracing]# echo '[Ss]y[Ss]_*' > set_ftrace_filter
[root@localhost tracing]# head set_ftrace_filter
sys_arch_prctl
sys_rt_sigreturn
sys_ioperm
SyS_iopl
sys_modify_ldt
SyS_mmap
SyS_set_thread_area
SyS_get_thread_area
SyS_set_tid_address
sys_fork
----

Link: http://lkml.kernel.org/r/147566869501.29136.6462645009894738056.stgit@devbox

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14 16:42:58 -05:00
Steven Rostedt (Red Hat) 0fc1b09ff1 tracing: Use temp buffer when filtering events
Filtering of events requires the data to be written to the ring buffer
before it can be decided to filter or not. This is because the parameters of
the filter are based on the result that is written to the ring buffer and
not on the parameters that are passed into the trace functions.

The ftrace ring buffer is optimized for writing into the ring buffer and
committing. The discard procedure used when filtering decides the event
should be discarded is much more heavy weight. Thus, using a temporary
filter when filtering events can speed things up drastically.

Without a temp buffer we have:

 # trace-cmd start -p nop
 # perf stat -r 10 hackbench 50
       0.790706626 seconds time elapsed ( +-  0.71% )

 # trace-cmd start -e all
 # perf stat -r 10 hackbench 50
       1.566904059 seconds time elapsed ( +-  0.27% )

 # trace-cmd start -e all -f 'common_preempt_count==20'
 # perf stat -r 10 hackbench 50
       1.690598511 seconds time elapsed ( +-  0.19% )

 # trace-cmd start -e all -f 'common_preempt_count!=20'
 # perf stat -r 10 hackbench 50
       1.707486364 seconds time elapsed ( +-  0.30% )

The first run above is without any tracing, just to get a based figure.
hackbench takes ~0.79 seconds to run on the system.

The second run enables tracing all events where nothing is filtered. This
increases the time by 100% and hackbench takes 1.57 seconds to run.

The third run filters all events where the preempt count will equal "20"
(this should never happen) thus all events are discarded. This takes 1.69
seconds to run. This is 10% slower than just committing the events!

The last run enables all events and filters where the filter will commit all
events, and this takes 1.70 seconds to run. The filtering overhead is
approximately 10%. Thus, the discard and commit of an event from the ring
buffer may be about the same time.

With this patch, the numbers change:

 # trace-cmd start -p nop
 # perf stat -r 10 hackbench 50
       0.778233033 seconds time elapsed ( +-  0.38% )

 # trace-cmd start -e all
 # perf stat -r 10 hackbench 50
       1.582102692 seconds time elapsed ( +-  0.28% )

 # trace-cmd start -e all -f 'common_preempt_count==20'
 # perf stat -r 10 hackbench 50
       1.309230710 seconds time elapsed ( +-  0.22% )

 # trace-cmd start -e all -f 'common_preempt_count!=20'
 # perf stat -r 10 hackbench 50
       1.786001924 seconds time elapsed ( +-  0.20% )

The first run is again the base with no tracing.

The second run is all tracing with no filtering. It is a little slower, but
that may be well within the noise.

The third run shows that discarding all events only took 1.3 seconds. This
is a speed up of 23%! The discard is much faster than even the commit.

The one downside is shown in the last run. Events that are not discarded by
the filter will take longer to add, this is due to the extra copy of the
event.

Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-03 17:59:24 -04:00
Steven Rostedt (Red Hat) dcb0b5575d tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic
Nothing sets TRACE_EVENT_FL_USE_CALL_FILTER anymore. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-02 21:30:04 -04:00
Tom Zanussi 4ef56902fb tracing: Make ftrace_event_field checking functions available
Make is_string_field() and is_function_field() accessible outside of
trace_event_filters.c for other users of ftrace_event_fields.

Link: http://lkml.kernel.org/r/2d3f00d3311702e556e82eed7754bae6f017939f.1449767187.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-08 11:19:29 -05:00
Steven Rostedt (Red Hat) e57cbaf0eb tracing: Do not have 'comm' filter override event 'comm' field
Commit 9f61668073 "tracing: Allow triggers to filter for CPU ids and
process names" added a 'comm' filter that will filter events based on the
current tasks struct 'comm'. But this now hides the ability to filter events
that have a 'comm' field too. For example, sched_migrate_task trace event.
That has a 'comm' field of the task to be migrated.

 echo 'comm == "bash"' > events/sched_migrate_task/filter

will now filter all sched_migrate_task events for tasks named "bash" that
migrates other tasks (in interrupt context), instead of seeing when "bash"
itself gets migrated.

This fix requires a couple of changes.

1) Change the look up order for filter predicates to look at the events
   fields before looking at the generic filters.

2) Instead of basing the filter function off of the "comm" name, have the
   generic "comm" filter have its own filter_type (FILTER_COMM). Test
   against the type instead of the name to assign the filter function.

3) Add a new "COMM" filter that works just like "comm" but will filter based
   on the current task, even if the trace event contains a "comm" field.

Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".

Cc: stable@vger.kernel.org # v4.3+
Fixes: 9f61668073 "tracing: Allow triggers to filter for CPU ids and process names"
Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-04 09:57:10 -05:00
Yaowei Bai 907bff917a tracing: is_legal_op() can return boolean
Make is_legal_op() return bool to improve readability due to this particular
function only using either one or zero as its return value.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-8-git-send-email-bywxiaobai@163.com

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02 14:26:51 -05:00
Daniel Wagner 9f61668073 tracing: Allow triggers to filter for CPU ids and process names
By extending the filter rules by more generic fields
we can write triggers filters like

  echo 'stacktrace if cpu == 1' > \
	/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

or

  echo 'stacktrace if comm == sshd'  > \
	/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

CPU and COMM are not part of struct trace_entry. We could add the two
new fields to ftrace_common_field list and fix up all depending
sides. But that looks pretty ugly. Another thing I would like to
avoid that the 'format' file contents changes.

All this can be avoided by introducing another list which contains
non field members of struct trace_entry.

Link: http://lkml.kernel.org/r/1439210146-24707-1-git-send-email-daniel.wagner@bmw-carit.de

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-08-11 18:01:06 -04:00
Linus Torvalds e382608254 This patch series contains several clean ups and even a new trace clock
"monitonic raw". Also some enhancements to make the ring buffer even
 faster. But the biggest and most noticeable change is the renaming of
 the ftrace* files, structures and variables that have to deal with
 trace events.
 
 Over the years I've had several developers tell me about their confusion
 with what ftrace is compared to events. Technically, "ftrace" is the
 infrastructure to do the function hooks, which include tracing and also
 helps with live kernel patching. But the trace events are a separate
 entity altogether, and the files that affect the trace events should
 not be named "ftrace". These include:
 
   include/trace/ftrace.h	->	include/trace/trace_events.h
   include/linux/ftrace_event.h	->	include/linux/trace_events.h
 
 Also, functions that are specific for trace events have also been renamed:
 
   ftrace_print_*()		->	trace_print_*()
   (un)register_ftrace_event()	->	(un)register_trace_event()
   ftrace_event_name()		->	trace_event_name()
   ftrace_trigger_soft_disabled()->	trace_trigger_soft_disabled()
   ftrace_define_fields_##call() ->	trace_define_fields_##call()
   ftrace_get_offsets_##call()	->	trace_get_offsets_##call()
 
 Structures have been renamed:
 
   ftrace_event_file		->	trace_event_file
   ftrace_event_{call,class}	->	trace_event_{call,class}
   ftrace_event_buffer		->	trace_event_buffer
   ftrace_subsystem_dir		->	trace_subsystem_dir
   ftrace_event_raw_##call	->	trace_event_raw_##call
   ftrace_event_data_offset_##call->	trace_event_data_offset_##call
   ftrace_event_type_funcs_##call ->	trace_event_type_funcs_##call
 
 And a few various variables and flags have also been updated.
 
 This has been sitting in linux-next for some time, and I have not heard
 a single complaint about this rename breaking anything. Mostly because
 these functions, variables and structures are mostly internal to the
 tracing system and are seldom (if ever) used by anything external to that.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJViYhVAAoJEEjnJuOKh9ldcJ0IAI+mytwoMAN/CWDE8pXrTrgs
 aHlcr1zorSzZ0Lq6lKsWP+V0VGVhP8KWO16vl35HaM5ZB9U+cDzWiGobI8JTHi/3
 eeTAPTjQdgrr/L+ZO1ApzS1jYPhN3Xi5L7xublcYMJjKfzU+bcYXg/x8gRt0QbG3
 S9QN/kBt0JIIjT7McN64m5JVk2OiU36LxXxwHgCqJvVCPHUrriAdIX7Z5KRpEv13
 zxgCN4d7Jiec/FsMW8dkO0vRlVAvudZWLL7oDmdsvNhnLy8nE79UOeHos2c1qifQ
 LV4DeQ+2Hlu7w9wxixHuoOgNXDUEiQPJXzPc/CuCahiTL9N/urQSGQDoOVMltR4=
 =hkdz
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This patch series contains several clean ups and even a new trace
  clock "monitonic raw".  Also some enhancements to make the ring buffer
  even faster.  But the biggest and most noticeable change is the
  renaming of the ftrace* files, structures and variables that have to
  deal with trace events.

  Over the years I've had several developers tell me about their
  confusion with what ftrace is compared to events.  Technically,
  "ftrace" is the infrastructure to do the function hooks, which include
  tracing and also helps with live kernel patching.  But the trace
  events are a separate entity altogether, and the files that affect the
  trace events should not be named "ftrace".  These include:

    include/trace/ftrace.h         ->    include/trace/trace_events.h
    include/linux/ftrace_event.h   ->    include/linux/trace_events.h

  Also, functions that are specific for trace events have also been renamed:

    ftrace_print_*()               ->    trace_print_*()
    (un)register_ftrace_event()    ->    (un)register_trace_event()
    ftrace_event_name()            ->    trace_event_name()
    ftrace_trigger_soft_disabled() ->    trace_trigger_soft_disabled()
    ftrace_define_fields_##call()  ->    trace_define_fields_##call()
    ftrace_get_offsets_##call()    ->    trace_get_offsets_##call()

  Structures have been renamed:

    ftrace_event_file              ->    trace_event_file
    ftrace_event_{call,class}      ->    trace_event_{call,class}
    ftrace_event_buffer            ->    trace_event_buffer
    ftrace_subsystem_dir           ->    trace_subsystem_dir
    ftrace_event_raw_##call        ->    trace_event_raw_##call
    ftrace_event_data_offset_##call->    trace_event_data_offset_##call
    ftrace_event_type_funcs_##call ->    trace_event_type_funcs_##call

  And a few various variables and flags have also been updated.

  This has been sitting in linux-next for some time, and I have not
  heard a single complaint about this rename breaking anything.  Mostly
  because these functions, variables and structures are mostly internal
  to the tracing system and are seldom (if ever) used by anything
  external to that"

* tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  ring_buffer: Allow to exit the ring buffer benchmark immediately
  ring-buffer-benchmark: Fix the wrong type
  ring-buffer-benchmark: Fix the wrong param in module_param
  ring-buffer: Add enum names for the context levels
  ring-buffer: Remove useless unused tracing_off_permanent()
  ring-buffer: Give NMIs a chance to lock the reader_lock
  ring-buffer: Add trace_recursive checks to ring_buffer_write()
  ring-buffer: Allways do the trace_recursive checks
  ring-buffer: Move recursive check to per_cpu descriptor
  ring-buffer: Add unlikelys to make fast path the default
  tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
  tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
  tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
  tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
  tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
  tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
  tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
  tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
  tracing: Rename ftrace_event_name() to trace_event_name()
  tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
  ...
2015-06-26 14:02:43 -07:00
Linus Torvalds fcbc1777ce After fixing the previous filter issue reported by Vince Weaver,
I could not come up with a situation where the operand counter (cnt)
 could go below zero, so I added a WARN_ON_ONCE(cnt < 0). Vince was
 able to trigger that warn on with his fuzzer test, but didn't have
 a filter input that caused it.
 
 Later, Sasha Levin was able to trigger that same warning, and was
 able to give me the filter string that triggered it. It was simply
 a single operation ">".
 
 I wrapped the filtering code in a userspace program such that I could
 single step through the logic. With a single operator the operand
 counter can legitimately go below zero, and should be reported to the
 user as an error, but should not produce a kernel warning. The
 WARN_ON_ONCE(cnt < 0) should be just a "if (cnt < 0) break;" and the
 code following it will produce the error message for the user.
 
 While debugging this, I found that there was another bug that let
 the pointer to the filter string go beyond the filter string.
 This too was fixed.
 
 Finally, there was a typo in a stub function that only gets compiled
 if trace events is disabled but tracing is enabled (I'm not even sure
 that's possible).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVjWh2AAoJEEjnJuOKh9ldOn0IANHPW82++0O87U1pEe3hHnKv
 gSTKiNPVNC3GBt9DHnawA0EuyPfPa+Wj5X2xgrstWA+KRADZErZzdWpzbh/iHosJ
 0kaUFqFcaKBheOSqhHfz3WQshD16pb1lQYbV7vbdzMjpcIpYT3VcuKQq3zQVb5Pr
 njPmgZXK+I9ITYQ8E+DysnTg0+Mo+l/2P/tqnBoIkAVmuZitfJS5okTtVw1GNzyR
 7VRMGBE3G0GxB++57T/xILXjFc9sSGQH5lZgLHQhEh36YgWuDvc0R2FfxDKROmeq
 b/xw68uCO1Hv8oEng6r/UceVtUoaXhf+JamSJqxztBTsjsqR/iXCV78Jac1vnPY=
 =cmr8
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "This isn't my 4.2 pull request (yet).  I found a few more bugs that I
  would have sent to fix 4.1, but since 4.1 is already out, I'm sending
  this before sending my 4.2 request (which is ready to go).

  After fixing the previous filter issue reported by Vince Weaver, I
  could not come up with a situation where the operand counter (cnt)
  could go below zero, so I added a WARN_ON_ONCE(cnt < 0).  Vince was
  able to trigger that warn on with his fuzzer test, but didn't have a
  filter input that caused it.

  Later, Sasha Levin was able to trigger that same warning, and was able
  to give me the filter string that triggered it.  It was simply a
  single operation ">".

  I wrapped the filtering code in a userspace program such that I could
  single step through the logic.  With a single operator the operand
  counter can legitimately go below zero, and should be reported to the
  user as an error, but should not produce a kernel warning.  The
  WARN_ON_ONCE(cnt < 0) should be just a "if (cnt < 0) break;" and the
  code following it will produce the error message for the user.

  While debugging this, I found that there was another bug that let the
  pointer to the filter string go beyond the filter string.  This too
  was fixed.

  Finally, there was a typo in a stub function that only gets compiled
  if trace events is disabled but tracing is enabled (I'm not even sure
  that's possible)"

* tag 'trace-fixes-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix typo from "static inlin" to "static inline"
  tracing/filter: Do not allow infix to exceed end of string
  tracing/filter: Do not WARN on operand count going below zero
2015-06-26 13:56:55 -07:00
Rasmus Villemoes 1bb564718f kernel/trace/trace_events_filter.c: use strreplace()
There's no point in starting over every time we see a ','...

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:40 -07:00
Steven Rostedt (Red Hat) 6b88f44e16 tracing/filter: Do not allow infix to exceed end of string
While debugging a WARN_ON() for filtering, I found that it is possible
for the filter string to be referenced after its end. With the filter:

 # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has

	ps->infix.cnt--;
	return ps->infix.string[ps->infix.tail++];

The cnt will now be below zero, and the tail that is returned is
already passed the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the charater after the nul terminating
character will be zero.

Luckily, only root can write to the filter.

Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-25 18:18:17 -04:00
Steven Rostedt (Red Hat) b4875bbe7e tracing/filter: Do not WARN on operand count going below zero
When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
that it can happen (although the filter would be wrong).

 # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
and the filter is reported as a error. But instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.

Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-25 18:02:29 -04:00
Steven Rostedt 2cf30dc180 tracing: Have filter check for balanced ops
When the following filter is used it causes a warning to trigger:

 # cd /sys/kernel/debug/tracing
 # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
 # cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: No error

 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
 Modules linked in: bnep lockd grace bluetooth  ...
 CPU: 3 PID: 1223 Comm: bash Tainted: G        W       4.1.0-rc3-test+ #450
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
  0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
  0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
  ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
 Call Trace:
  [<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
  [<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
  [<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
  [<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff81159065>] replace_preds+0x3c5/0x990
  [<ffffffff811596b2>] create_filter+0x82/0xb0
  [<ffffffff81159944>] apply_event_filter+0xd4/0x180
  [<ffffffff81152bbf>] event_filter_write+0x8f/0x120
  [<ffffffff811db2a8>] __vfs_write+0x28/0xe0
  [<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
  [<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
  [<ffffffff811dc408>] vfs_write+0xb8/0x1b0
  [<ffffffff811dc72f>] SyS_write+0x4f/0xb0
  [<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
 ---[ end trace e11028bd95818dcd ]---

Worse yet, reading the error message (the filter again) it says that
there was no error, when there clearly was. The issue is that the
code that checks the input does not check for balanced ops. That is,
having an op between a closed parenthesis and the next token.

This would only cause a warning, and fail out before doing any real
harm, but it should still not caues a warning, and the error reported
should work:

 # cd /sys/kernel/debug/tracing
 # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
 # cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: Meaningless filter expression

And give no kernel warning.

Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: stable@vger.kernel.org # 2.6.31+
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-17 07:13:30 -04:00
Steven Rostedt (Red Hat) a723776573 tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_raw_##call structures are built
by macros for trace events. They have nothing to do with function tracing.
Rename them.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 21:48:40 -04:00
Steven Rostedt (Red Hat) 5d6ad960a7 tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The FTRACE_EVENT_FL_* flags are flags to
do with the trace_event files in the tracefs directory. They are not related
to function tracing. Rename them to a more descriptive name.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 15:24:57 -04:00
Steven Rostedt (Red Hat) 7967b3e0c4 tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_subsystem_dir holds
the information about trace event subsystems. It should not be named
ftrace, rename it to trace_subsystem_dir.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:59:40 -04:00
Steven Rostedt (Red Hat) 2425bcb924 tracing: Rename ftrace_event_{call,class} to trace_event_{call,class}
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structures ftrace_event_call and
ftrace_event_class have nothing to do with the function hooks, and are
really trace_event structures. Rename ftrace_event_* to trace_event_*.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:06:10 -04:00
Steven Rostedt (Red Hat) 7f1d2f8210 tracing: Rename ftrace_event_file to trace_event_file
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_event_file is really
about trace events and not "ftrace". Rename it to trace_event_file.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:05:16 -04:00
Steven Rostedt (Red Hat) eabb8980a9 tracing: Allow NOT to filter AND and OR clauses
Add support to allow not "!" for and (&&) and (||). That is:

 !(field1 == X && field2 == Y)

Where the value of the full clause will be notted.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-03 10:00:27 -05:00
Steven Rostedt (Red Hat) e12c09cf30 tracing: Add NOT to filtering logic
Ted noticed that he could not filter on an event for a bit being cleared.
That's because the filtering logic only tests event fields with a limited
number of comparisons which, for bit logic, only include "&", which can
test if a bit is set, but there's no good way to see if a bit is clear.

This adds a way to do: !(field & 2048)

Which returns true if the bit is not set, and false otherwise.

Note, currently !(field1 == 10 && field2 == 15) is not supported.
That is, the 'not' only works for direct comparisons, not for the
AND and OR logic.

Link: http://lkml.kernel.org/r/20141202021912.GA29096@thunk.org
Link: http://lkml.kernel.org/r/20141202120430.71979060@gandalf.local.home

Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Suggested-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-03 10:00:13 -05:00
Oleg Nesterov 6355d54438 tracing: Kill "filter_string" arg of replace_preds()
Cosmetic, but replace_preds() doesn't need/use "char *filter_string".
Remove it to microsimplify the code.

Link: http://lkml.kernel.org/p/20140715184832.GA20519@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:58:53 -04:00
Oleg Nesterov bb9ef1cb7d tracing: Change apply_subsystem_event_filter() paths to check file->system == dir
filter_free_subsystem_preds(), filter_free_subsystem_filters() and
replace_system_preds() can simply check file->system->subsystem and
avoid strcmp(call->class->system).

Better yet, we can pass "struct ftrace_subsystem_dir *dir" instead of
event_subsystem and just check file->system == dir.

Thanks to Namhyung Kim who pointed out that replace_system_preds() can
be changed too.

Link: http://lkml.kernel.org/p/20140715184829.GA20516@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:33:35 -04:00
Oleg Nesterov b5d09db5ac tracing: Kill call_filter_disable()
It seems that the only purpose of call_filter_disable() is to
make filter_disable() less clear and symmetrical, remove it.

Link: http://lkml.kernel.org/p/20140715184821.GA20498@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:23:12 -04:00
Oleg Nesterov 57375747b6 tracing: Kill destroy_call_preds()
Remove destroy_call_preds(). Its only caller, __trace_remove_event_call(),
can use free_event_filter() and nullify ->filter by hand.

Perhaps we could keep this trivial helper although imo it is pointless, but
then it should be static in trace_events.c.

Link: http://lkml.kernel.org/p/20140715184816.GA20495@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:22:32 -04:00
Oleg Nesterov 3e5454d656 tracing: Kill destroy_preds() and destroy_file_preds()
destroy_preds() makes no sense.

The only caller, event_remove(), actually wants destroy_file_preds().
__trace_remove_event_call() does destroy_call_preds() which takes care
of call->filter.

And after the previous change we can simply remove destroy_preds() from
event_remove(), we are going to call remove_event_from_tracers() which
in turn calls remove_event_file_dir()->free_event_filter().

Link: http://lkml.kernel.org/p/20140715184813.GA20488@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:19:55 -04:00