Commit graph

379 commits

Author SHA1 Message Date
Paul E. McKenney 1dfa55e019 Merge branches 'cond_resched.2017.12.04a', 'dyntick.2017.11.28a', 'fixes.2017.12.11a', 'srbd.2017.12.05a' and 'torture.2017.12.11a' into HEAD
cond_resched.2017.12.04a: Convert cond_resched_rcu_qs() to cond_resched()
dyntick.2017.11.28a: Make RCU dynticks handle interrupts from NMI
fixes.2017.12.11a: Miscellaneous fixes
srbd.2017.12.05a: Remove now-redundant smp_read_barrier_depends()
torture.2017.12.11a: Torture-testing update
2017-12-11 09:21:58 -08:00
Paul E. McKenney efd88b02bb rcu: Add comment giving debug strategy for double call_rcu()
The following statement has for some reason proven non-intuitive:

	WARN_ON_ONCE(rcu_segcblist_empty(&rdp->cblist) != (count == 0));

This commit therefore adds a comment that states that this warning
usually triggers in response to a double call_rcu(), which is sort
of like a double free.  The comment also suggests building with
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y to track down the double call_rcu().

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:17:39 -08:00
Paul E. McKenney e68bbb266d rcu: Simplify rcu_eqs_{enter,exit}() non-idle task debug code
The code that checks for non-idle non-nohz_idle-usermode tasks invoking
rcu_eqs_enter() and rcu_eqs_exit() prints a considerable quantity of
helpful information.  However, these checks fire rarely, so the extra
complexity is no longer worth it.  This commit therefore replaces this
debug code with simple WARN_ON_ONCE() statements.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:21 -08:00
Paul E. McKenney 9dd238e286 rcu: Fold rcu_eqs_exit_common() into rcu_eqs_exit()
There is now only one call to rcu_eqs_exit_common() and there is no other
reason to keep it separate.  This commit therefore inlines it into its
sole call site, saving a few lines of code in the process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:21 -08:00
Paul E. McKenney 215bba9f59 rcu: Fold rcu_eqs_enter_common() into rcu_eqs_enter()
There is now only one call to rcu_eqs_enter_common() and there is no other
reason to keep it separate.  This commit therefore inlines it into its
sole call site, saving a few lines of code in the process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:20 -08:00
Paul E. McKenney 2342172fd6 rcu: Avoid ->dynticks_nesting store tearing
Although ->dynticks_nesting is updated only by process level, it is
accessed from hardirq to check for interrupt-from-idle quiescent states.
Store tearing is thus possible, so this commit applies WRITE_ONCE()
to ->dynticks_nesting stores.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:20 -08:00
Paul E. McKenney 914955e18c rcu: Stop duplicating lockdep checks in RCU's idle-entry code
The three RCU_LOCKDEP_WARN() calls in rcu_eqs_enter_common() are
redundant with other lockdep checks, so this commit removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:19 -08:00
Paul E. McKenney dec98900ea rcu: Add ->dynticks field to rcu_dyntick trace event
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:19 -08:00
Paul E. McKenney 84585aa8b6 rcu: Shrink ->dynticks_{nmi_,}nesting from long long to long
Because the ->dynticks_nesting field now only contains the process-based
nesting level instead of a value encoding both the process nesting level
and the irq "nesting" level, we no longer need a long long, even on
32-bit systems.  This commit therefore changes both the ->dynticks_nesting
and ->dynticks_nmi_nesting fields to long.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:18 -08:00
Paul E. McKenney bd2b879a1c rcu: Add tracing to irq/NMI dyntick-idle transitions
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:18 -08:00
Paul E. McKenney 844ccdd7dc rcu: Eliminate rcu_irq_enter_disabled()
Now that the irq path uses the rcu_nmi_{enter,exit}() algorithm,
rcu_irq_enter() and rcu_irq_exit() may be used from any context.  There is
thus no need for rcu_irq_enter_disabled() and for the checks using it.
This commit therefore eliminates rcu_irq_enter_disabled().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney 51a1fd30f1 rcu: Make ->dynticks_nesting be a simple counter
Now that ->dynticks_nesting counts only process-level dyntick-idle
entry and exit, there is no need for the elaborate segmented counter
with its guard fields and overflow checking.  This commit therefore
makes ->dynticks_nesting be a simple counter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney 58721f5da4 rcu: Define rcu_irq_{enter,exit}() in terms of rcu_nmi_{enter,exit}()
RCU currently uses two different mechanisms for tracking irqs and NMIs.
This is unnecessary complexity: Given that NMIs can nest and given that
RCU's tracking handles such nesting, the NMI tracking mechanism can also
be used to track irqs.  This commit therefore defines rcu_irq_enter()
in terms of rcu_nmi_enter() and rcu_irq_exit() in terms of rcu_nmi_exit().

Unfortunately, callers must still distinguish between the irq and NMI
functions because additional actions are taken when an irq interrupts
idle or nohz_full usermode execution, and these actions cannot always
be taken from NMI handlers.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney 6136d6e48a rcu: Clamp ->dynticks_nmi_nesting at eqs entry/exit
In preparation for merging dyntick-idle irq handling into the NMI
algorithm, clamp ->dynticks_nmi_nesting value to allow for interrupts
that enter but never leave and vice versa.

It is important that the clamping happen outside of the extended quiescent
state.  Otherwise, there will be short windows where irqs and NMIs fail
to convince RCU to start watching.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:10 -08:00
Paul E. McKenney fd581a91ac rcu: Move rcu_nmi_{enter,exit}() to prepare for consolidation
This is a code-motion-only commit that prepares to define rcu_irq_enter()
in terms of rcu_nmi_enter() and rcu_irq_exit() in terms of rcu_irq_exit().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:10 -08:00
Paul E. McKenney a0eb22bf64 rcu: Reduce dyntick-idle state space
Both extended-quiescent-state entry and exit first update the nesting
counter and then adjust the dyntick-idle state.  This means that there
are four states: (1) Both nesting and dyntick idle indicate idle,
(2) Nesting indicates idle but dyntick idle does not, (3) Nesting indicates
non-idle and dyntick idle does not, and (4) Both nesting and dyntick
idle indicate non-idle.  This commit simplifies the state space by
eliminating #3, reversing the order of updates on exit from extended
quiescent state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:10 -08:00
Paul E. McKenney 2e672ab2d4 rcu: Avoid ->dynticks_nmi_nesting store tearing
NMIs can nest, and store tearing could in theory happen on carries
from one byte to the next.  This commit therefore adds the WRITE_ONCE()
macros preventing this.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:09 -08:00
Linus Torvalds 8e9a2dba86 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Another attempt at enabling cross-release lockdep dependency
     tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
     with better performance and fewer false positives. (Byungchul Park)

   - Introduce lockdep_assert_irqs_enabled()/disabled() and convert
     open-coded equivalents to lockdep variants. (Frederic Weisbecker)

   - Add down_read_killable() and use it in the VFS's iterate_dir()
     method. (Kirill Tkhai)

   - Convert remaining uses of ACCESS_ONCE() to
     READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
     driven. (Mark Rutland, Paul E. McKenney)

   - Get rid of lockless_dereference(), by strengthening Alpha atomics,
     strengthening READ_ONCE() with smp_read_barrier_depends() and thus
     being able to convert users of lockless_dereference() to
     READ_ONCE(). (Will Deacon)

   - Various micro-optimizations:

        - better PV qspinlocks (Waiman Long),
        - better x86 barriers (Michael S. Tsirkin)
        - better x86 refcounts (Kees Cook)

   - ... plus other fixes and enhancements. (Borislav Petkov, Juergen
     Gross, Miguel Bernal Marin)"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
  rcu: Use lockdep to assert IRQs are disabled/enabled
  netpoll: Use lockdep to assert IRQs are disabled/enabled
  timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
  sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
  irq_work: Use lockdep to assert IRQs are disabled/enabled
  irq/timings: Use lockdep to assert IRQs are disabled/enabled
  perf/core: Use lockdep to assert IRQs are disabled/enabled
  x86: Use lockdep to assert IRQs are disabled/enabled
  smp/core: Use lockdep to assert IRQs are disabled/enabled
  timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
  timers/nohz: Use lockdep to assert IRQs are disabled/enabled
  workqueue: Use lockdep to assert IRQs are disabled/enabled
  irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
  locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
  locking/pvqspinlock: Implement hybrid PV queued/unfair locks
  locking/rwlocks: Fix comments
  x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
  block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
  workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
  ...
2017-11-13 12:38:26 -08:00
Frederic Weisbecker b04db8e19f rcu: Use lockdep to assert IRQs are disabled/enabled
Lockdep now has an integrated IRQs disabled/enabled sanity check. Just
use it instead of the ad-hoc RCU version.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1509980490-4285-15-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08 11:13:55 +01:00
Ingo Molnar 72bc286b81 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Documentation updates
 - Miscellaneous fixes
 - RCU CPU stall-warning updates
 - Torture-test updates

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24 10:49:44 +02:00
Paul E. McKenney ad4e25a3a1 Merge branches 'doc.2017.10.20a', 'fixes.2017.10.19a', 'stall.2017.10.09a' and 'torture.2017.10.09a' into HEAD
doc.2017.10.20a: Documentation updates.
fixes.2017.10.19a: Miscellaneous fixes.
stall.2017.10.09a: RCU CPU stall-warning updates.
torture.2017.10.09a: Torture-test updates.
2017-10-20 11:11:15 -07:00
Paul E. McKenney 27fdb35fe9 doc: Fix various RCU docbook comment-header problems
Because many of RCU's files have not been included into docbook, a
number of errors have accumulated.  This commit fixes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-19 22:26:11 -04:00
Paul E. McKenney c0da313e09 rcu: Add extended-quiescent-state testing advice
If you add or remove calls to rcu_idle_enter(), rcu_user_enter(),
rcu_irq_exit(), rcu_irq_exit_irqson(), rcu_idle_exit(), rcu_user_exit(),
rcu_irq_enter(), rcu_irq_enter_irqson(), rcu_nmi_enter(), or
rcu_nmi_exit(), you should run a full set of tests on a kernel built
with CONFIG_RCU_EQS_DEBUG=y.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19 12:13:36 -07:00
Paul E. McKenney 9b9500da81 rcu: Make RCU CPU stall warnings check for irq-disabled CPUs
One common question upon seeing an RCU CPU stall warning is "did
the stalled CPUs have interrupts disabled?"  However, the current
stall warnings are silent on this point.  This commit therefore
uses irq_work to check whether stalled CPUs still respond to IPIs,
and flags this state in the RCU CPU stall warning console messages.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:25:17 -07:00
Paul E. McKenney f79c3ad618 sched,rcu: Make cond_resched() provide RCU quiescent state
There is some confusion as to which of cond_resched() or
cond_resched_rcu_qs() should be added to long in-kernel loops.
This commit therefore eliminates the decision by adding RCU quiescent
states to cond_resched().  This commit also simplifies the code that
used to interact with cond_resched_rcu_qs(), and that now interacts with
cond_resched(), to reduce its overhead.  This reduction is necessary to
allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
everywhere that cond_resched() is invoked.

Part of that reduction in overhead converts the jiffies_till_sched_qs
kernel parameter to read-only at runtime, thus eliminating the need for
bounds checking.

Reported-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
[ paulmck: Keep PREEMPT=n cond_resched a no-op, per Peter Zijlstra. ]
2017-10-09 14:25:17 -07:00
Linus Torvalds 013a8ee628 Two updates.
- A memory fix with left over code from spliting out ftrace_ops
    and function graph tracer, where the function graph tracer could
    reset the trampoline pointer, leaving the old trampoline not to
    be freed (memory leak).
 
  - The update to Paul's patch that added the unnecessary READ_ONCE().
    This removes the unnecessary READ_ONCE() instead of having to rebase
    the branch to update the patch that added it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEQEw9Eu0DdyUUkuUUybkF8mrZjcsFAlnU++sUHHJvc3RlZHRA
 Z29vZG1pcy5vcmcACgkQybkF8mrZjcujzgf/ebIzGKe5vQKNrL4ITAcIz0T7Hvzl
 pWw4uJp8kqO9x9EHMnztAkltQigvjvgDKZozJpUGgtNsFLuvdgQSBMK24YV8vLHs
 UmXEnQ2tSB/2Sg2ccEnpjVXaMzL9aqlbeTmACbdd9UgZnvPiUYPejq2jFfECFQjb
 k/gZT911ukBtx4mXYKzGFbTEZHdc/YUs6Y/wzB1ox5BBIUh71ZDZXxQTUHfXHlwS
 Cst69/9dKl4nBEGDGas6/95iR+ORVv85osI/pqPtjSj4EkRnWfVRotaH1kNuSQil
 gDIHSoy35NfXJx77/5IFHfrjFBAkr0IYRNL/jZaWazwM7rdqfAN8TwMQuA==
 =4CtF
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixlets from Steven Rostedt:
 "Two updates:

   - A memory fix with left over code from spliting out ftrace_ops and
     function graph tracer, where the function graph tracer could reset
     the trampoline pointer, leaving the old trampoline not to be freed
     (memory leak).

   - The update to Paul's patch that added the unnecessary READ_ONCE().
     This removes the unnecessary READ_ONCE() instead of having to
     rebase the branch to update the patch that added it"

* tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
  ftrace: Fix kmemleak in unregister_ftrace_graph
2017-10-04 08:34:01 -07:00
Paul E. McKenney f39b536ce9 rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
The read of ->dynticks_nmi_nesting in rcu_irq_enter() and rcu_irq_exit()
is currently protected with READ_ONCE().  However, this protection is
unnecessary because (1) ->dynticks_nmi_nesting is updated only by the
current CPU, (2) Although NMI handlers can update this field, they reset
it back to its old value before return, and (3) Interrupts are disabled,
so nothing else can modify it.  The value of ->dynticks_nmi_nesting is
thus effectively constant, and so no protection is required.

This commit therefore removes the READ_ONCE() protection from these
two accesses.

Link: http://lkml.kernel.org/r/20170926031902.GA2074@linux.vnet.ibm.com

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-03 10:27:32 -04:00
Linus Torvalds ac0a36461f Stack tracing and RCU has been having issues with each other and lockdep
has been pointing out constant problems. The changes have been going into
 the stack tracer, but it has been discovered that the problem isn't
 with the stack tracer itself, but it is with calling save_stack_trace()
 from within the internals of RCU. The stack tracer is the one that
 can trigger the issue the easiest, but examining the problem further,
 it could also happen from a WARN() in the wrong place, or even if
 an NMI happened in this area and it did an rcu_read_lock().
 
 The critical area is where RCU is not watching. Which can happen while
 going to and from idle, or bringing up or taking down a CPU.
 
 The final fix was to put the protection in kernel_text_address() as it
 is the one that requires RCU to be watching while doing the stack trace.
 
 To make this work properly, Paul had to allow rcu_irq_enter() happen after
 rcu_nmi_enter(). This should have been done anyway, since an NMI can
 page fault (reading vmalloc area), and a page fault triggers rcu_irq_enter().
 
 One patch is just a consolidation of code so that the fix only needed
 to be done in one location.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEQEw9Eu0DdyUUkuUUybkF8mrZjcsFAlnGyXoUHHJvc3RlZHRA
 Z29vZG1pcy5vcmcACgkQybkF8mrZjctKtwf8CeKGqOdlqkZEafIpWaIASXmAVMO/
 WE+hQK+rCydWFvzADgb/rOmsR0ou8WGEXcuUPxVxmvMyqhKhZ6AU1hE/7Y8P0pMq
 F4bev+j3lAJC65ezFAh+ZQcIjaRIH4MFVPsUTaibSPSN7xziMNIpbf9VOVfpUm8A
 jf9p6YAmyhFVi6DstCc29SWnywEVwC2ZWRVKRPXKry8/dPxjfVcLclGX680Eqi9I
 EnYaOdC/mGbtvHPOUSs/P0cfxExHmyEErQHeOV8FPymj6KJ6+KoYIiELNlTHUBj/
 eeKzrHc/b3j+lz0RPlA8WxYmpmEm4SE5cV3vRebdBNUBrABSN1RxeOozyQ==
 =1KkS
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Stack tracing and RCU has been having issues with each other and
  lockdep has been pointing out constant problems.

  The changes have been going into the stack tracer, but it has been
  discovered that the problem isn't with the stack tracer itself, but it
  is with calling save_stack_trace() from within the internals of RCU.

  The stack tracer is the one that can trigger the issue the easiest,
  but examining the problem further, it could also happen from a WARN()
  in the wrong place, or even if an NMI happened in this area and it did
  an rcu_read_lock().

  The critical area is where RCU is not watching. Which can happen while
  going to and from idle, or bringing up or taking down a CPU.

  The final fix was to put the protection in kernel_text_address() as it
  is the one that requires RCU to be watching while doing the stack
  trace.

  To make this work properly, Paul had to allow rcu_irq_enter() happen
  after rcu_nmi_enter(). This should have been done anyway, since an NMI
  can page fault (reading vmalloc area), and a page fault triggers
  rcu_irq_enter().

  One patch is just a consolidation of code so that the fix only needed
  to be done in one location"

* tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Remove RCU work arounds from stack tracer
  extable: Enable RCU if it is not watching in kernel_text_address()
  extable: Consolidate *kernel_text_address() functions
  rcu: Allow for page faults in NMI handlers
2017-09-25 15:22:31 -07:00
Paul E. McKenney 28585a8326 rcu: Allow for page faults in NMI handlers
A number of architecture invoke rcu_irq_enter() on exception entry in
order to allow RCU read-side critical sections in the exception handler
when the exception is from an idle or nohz_full CPU.  This works, at
least unless the exception happens in an NMI handler.  In that case,
rcu_nmi_enter() would already have exited the extended quiescent state,
which would mean that rcu_irq_enter() would (incorrectly) cause RCU
to think that it is again in an extended quiescent state.  This will
in turn result in lockdep splats in response to later RCU read-side
critical sections.

This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to
take no action if there is an rcu_nmi_enter() in effect, thus avoiding
the unscheduled return to RCU quiescent state.  This in turn should
make the kernel safe for on-demand RCU voyeurism.

Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-09-23 16:49:42 -04:00
Alexey Dobriyan 9b130ad5bb treewide: make "nr_cpu_ids" unsigned
First, number of CPUs can't be negative number.

Second, different signnnedness leads to suboptimal code in the following
cases:

1)
	kmalloc(nr_cpu_ids * sizeof(X));

"int" has to be sign extended to size_t.

2)
	while (loff_t *pos < nr_cpu_ids)

MOVSXD is 1 byte longed than the same MOV.

Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".

Code savings on allyesconfig kernel: -3KB

	add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
	function                                     old     new   delta
	coretemp_cpu_online                          450     512     +62
	rcu_init_one                                1234    1272     +38
	pci_device_probe                             374     399     +25

				...

	pgdat_reclaimable_pages                      628     556     -72
	select_fallback_rq                           446     369     -77
	task_numa_find_cpu                          1923    1807    -116

Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:48 -07:00
Paul E. McKenney 656e7c0c0a Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD
doc.2017.08.17a: Documentation updates.
fixes.2017.08.17a: RCU fixes.
hotplug.2017.07.25b: CPU-hotplug updates.
misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
srcu.2017.07.27c: SRCU updates.
torture.2017.07.24c: Torture-test updates.
2017-08-17 08:10:04 -07:00
Paul E. McKenney 16c0b10607 rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
The rcu_idle_exit() and rcu_idle_enter() functions are exported because
they were originally used by RCU_NONIDLE(), which was intended to
be usable from modules.  However, RCU_NONIDLE() now instead uses
rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not
exported, and there have been no complaints.

This commit therefore removes the exports from rcu_idle_exit() and
rcu_idle_enter().

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:25 -07:00
Paul E. McKenney d4db30af51 rcu: Add warning to rcu_idle_enter() for irqs enabled
All current callers of rcu_idle_enter() have irqs disabled, and
rcu_idle_enter() relies on this, but doesn't check.  This commit
therefore adds a RCU_LOCKDEP_WARN() to add some verification to the trust.
While we are there, pass "true" rather than "1" to rcu_eqs_enter().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:25 -07:00
Peter Zijlstra (Intel) 3a60799269 rcu: Make rcu_idle_enter() rely on callers disabling irqs
All callers to rcu_idle_enter() have irqs disabled, so there is no
point in rcu_idle_enter disabling them again.  This commit therefore
replaces the irq disabling with a RCU_LOCKDEP_WARN().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:24 -07:00
Paul E. McKenney 2dee9404fa rcu: Add assertions verifying blocked-tasks list
This commit adds assertions verifying the consistency of the rcu_node
structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and
->boost_tasks pointers.  In particular, the ->blkd_tasks lists must be
empty except for leaf rcu_node structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:23 -07:00
Masami Hiramatsu 35fe723bda rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
Set disable_rcu_irq_enter on not only rcu_eqs_enter_common() but also
rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was
fixed for rcu_eqs_enter_common() by commit 03ecd3f48e ("rcu/tracing:
Add rcu_disabled to denote when rcu_irq_enter() will not work").

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:23 -07:00
Paul E. McKenney d8db2e86d8 rcu: Add TPS() protection for _rcu_barrier_trace strings
The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(),
which needs TPS() protection for strings passed through the second
argument.  However, it has escaped prior TPS()-ification efforts because
it _rcu_barrier_trace() does not start with "trace_".  This commit
therefore adds the needed TPS() protection

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:22 -07:00
Luis R. Rodriguez d5374226c3 rcu: Use idle versions of swait to make idle-hack clear
These RCU waits were set to use interruptible waits to avoid the kthreads
contributing to system load average, even though they are not interruptible
as they are spawned from a kthread. Use the new TASK_IDLE swaits which makes
our goal clear, and removes confusion about these paths possibly being
interruptible -- they are not.

When the system is idle the RCU grace-period kthread will spend all its time
blocked inside the swait_event_interruptible(). If the interruptible() was
not used, then this kthread would contribute to the load average. This means
that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
rather than the load average of 0 that almost fifty years of UNIX has
conditioned sysadmins to expect.

The same argument applies to swait_event_interruptible_timeout() use. The
RCU grace-period kthread spends its time blocked inside this call while
waiting for grace periods to complete. In particular, if there was only one
busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
grace-period kthread would spend almost all its time blocked inside the
swait_event_interruptible_timeout(). This would mean that the load average
would be 2 rather than the expected 1 for the single busy CPU.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:15 -07:00
Paul E. McKenney 09efeeee17 rcu: Move callback-list warning to irq-disable region
After adopting callbacks from a newly offlined CPU, the adopting CPU
checks to make sure that its callback list's count is zero only if the
list has no callbacks and vice versa.  Unfortunately, it does so after
enabling interrupts, which means that false positives are possible due to
interrupt handlers invoking call_rcu().  Although these false positives
are improbable, rcutorture did make it happen once.

This commit therefore moves this check to an irq-disabled region of code,
thus suppressing the false positive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:50 -07:00
Paul E. McKenney f2dbe4a562 rcu: Localize rcu_state ->orphan_pend and ->orphan_done
Given that the rcu_state structure's >orphan_pend and ->orphan_done
fields are used only during migration of callbacks from the recently
offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and
rcu_adopt_orphan_cbs() are combined, these fields can become local
variables in the combined function.  This commit therefore combines
rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new
rcu_segcblist_merge() function and removes the ->orphan_pend and
->orphan_done fields.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:49 -07:00
Paul E. McKenney 21cc248384 rcu: Advance callbacks after migration
When migrating callbacks from a newly offlined CPU, we are already
holding the root rcu_node structure's lock, so it costs almost nothing
to advance and accelerate the newly migrated callbacks.  This patch
therefore makes this advancing and acceleration happen.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:48 -07:00
Paul E. McKenney 537b85c870 rcu: Eliminate rcu_state ->orphan_lock
The ->orphan_lock is acquired and released only within the
rcu_migrate_callbacks() function, which now acquires the root rcu_node
structure's ->lock.  This commit therefore eliminates the ->orphan_lock
in favor of the root rcu_node structure's ->lock.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:48 -07:00
Paul E. McKenney 9fa46fb8c9 rcu: Advance outgoing CPU's callbacks before migrating them
It is possible that the outgoing CPU is unaware of recent grace periods,
and so it is also possible that some of its pending callbacks are actually
ready to be invoked.  The current callback-migration code would needlessly
force these callbacks to pass through another grace period.  This commit
therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in
order to give them full credit for having passed through any recent
grace periods.

This also fixes an odd theoretical bug where there are no callbacks in
the system except for those on the outgoing CPU, none of those callbacks
have yet been associated with a grace-period number, there is never again
another callback registered, and the surviving CPU never again takes a
scheduling-clock interrupt, never goes idle, and never enters nohz_full
userspace execution.  Yes, this is (just barely) possible.  It requires
that the surviving CPU be a nohz_full CPU, that its scheduler-clock
interrupt be shut off, and that it loop forever in the kernel.  You get
bonus points if you can make this one happen!  ;-)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:47 -07:00
Paul E. McKenney b1a2d79fe7 rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU
RCU's CPU-hotplug callback-migration code first moves the outgoing
CPU's callbacks to ->orphan_done and ->orphan_pend, and only then
moves them to the NOCB callback list.  This commit avoids the
extra step (and simplifies the code) by moving the callbacks directly
from the outgoing CPU's callback list to the NOCB callback list.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:47 -07:00
Paul E. McKenney 95335c0355 rcu: Check for NOCB CPUs and empty lists earlier in CB migration
The current CPU-hotplug RCU-callback-migration code checks
for the source (newly offlined) CPU being a NOCBs CPU down in
rcu_send_cbs_to_orphanage().  This commit simplifies callback migration a
bit by moving this check up to rcu_migrate_callbacks().  This commit also
adds a check for the source CPU having no callbacks, which eases analysis
of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:46 -07:00
Paul E. McKenney c47e067a3c rcu: Remove orphan/adopt event-tracing fields
The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields
are updated, but never read.  This commit therefore removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:46 -07:00
Paul E. McKenney 313517fc44 rcu: Make expedited GPs correctly handle hardware CPU insertion
The update of the ->expmaskinitnext and of ->ncpus are unsynchronized,
with the value of ->ncpus being incremented long before the corresponding
->expmaskinitnext mask is updated.  If an RCU expedited grace period
sees ->ncpus change, it will update the ->expmaskinit masks from the new
->expmaskinitnext masks.  But it is possible that ->ncpus has already
been updated, but the ->expmaskinitnext masks still have their old values.
For the current expedited grace period, no harm done.  The CPU could not
have been online before the grace period started, so there is no need to
wait for its non-existent pre-existing readers.

But the next RCU expedited grace period is in a world of hurt.  The value
of ->ncpus has already been updated, so this grace period will assume
that the ->expmaskinitnext masks have not changed.  But they have, and
they won't be taken into account until the next never-been-online CPU
comes online.  This means that RCU will be ignoring some CPUs that it
should be paying attention to.

The solution is to update ->ncpus and ->expmaskinitnext while holding
the ->lock for the rcu_node structure containing the ->expmaskinitnext
mask.  Because smp_store_release() is now used to update ->ncpus and
smp_load_acquire() is now used to locklessly read it, if the expedited
grace period sees ->ncpus change, then the updating CPU has to
already be holding the corresponding ->lock.  Therefore, when the
expedited grace period later acquires that ->lock, it is guaranteed
to see the new value of ->expmaskinitnext.

On the other hand, if the expedited grace period loads ->ncpus just
before an update, earlier full memory barriers guarantee that
the incoming CPU isn't far enough along to be running any RCU readers.

This commit therefore makes the required change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:45 -07:00
Paul E. McKenney a58163d8ca rcu: Migrate callbacks earlier in the CPU-offline timeline
RCU callbacks must be migrated away from an outgoing CPU, and this is
done near the end of the CPU-hotplug operation, after the outgoing CPU is
long gone.  Unfortunately, this means that other CPU-hotplug callbacks
can execute while the outgoing CPU's callbacks are still immobilized
on the long-gone CPU's callback lists.  If any of these CPU-hotplug
callbacks must wait, either directly or indirectly, for the invocation
of any of the immobilized RCU callbacks, the system will hang.

This commit avoids such hangs by migrating the callbacks away from the
outgoing CPU immediately upon its departure, shortly after the return
from __cpu_die() in takedown_cpu().  Thus, RCU is able to advance these
callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
callbacks to wait on these RCU callbacks without risk of a hang.

While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
dead code on the one hand and to avoid define-without-use warnings on the
other hand.

Reported-by: Jeffrey Hugo <jhugo@codeaurora.org>
Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Richard Weinberger <richard@nod.at>
2017-07-25 13:03:43 -07:00
Paul E. McKenney 96036c4306 rcu: Add last-CPU to GP-kthread starvation messages
This commit augments the grace-period-kthread starvation debugging
messages by adding the last CPU that ran the kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:18 -07:00
Paul E. McKenney fe5ac724d8 rcu: Remove nohz_full full-system-idle state machine
The NO_HZ_FULL_SYSIDLE full-system-idle capability was added in 2013
by commit 0edd1b1784 ("nohz_full: Add full-system-idle state machine"),
but has not been used.  This commit therefore removes it.

If it turns out to be needed later, this commit can always be reverted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-08 18:52:39 -07:00