remarkable-linux/kernel
Paul E. McKenney ef631b0ca0 rcu: Make hierarchical RCU less IPI-happy
This patch fixes a hierarchical-RCU performance bug located by Anton
Blanchard.  The problem stems from a misguided attempt to provide a
work-around for jiffies-counter failure.  This work-around uses a per-CPU
n_rcu_pending counter, which is incremented on each call to rcu_pending(),
which in turn is called from each scheduling-clock interrupt.  Each CPU
then treats this counter as a surrogate for the jiffies counter, so
that if the jiffies counter fails to advance, the per-CPU n_rcu_pending
counter will cause RCU to invoke force_quiescent_state(), which in turn
will (among other things) send resched IPIs to CPUs that have thus far
failed to pass through an RCU quiescent state.

Unfortunately, each CPU resets only its own counter after sending a
batch of IPIs.  This means that the other CPUs will also (needlessly)
send -another- round of IPIs, for a full N-squared set of IPIs in the
worst case every three scheduler-clock ticks until the grace period
finally ends.  It is not reasonable for a given CPU to reset each and
every n_rcu_pending for all the other CPUs, so this patch instead simply
disables the jiffies-counter "training wheels", thus eliminating the
excessive IPIs.

Note that the jiffies-counter IPIs do not have this problem due to
the fact that the jiffies counter is global, so that the CPU sending
the IPIs can easily reset things, thus preventing the other CPUs from
sending redundant IPIs.

Note also that the n_rcu_pending counter remains, as it will continue to
be used for tracing.  It may also see use to update the jiffies counter,
should an appropriate kick-the-jiffies-counter API appear.

Located-by: Anton Blanchard <anton@au1.ibm.com>
Tested-by: Anton Blanchard <anton@au1.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: anton@samba.org
Cc: akpm@linux-foundation.org
Cc: dipankar@in.ibm.com
Cc: manfred@colorfullife.com
Cc: cl@linux-foundation.org
Cc: josht@linux.vnet.ibm.com
Cc: schamp@sgi.com
Cc: niv@us.ibm.com
Cc: dvhltc@us.ibm.com
Cc: ego@in.ibm.com
Cc: laijs@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: peterz@infradead.org
Cc: penberg@cs.helsinki.fi
Cc: andi@firstfloor.org
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
LKML-Reference: <12396834793575-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 11:31:50 +02:00
..
irq Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:35:30 -07:00
power PM/Hibernate: Wait for SCSI devices scan to complete during resume 2009-04-13 11:37:07 -07:00
time Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-03-26 16:05:42 -07:00
trace tracing/filters: return proper error code when writing filter file 2009-04-12 11:59:29 +02:00
.gitignore
acct.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
async.c async: remove the temporary (2.6.29) "async is off by default" code 2009-03-28 13:05:30 -07:00
audit.c Audit: remove spaces from audit_log_d_path 2009-04-05 13:49:04 -04:00
audit.h fixing audit rule ordering mess, part 1 2009-01-04 15:14:41 -05:00
audit_tree.c audit: incorrect ref counting in audit tree tag_chunk 2009-04-05 13:48:26 -04:00
auditfilter.c make the e->rule.xxx shorter in kernel auditfilter.c 2009-04-05 13:40:33 -04:00
auditsc.c Audit: remove spaces from audit_log_d_path 2009-04-05 13:49:04 -04:00
backtracetest.c
bounds.c
capability.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
cgroup.c memcg: fix OOM killer under memcg 2009-04-02 19:04:55 -07:00
cgroup_debug.c debug cgroup: remove unneeded cgroup_lock 2009-04-02 19:04:54 -07:00
cgroup_freezer.c freezer_cg: disable writing freezer.state of root cgroup 2008-11-12 17:17:16 -08:00
compat.c Allow times and time system calls to return small negative values 2009-01-06 15:59:13 -08:00
configs.c kernel/configs.c: remove useless comments 2008-10-20 08:52:34 -07:00
cpu.c cpumask: use set_cpu_active in init/main.c 2009-03-30 22:05:12 +10:30
cpuset.c cpusets: prevent PF_THREAD_BOUND tasks from attaching to non-root cpusets 2009-04-02 19:04:57 -07:00
cred-internals.h CRED: Inaugurate COW credentials 2008-11-14 10:39:23 +11:00
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 2009-01-09 13:59:25 -08:00
delayacct.c schedstat: consolidate per-task cpu runtime stats 2008-12-18 13:54:01 +01:00
dma-coherent.c dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy. 2009-01-21 18:51:53 +09:00
dma.c kernel/dma.c: remove a CVS keyword 2008-10-16 11:21:30 -07:00
exec_domain.c Get rid of indirect include of fs_struct.h 2009-03-31 23:00:27 -04:00
exit.c Merge branch 'irq/threaded' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-07 14:07:52 -07:00
extable.c Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 11:04:19 -07:00
fork.c Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:35:30 -07:00
freezer.c freezer_cg: use thaw_process() in unfreeze_cgroup() 2008-10-30 11:38:45 -07:00
futex.c futex: comment requeue key reference semantics 2009-04-02 23:39:53 +02:00
futex_compat.c CRED: Use RCU to access another task's creds and to release a task's own creds 2008-11-14 10:39:19 +11:00
hrtimer.c hrtimer: fix rq->lock inversion (again) 2009-03-31 14:52:52 +02:00
hung_task.c softlockup: ensure the task has been switched out once 2009-02-11 11:04:16 +01:00
itimer.c timers: split process wide cpu clocks/timers 2009-02-05 13:04:33 +01:00
kallsyms.c Ksplice: Add functions for walking kallsyms symbols 2009-03-31 13:05:32 +10:30
Kconfig.freezer container freezer: implement freezer cgroup subsystem 2008-10-20 08:52:34 -07:00
Kconfig.hz
Kconfig.preempt rcu: provide RCU options on non-preempt architectures too 2008-12-25 09:31:28 +01:00
kexec.c kexec: vmcoreinfo_data[] can become static 2009-04-02 19:05:04 -07:00
kfifo.c
kgdb.c kgdb: call touch_softlockup_watchdog on resume 2008-10-06 13:50:59 -05:00
kmod.c module: create a request_module_nowait() 2009-03-31 13:05:35 +10:30
kprobes.c kprobes: support per-kprobe disabling 2009-04-07 08:31:08 -07:00
ksysfs.c kernel/ksysfs.c:fix dependence on CONFIG_NET 2009-01-06 10:44:31 -08:00
kthread.c kthread: move sched-realeted initialization from kthreadd context 2009-04-09 09:50:37 +09:30
latencytop.c sched, latencytop: incorporate review feedback from Andrew Morton 2009-02-11 10:18:04 +01:00
lockdep.c Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-06 13:37:30 -07:00
lockdep_internals.h lockdep: get_user_chars() redo 2009-02-14 23:28:22 +01:00
lockdep_proc.c lockstat: warn about disabled lock debugging 2009-02-14 23:28:28 +01:00
lockdep_states.h lockdep: move state bit definitions around 2009-02-14 23:27:59 +01:00
Makefile Merge branch 'linus' into core/softlockup 2009-04-07 11:15:40 +02:00
marker.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
module.c async: Fix module loading async-work regression 2009-04-11 12:44:49 -07:00
mutex-debug.c mutex: implement adaptive spinning 2009-01-14 18:09:02 +01:00
mutex-debug.h mutex: implement adaptive spinning 2009-01-14 18:09:02 +01:00
mutex.c mutex: have non-spinning mutexes on s390 by default 2009-04-09 19:28:24 +02:00
mutex.h mutex: implement adaptive spinning 2009-01-14 18:09:02 +01:00
notifier.c Merge commit 'v2.6.28-rc6' into core/debug 2008-11-26 08:22:50 +01:00
ns_cgroup.c cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups 2009-04-02 19:04:53 -07:00
nsproxy.c User namespaces: set of cleanups (v2) 2008-11-24 18:57:41 -05:00
panic.c lockdep: continue lock debugging despite some taints 2009-04-12 16:10:52 +02:00
params.c param: fix charp parameters set via sysfs 2009-03-31 13:05:30 +10:30
pid.c pids: refactor vnr/nr_ns helpers to make them safe 2009-04-02 19:05:02 -07:00
pid_namespace.c signals: zap_pid_ns_process() should use force_sig() 2009-04-02 19:04:58 -07:00
pm_qos_params.c
posix-cpu-timers.c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:37:28 -07:00
posix-timers.c [CVE-2009-0029] System call wrappers part 05 2009-01-14 14:15:20 +01:00
printk.c Merge branch 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 10:23:25 -07:00
profile.c profiling: fix broken profiling regression 2009-02-10 00:50:37 +01:00
ptrace.c ptrace: fix exit_ptrace() vs ptrace_traceme() race 2009-04-13 15:04:31 -07:00
rcuclassic.c kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies 2009-04-03 12:23:02 +02:00
rcupdate.c rcu: rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu 2009-03-31 00:09:37 +02:00
rcupreempt.c kmemtrace, rcu: fix rcupreempt.c data structure dependencies 2009-04-03 12:23:04 +02:00
rcupreempt_trace.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
rcutorture.c cpumask: convert rcutorture.c 2009-03-30 22:05:16 +10:30
rcutree.c rcu: Make hierarchical RCU less IPI-happy 2009-04-14 11:31:50 +02:00
rcutree.h kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies 2009-04-03 12:23:03 +02:00
rcutree_trace.c rcu: Make hierarchical RCU less IPI-happy 2009-04-14 11:31:50 +02:00
relay.c Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 11:04:19 -07:00
res_counter.c memcg: memory cgroup resource counters for hierarchy 2009-01-08 08:31:05 -08:00
resource.c resources: fix parameter name and kernel-doc 2009-01-15 16:39:38 -08:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c Merge commit 'v2.6.30-rc1' into sched/urgent 2009-04-08 17:26:00 +02:00
sched_clock.c Merge branch 'tracing/core-v2' into tracing-for-linus 2009-04-02 00:49:02 +02:00
sched_cpupri.c sched_rt: don't allocate cpumask in fastpath 2009-04-01 13:24:51 +02:00
sched_cpupri.h cpumask: remove cpumask_t from core 2009-03-30 22:05:17 +10:30
sched_debug.c sched: remove unused fields from struct rq 2009-03-24 23:16:51 +01:00
sched_fair.c Merge branch 'sched/urgent'; commit 'v2.6.29-rc5' into sched/core 2009-02-15 21:15:16 +01:00
sched_features.h Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-03-30 17:17:35 -07:00
sched_idletask.c sched: add CONFIG_SMP consistency 2008-10-22 10:01:52 +02:00
sched_rt.c Merge commit 'v2.6.30-rc1' into sched/urgent 2009-04-08 17:26:00 +02:00
sched_stats.h sched: remove unused fields from struct rq 2009-03-24 23:16:51 +01:00
seccomp.c x86-64: seccomp: fix 32/64 syscall hole 2009-03-02 15:41:30 -08:00
semaphore.c
signal.c signals: SI_USER: Masquerade si_pid when crossing pid ns boundary 2009-04-02 19:04:58 -07:00
slow-work.c Document the slow work thread pool 2009-04-03 16:42:35 +01:00
smp.c generic-ipi: eliminate WARN_ON()s during oops/panic 2009-03-13 10:47:34 +01:00
softirq.c Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-06 13:37:30 -07:00
softlockup.c softlockup: decouple hung tasks check from softlockup detection 2009-01-16 14:06:04 +01:00
spinlock.c Allow rwlocks to re-enable interrupts 2009-04-02 19:05:11 -07:00
srcu.c
stacktrace.c stacktrace: provide save_stack_trace_tsk() weak alias 2008-12-25 11:44:43 +01:00
stop_machine.c cpumask: remove cpumask_t from core 2009-03-30 22:05:17 +10:30
sys.c kernel/sys.c: clean up sys_shutdown exit path 2009-04-13 15:04:32 -07:00
sys_ni.c [CVE-2009-0029] Make sys_syslog a conditional system call 2009-01-14 14:15:16 +01:00
sysctl.c mm: move the scan_unevictable_pages sysctl to the vm table 2009-04-13 15:04:28 -07:00
sysctl_check.c net: add ARP notify option for devices 2009-02-01 01:04:33 -08:00
taskstats.c cpumask: convert rest of files in kernel/ 2009-01-01 10:12:28 +10:30
test_kprobes.c kprobes: add tests for register_kprobes 2009-01-06 15:59:20 -08:00
time.c [CVE-2009-0029] System call wrappers part 01 2009-01-14 14:15:18 +01:00
timeconst.pl
timer.c Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:35:30 -07:00
tracepoint.c tracepoints: dont update zero-sized tracepoint sections 2009-03-18 19:55:00 +01:00
tsacct.c Fix fixpoint divide exception in acct_update_integrals 2009-03-09 08:13:35 -07:00
uid16.c [CVE-2009-0029] System call wrappers part 19 2009-01-14 14:15:26 +01:00
up.c smp_call_function_single(): be slightly less stupid, fix #2 2009-01-12 16:04:37 +01:00
user.c Merge branch 'master' into next 2009-03-24 10:52:46 +11:00
user_namespace.c Fix recursive lock in free_uid()/free_user_ns() 2009-02-27 16:26:21 -08:00
utsname.c
utsname_sysctl.c proc_sysctl: use CONFIG_PROC_SYSCTL around ipc and utsname proc_handlers 2009-04-02 19:05:01 -07:00
wait.c wait: prevent exclusive waiter starvation 2009-02-05 12:56:48 -08:00
workqueue.c work_on_cpu(): rewrite it to create a kernel thread on demand 2009-04-09 09:50:37 +09:30