1
0
Fork 0
alistair23-linux/kernel
Aaron Lu 3fc5b3b6a8 smp: Avoid sending needless IPI in smp_call_function_many()
Inter-Processor-Interrupt(IPI) is needed when a page is unmapped and the
process' mm_cpumask() shows the process has ever run on other CPUs. page
migration, page reclaim all need IPIs. The number of IPI needed to send
to different CPUs is especially large for multi-threaded workload since
mm_cpumask() is per process.

For smp_call_function_many(), whenever a CPU queues a CSD to a target
CPU, it will send an IPI to let the target CPU to handle the work.
This isn't necessary - we need only send IPI when queueing a CSD
to an empty call_single_queue.

The reason:

flush_smp_call_function_queue() that is called upon a CPU receiving an
IPI will empty the queue and then handle all of the CSDs there. So if
the target CPU's call_single_queue is not empty, we know that:
i.  An IPI for the target CPU has already been sent by 'previous queuers';
ii. flush_smp_call_function_queue() hasn't emptied that CPU's queue yet.
Thus, it's safe for us to just queue our CSD there without sending an
addtional IPI. And for the 'previous queuers', we can limit it to the
first queuer.

To demonstrate the effect of this patch, a multi-thread workload that
spawns 80 threads to equally consume 100G memory is used. This is tested
on a 2 node broadwell-EP which has 44cores/88threads and 32G memory. So
after 32G memory is used up, page reclaiming starts to happen a lot.

With this patch, IPI number dropped 88% and throughput increased about
15% for the above workload.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20170519075331.GE2084@aaronlu.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-23 10:01:32 +02:00
..
bpf net: Make IP alignment calulations clearer. 2017-05-22 12:27:07 -04:00
cgroup Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2017-05-01 13:52:24 -07:00
configs config: android-base: enable hardened usercopy and kernel ASLR 2017-02-27 18:43:46 -08:00
debug sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> 2017-03-02 08:42:34 +01:00
events perf/callchain: Force USER_DS when invoking perf_callchain_user() 2017-05-10 07:54:00 +02:00
gcov gcov: support GCC 7.1 2017-05-12 15:57:15 -07:00
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-21 11:45:26 -07:00
livepatch Merge branches 'for-4.12/upstream' and 'for-4.12/klp-hybrid-consistency-model' into for-linus 2017-05-01 21:49:28 +02:00
locking Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-10 10:30:46 -07:00
power Merge branches 'pm-sleep' and 'powercap' 2017-05-22 20:32:05 +02:00
printk TTY/Serial patches for 4.12-rc1 2017-05-08 18:49:23 -07:00
rcu Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-10 10:30:46 -07:00
sched Merge branch 'linus' into sched/core, to pick up fixes 2017-05-23 09:50:35 +02:00
time sched/clock: Remove unused argument to sched_clock_idle_wakeup_event() 2017-05-15 10:15:18 +02:00
trace This fixes a bug caused by not cleaning up the new instance unique triggers 2017-05-20 23:39:03 -07:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/mutex: Allow MUTEX_SPIN_ON_OWNER when DEBUG_MUTEXES 2016-10-25 11:31:51 +02:00
Kconfig.preempt
Makefile crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE 2017-05-08 17:15:11 -07:00
acct.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
async.c
audit.c Merge branch 'stable-4.12' of git://git.infradead.org/users/pcmoore/audit 2017-05-03 09:21:59 -07:00
audit.h audit: Use timespec64 to represent audit timestamps 2017-05-02 10:16:05 -04:00
audit_fsnotify.c Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-05-03 11:05:15 -07:00
audit_tree.c Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-05-03 11:05:15 -07:00
audit_watch.c Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-05-03 11:05:15 -07:00
auditfilter.c audit: kernel generated netlink traffic should have a portid of 0 2017-05-02 10:16:05 -04:00
auditsc.c Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-05-03 11:05:15 -07:00
backtracetest.c
bounds.c
capability.c capability: export has_capability 2017-01-12 07:01:56 -07:00
compat.c time: Change k_clock nsleep() to use timespec64 2017-04-14 21:49:56 +02:00
configs.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
context_tracking.c
cpu.c Merge branch 'linus' into locking/core, to pick up fixes 2017-04-14 10:29:40 +02:00
cpu_pm.c
crash_core.c ia64: reuse append_elf_note() and final_note() functions 2017-05-08 17:15:11 -07:00
crash_dump.c
cred.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/coredump.h> 2017-03-02 08:42:28 +01:00
delayacct.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
dma.c
elfcore.c
exec_domain.c
exit.c userfaultfd: non-cooperative: rollback userfaultfd_exit 2017-03-09 17:01:09 -08:00
extable.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-02-22 10:15:09 -08:00
fork.c pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes() 2017-05-13 17:26:02 -05:00
freezer.c freezer, oom: check TIF_MEMDIE on the correct task 2016-07-28 16:07:41 -07:00
futex.c futex: Clarify mark_wake_futex memory barrier usage 2017-04-15 16:03:46 +02:00
futex_compat.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
groups.c mm, vmalloc: use __GFP_HIGHMEM implicitly 2017-05-08 17:15:13 -07:00
hung_task.c kernel/hung_task.c: defer showing held locks 2017-05-08 17:15:10 -07:00
irq_work.c
jump_label.c This release has no new tracing features, just clean ups, minor fixes 2017-02-27 13:26:17 -08:00
kallsyms.c bpf: make jited programs visible in traces 2017-02-17 13:40:05 -05:00
kcmp.c
kcov.c kcov: simplify interrupt check 2017-05-08 17:15:12 -07:00
kexec.c kexec: allow architectures to override boot mapping 2016-08-02 19:35:27 -04:00
kexec_core.c ia64: reuse append_elf_note() and final_note() functions 2017-05-08 17:15:11 -07:00
kexec_file.c kexec, x86/purgatory: Unbreak it and clean it up 2017-03-10 20:55:09 +01:00
kexec_internal.h kexec, x86/purgatory: Unbreak it and clean it up 2017-03-10 20:55:09 +01:00
kmod.c sched/headers, vfs/execve: Prepare to move the do_execve*() prototypes from <linux/sched.h> to <linux/binfmts.h> 2017-03-02 08:42:39 +01:00
kprobes.c kprobes: Document how optimized kprobes are removed from module unload 2017-05-17 21:55:58 -04:00
ksysfs.c crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE 2017-05-08 17:15:11 -07:00
kthread.c cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups 2017-03-17 10:18:47 -04:00
latencytop.c sched/headers: Prepare to move sched_info_on() and force_schedstat_enabled() from <linux/sched.h> to <linux/sched/stat.h> 2017-03-02 08:42:39 +01:00
membarrier.c Fix: Disable sys_membarrier when nohz_full is enabled 2017-01-23 11:32:16 -08:00
memremap.c mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash 2017-05-01 09:15:53 +02:00
module-internal.h
module.c kernel/module.c: use set_memory.h header 2017-05-08 17:15:14 -07:00
module_signing.c
notifier.c kernel/notifier.c: simplify expression 2017-02-24 17:46:56 -08:00
nsproxy.c perf: Add PERF_RECORD_NAMESPACES to include namespaces related info 2017-03-13 15:57:41 -03:00
padata.c padata: get_next is never NULL 2017-04-21 20:30:46 +08:00
panic.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> 2017-03-02 08:42:34 +01:00
params.c boot/param: Move next_arg() function to lib/cmdline.c for later reuse 2017-04-18 10:37:13 +02:00
pid.c pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() 2017-05-08 17:15:12 -07:00
pid_namespace.c pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes 2017-05-13 17:26:01 -05:00
profile.c sched/headers: Prepare to move sched_info_on() and force_schedstat_enabled() from <linux/sched.h> to <linux/sched/stat.h> 2017-03-02 08:42:39 +01:00
ptrace.c ptrace: fix PTRACE_LISTEN race corrupting task->state 2017-04-08 00:47:48 -07:00
range.c
reboot.c
relay.c Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-02 11:38:06 -07:00
resource.c
seccomp.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
signal.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-10 10:30:46 -07:00
smp.c smp: Avoid sending needless IPI in smp_call_function_many() 2017-05-23 10:01:32 +02:00
smpboot.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> 2017-03-02 08:42:35 +01:00
smpboot.h
softirq.c sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() 2017-04-11 09:06:32 +02:00
stacktrace.c stacktrace/x86: add function for detecting reliable stack traces 2017-03-08 09:18:02 +01:00
stop_machine.c locking/core, stop_machine: Yield the CPU during stop machine() 2016-11-16 10:15:09 +01:00
sys.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2017-05-05 11:08:43 -07:00
sys_ni.c move aio compat to fs/aio.c 2016-12-22 22:58:37 -05:00
sysctl.c proc/sysctl: fix the int overflow for jiffies conversion 2017-05-08 17:15:10 -07:00
sysctl_binary.c net: Remove NET_CORE_BUDGET_USECS from sysctl binary interface. 2017-04-21 15:59:52 -04:00
task_work.c task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_works 2016-08-02 19:35:02 -04:00
taskstats.c taskstats: add e/u/stime for TGID command 2017-05-08 17:15:12 -07:00
test_kprobes.c
torture.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
tracepoint.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> 2017-03-02 08:42:35 +01:00
tsacct.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
ucount.c ucount: Remove the atomicity from ucount->count 2017-03-06 15:26:37 -06:00
uid16.c sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
up.c smp: Add function to execute a function synchronously on a CPU 2016-09-05 13:52:39 +02:00
user-return-notifier.c
user.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h> 2017-03-02 08:42:29 +01:00
user_namespace.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
utsname.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
utsname_sysctl.c sched/headers: Remove <linux/rwsem.h> from <linux/sched.h> 2017-03-03 01:45:36 +01:00
watchdog.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> 2017-03-02 08:42:34 +01:00
watchdog_hld.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> 2017-03-02 08:42:34 +01:00
workqueue.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-01 19:12:53 -07:00
workqueue_internal.h