alistair23-linux/kernel
David Carrillo-Cisneros 058fe1c044 perf/core: Make cgroup switch visit only cpuctxs with cgroup events
This patch follows from a conversation in CQM/CMT's last series about
speeding up the context switch for cgroup events:

  https://patchwork.kernel.org/patch/9478617/

This is a low-hanging fruit optimization. It replaces the iteration over
the "pmus" list in cgroup switch by an iteration over a new list that
contains only cpuctxs with at least one cgroup event.

This is necessary because the number of PMUs have increased over the years
e.g modern x86 server systems have well above 50 PMUs.

The iteration over the full PMU list is unneccessary and can be costly in
heavy cache contention scenarios.

Below are some instrumentation measurements with 10, 50 and 90 percentiles
of the total cost of context switch before and after this optimization for
a simple array read/write microbenchark.

  Contention
    Level    Nr events      Before (us)            After (us)       Median
  L2    L3     types      (10%, 50%, 90%)       (10%, 50%, 90%     Speedup
  --------------------------------------------------------------------------
  Low   Low       1       (1.72, 2.42, 5.85)    (1.35, 1.64, 5.46)     29%
  High  Low       1       (2.08, 4.56, 19.8)    (1720, 2.20, 13.7)     51%
  High  High      1       (2.86, 10.4, 12.7)    (2.54, 4.32, 12.1)     58%

  Low   Low       2       (1.98, 3.20, 6.89)    (1.68, 2.41, 8.89)     24%
  High  Low       2       (2.48, 5.28, 22.4)    (2150, 3.69, 14.6)     30%
  High  High      2       (3.32, 8.09, 13.9)    (2.80, 5.15, 13.7)     36%

where:

  1 event type  = cycles
  2 event types = cycles,intel_cqm/llc_occupancy/

   Contention L2 Low: workset  <  L2 cache size.
                 High:  "     >>  L2   "     " .
   Contention L3 Low: workset of task on all sockets  <  L3 cache size.
                 High:   "     "   "   "   "    "    >>  L3   "     " .

   Median Speedup is (50%ile Before - 50%ile After) /  50%ile Before

Unsurprisingly, the benefits of this optimization decrease with the number
of cpuctxs with a cgroup events, yet, is never detrimental.

Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170118192454.58008-2-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30 12:01:13 +01:00
..
bpf bpf: don't trigger OOM killer under pressure with map alloc 2017-01-18 17:12:26 -05:00
configs
debug kdb: call vkdb_printf() from vprintk_default() only when wanted 2016-12-14 16:04:08 -08:00
events perf/core: Make cgroup switch visit only cpuctxs with cgroup events 2017-01-30 12:01:13 +01:00
gcov
irq genirq/affinity: Fix node generation from cpumask 2016-12-15 12:32:35 +01:00
livepatch
locking Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
power Merge branches 'pm-sleep' and 'pm-cpufreq' 2017-01-27 00:08:59 +01:00
printk Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
rcu rcu: Narrow early boot window of illegal synchronous grace periods 2017-01-14 21:23:48 -08:00
sched ktime: Cleanup ktime_set() usage 2016-12-25 17:21:22 +01:00
time nohz: Fix collision between tick and other hrtimers 2017-01-11 10:41:33 +01:00
trace clocksource: Use a plain u64 instead of cycle_t 2016-12-25 11:04:12 +01:00
.gitignore
acct.c
async.c
audit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-17 18:44:00 -08:00
audit.h audit_log_{name,link_denied}: constify struct path 2016-12-05 19:00:38 -05:00
audit_fsnotify.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-17 18:44:00 -08:00
audit_tree.c Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit 2017-01-05 23:06:06 -08:00
audit_watch.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-17 18:44:00 -08:00
auditfilter.c audit: add support for session ID user filter 2016-11-29 15:10:12 -05:00
auditsc.c Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit 2016-12-14 14:06:40 -08:00
backtracetest.c
bounds.c
capability.c capability: export has_capability 2017-01-12 07:01:56 -07:00
cgroup.c cgroup: add support for eBPF programs 2016-11-25 16:25:52 -05:00
cgroup_freezer.c
cgroup_pids.c
compat.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
configs.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
context_tracking.c
cpu.c cpu/hotplug: Remove unused but set variable in _cpu_down() 2017-01-18 11:55:09 +01:00
cpu_pm.c
cpuset.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
extable.c kprobes, extable: Identify kprobes trampolines as kernel text area 2017-01-14 08:38:05 +01:00
fork.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
freezer.c
futex.c ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
futex_compat.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
groups.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
hung_task.c hung_task: decrement sysctl_hung_task_warnings only if it is positive 2016-12-12 18:55:09 -08:00
irq_work.c
jump_label.c jump_labels: API for flushing deferred jump label updates 2017-01-12 14:33:16 +01:00
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/mutex: Allow MUTEX_SPIN_ON_OWNER when DEBUG_MUTEXES 2016-10-25 11:31:51 +02:00
Kconfig.preempt
kcov.c kcov: make kcov work properly with KASLR enabled 2016-12-20 09:48:47 -08:00
kexec.c
kexec_core.c kexec: add cond_resched into kimage_alloc_crash_control_pages 2016-12-14 16:04:07 -08:00
kexec_file.c ima: on soft reboot, save the measurement list 2016-12-20 09:48:44 -08:00
kexec_internal.h kexec_file: Allow arch-specific memory walking for kexec_add_buffer 2016-11-30 23:14:57 +11:00
kmod.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
kprobes.c kprobes, extable: Identify kprobes trampolines as kernel text area 2017-01-14 08:38:05 +01:00
ksysfs.c
kthread.c kthread: add __printf attributes 2016-12-12 18:55:06 -08:00
latencytop.c
Makefile Merge branch 'akpm' (patches from Andrew) 2016-12-14 17:25:18 -08:00
membarrier.c
memremap.c mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} 2017-01-10 18:31:54 -08:00
module-internal.h
module.c taint/module: Fix problems when out-of-kernel driver defines true or false 2017-01-17 10:56:45 -08:00
module_signing.c
notifier.c
nsproxy.c
padata.c
panic.c kernel/panic.c: add missing \n 2017-01-24 16:26:14 -08:00
params.c
pid.c
pid_namespace.c pid: fix lockdep deadlock warning due to ucount_lock 2017-01-10 13:34:56 +13:00
profile.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ptrace.c ptrace: Don't allow accessing an undumpable mm 2016-11-22 12:57:38 -06:00
range.c
reboot.c
relay.c relay: check array offset before using it 2016-12-14 16:04:08 -08:00
resource.c
seccomp.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-12-14 13:57:44 -08:00
signal.c signal: protect SIGNAL_UNKILLABLE from unintentional clearing. 2017-01-10 18:31:55 -08:00
smp.c kernel/smp: Tell the user we're bringing up secondary CPUs 2016-10-26 12:02:35 +02:00
smpboot.c
smpboot.h
softirq.c
stacktrace.c
stop_machine.c locking/core, stop_machine: Yield the CPU during stop machine() 2016-11-16 10:15:09 +01:00
sys.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
sys_ni.c move aio compat to fs/aio.c 2016-12-22 22:58:37 -05:00
sysctl.c sysctl: fix proc_doulongvec_ms_jiffies_minmax() 2017-01-26 09:21:24 -08:00
sysctl_binary.c sysctl: add KERN_CONT to deprecated_sysctl_warning() 2016-12-14 16:04:07 -08:00
task_work.c
taskstats.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-11-15 10:54:36 -05:00
test_kprobes.c
torture.c
tracepoint.c tracing: Have the reg function allow to fail 2016-12-09 09:13:30 -05:00
tsacct.c
ucount.c userns: Make ucounts lock irq-safe 2017-01-24 06:23:51 +13:00
uid16.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
up.c
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
watchdog.c kernel/watchdog: prevent false hardlockup on overloaded system 2017-01-24 16:26:14 -08:00
watchdog_hld.c kernel/watchdog: prevent false hardlockup on overloaded system 2017-01-24 16:26:14 -08:00
workqueue.c
workqueue_internal.h