remarkable-linux/kernel
Paul Jackson 607717a65d cpuset: remove sched domain hooks from cpusets
Remove the cpuset hooks that defined sched domains depending on the setting
of the 'cpu_exclusive' flag.

The cpu_exclusive flag can only be set on a child if it is set on the
parent.

This made that flag painfully unsuitable for use as a flag defining a
partitioning of a system.

It was entirely unobvious to a cpuset user what partitioning of sched
domains they would be causing when they set that one cpu_exclusive bit on
one cpuset, because it depended on what CPUs were in the remainder of that
cpusets siblings and child cpusets, after subtracting out other
cpu_exclusive cpusets.

Furthermore, there was no way on production systems to query the
result.

Using the cpu_exclusive flag for this was simply wrong from the get go.

Fortunately, it was sufficiently borked that so far as I know, almost no
successful use has been made of this.  One real time group did use it to
affectively isolate CPUs from any load balancing efforts.  They are willing
to adapt to alternative mechanisms for this, such as someway to manipulate
the list of isolated CPUs on a running system.  They can do without this
present cpu_exclusive based mechanism while we develop an alternative.

There is a real risk, to the best of my understanding, of users
accidentally setting up a partitioned scheduler domains, inhibiting desired
load balancing across all their CPUs, due to the nonobvious (from the
cpuset perspective) side affects of the cpu_exclusive flag.

Furthermore, since there was no way on a running system to see what one was
doing with sched domains, this change will be invisible to any using code.
Unless they have real insight to the scheduler load balancing choices, they
will be unable to detect that this change has been made in the kernel's
behaviour.

Initial discussion on lkml of this patch has generated much comment.  My
(probably controversial) take on that discussion is that it has reached a
rough concensus that the current cpuset cpu_exclusive mechanism for
defining sched domains is borked.  There is no concensus on the
replacement.  But since we can remove this mechanism, and since its
continued presence risks causing unwanted partitioning of the schedulers
load balancing, we should remove it while we can, as we proceed to work the
replacement scheduler domain mechanisms.

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:09 -07:00
..
irq request_irq: fix DEBUG_SHIRQ handling 2007-08-31 01:42:23 -07:00
power hibernation doesn't even build on frv - tons of helpers are missing 2007-09-26 09:22:04 -07:00
time clockevents: introduce force broadcast notifier 2007-10-14 22:57:45 +02:00
.gitignore
acct.c Cleanup non-arch xtime uses, use get_seconds() or current_kernel_time(). 2007-07-25 10:09:20 -07:00
audit.c [NET]: make netlink user -> kernel interface synchronious 2007-10-10 21:15:29 -07:00
audit.h Audit: add TTY input auditing 2007-07-16 09:05:47 -07:00
auditfilter.c [PATCH] allow audit filtering on bit & operations 2007-07-22 09:57:02 -04:00
auditsc.c SUNRPC: Convert rpc_pipefs to use the generic filesystem notification hooks 2007-10-09 17:15:26 -04:00
capability.c
compat.c signal/timer/event: timerfd compat code 2007-05-11 08:29:36 -07:00
configs.c
cpu.c PM: Fix dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION 2007-08-31 01:42:22 -07:00
cpuset.c cpuset: remove sched domain hooks from cpusets 2007-10-16 09:43:09 -07:00
delayacct.c sched: clean up schedstats, cnt -> count 2007-10-15 17:00:12 +02:00
die_notifier.c
dma.c
exec_domain.c
exit.c sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields 2007-10-15 17:00:19 +02:00
extable.c
fork.c sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields 2007-10-15 17:00:19 +02:00
futex.c robust futex thread exit race 2007-10-01 07:52:23 -07:00
futex_compat.c robust futex thread exit race 2007-10-01 07:52:23 -07:00
hrtimer.c [KTIME]: Introduce ktime_sub_ns and ktime_sub_us 2007-10-10 16:48:12 -07:00
itimer.c
kallsyms.c kallsyms: make KSYM_NAME_LEN include space for trailing '\0' 2007-07-17 10:23:03 -07:00
Kconfig.hz
Kconfig.preempt [PATCH] sched: arch preempt notifier mechanism 2007-07-26 13:40:43 +02:00
kexec.c
kfifo.c is_power_of_2: kernel/kfifo.c 2007-07-16 09:05:50 -07:00
kmod.c Restore call_usermodehelper_pipe() behaviour 2007-09-11 17:21:20 -07:00
kprobes.c x86: optimize page faults like all other achitectures and kill notifier cruft 2007-10-16 09:42:50 -07:00
ksysfs.c sched: group scheduling, sysfs tunables 2007-10-15 17:00:14 +02:00
kthread.c kthread: silence bogus section mismatch warning 2007-07-31 15:39:42 -07:00
latency.c
lockdep.c lockdep: syscall exit check 2007-10-11 22:11:12 +02:00
lockdep_internals.h
lockdep_proc.c lockdep: Avoid /proc/lockdep & lock_stat infinite output 2007-10-11 22:11:11 +02:00
Makefile user namespace: add the framework 2007-07-16 09:05:47 -07:00
module.c Fix Off-by-one in /sys/module/*/refcnt 2007-08-22 14:35:35 -07:00
mutex-debug.c
mutex-debug.h
mutex.c lockdep: fixup mutex annotations 2007-10-11 22:11:12 +02:00
mutex.h
nsproxy.c [NET]: Add network namespace clone & unshare support. 2007-10-10 16:52:46 -07:00
panic.c Report that kernel is tainted if there was an OOPS 2007-07-17 10:23:02 -07:00
params.c modules: better error messages when modules fail to load due to a sysfs problem. 2007-07-30 14:25:23 -07:00
pid.c namespace: ensure clone_flags are always stored in an unsigned long 2007-07-16 09:05:48 -07:00
posix-cpu-timers.c sched: make posix-cpu-timers use CFS's accounting information 2007-07-09 18:51:58 +02:00
posix-timers.c more low-hanging fruits - kernel, fs, lib signedness 2007-10-14 12:41:52 -07:00
printk.c slow down printk during boot 2007-10-16 09:42:49 -07:00
profile.c Memoryless nodes: Allow profiling data to fall back to other nodes 2007-10-16 09:42:58 -07:00
ptrace.c m32r: convert to generic sys_ptrace 2007-10-16 09:43:04 -07:00
rcupdate.c lockdep: annotate rcu_read_{,un}lock{,_bh} 2007-10-11 22:11:12 +02:00
rcutorture.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
relay.c Fix a use after free bug in kernel->userspace relay file support 2007-07-31 15:39:42 -07:00
resource.c memory unplug: memory hotplug cleanup 2007-10-16 09:43:01 -07:00
rtmutex-debug.c FUTEX: Tidy up the code 2007-07-16 09:05:49 -07:00
rtmutex-debug.h
rtmutex-tester.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
rtmutex.c FUTEX: Tidy up the code 2007-07-16 09:05:49 -07:00
rtmutex.h
rtmutex_common.h FUTEX: Tidy up the code 2007-07-16 09:05:49 -07:00
rwsem.c lockstat: hook into spinlock_t, rwlock_t, rwsem and mutex 2007-07-19 10:04:49 -07:00
sched.c cpuset: remove sched domain hooks from cpusets 2007-10-16 09:43:09 -07:00
sched_debug.c Make scheduler debug file operations const 2007-10-15 17:00:19 +02:00
sched_fair.c sched: reintroduce cache-hot affinity 2007-10-15 17:00:18 +02:00
sched_idletask.c sched: mark scheduling classes as const 2007-10-15 17:00:12 +02:00
sched_rt.c sched: tidy up SCHED_RR 2007-10-15 17:00:13 +02:00
sched_stats.h sched: clean up schedstats, cnt -> count 2007-10-15 17:00:12 +02:00
seccomp.c make seccomp zerocost in schedule 2007-07-16 09:05:50 -07:00
signal.c fix bogus reporting of signals by audit 2007-10-07 16:28:43 -07:00
softirq.c [KERNEL]: Unexport raise_softirq_irqoff 2007-10-10 16:49:18 -07:00
softlockup.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
spinlock.c lockstat: hook into spinlock_t, rwlock_t, rwsem and mutex 2007-07-19 10:04:49 -07:00
srcu.c
stacktrace.c
stop_machine.c Fix stop_machine_run problem with naughty real time process 2007-07-16 09:05:41 -07:00
sys.c Fix SMP poweroff hangs 2007-10-01 07:52:23 -07:00
sys_ni.c diskquota: 32bit quota tools on 64bit architectures 2007-07-16 09:05:48 -07:00
sysctl.c hugetlb: Add hugetlb_dynamic_pool sysctl 2007-10-16 09:43:02 -07:00
taskstats.c taskstats: add context-switch counters 2007-07-16 09:05:46 -07:00
time.c Cleanup non-arch xtime uses, use get_seconds() or current_kernel_time(). 2007-07-25 10:09:20 -07:00
timer.c Pull ia64-clocksource into release branch 2007-07-20 11:26:47 -07:00
tsacct.c Cleanup non-arch xtime uses, use get_seconds() or current_kernel_time(). 2007-07-25 10:09:20 -07:00
uid16.c
user.c sched: generate uevents for user creation/destruction 2007-10-15 17:00:18 +02:00
user_namespace.c Fix user namespace exiting OOPs 2007-09-19 11:24:18 -07:00
utsname.c Fix UTS corruption during clone(CLONE_NEWUTS) 2007-09-19 11:24:17 -07:00
utsname_sysctl.c remove CONFIG_UTS_NS and CONFIG_IPC_NS 2007-07-16 09:05:47 -07:00
wait.c
workqueue.c fix bogus hotplug cpu warning 2007-08-27 10:27:48 -07:00