1
0
Fork 0
remarkable-linux/kernel
Paul Jackson 3e0d98b9f1 [PATCH] cpuset: memory pressure meter
Provide a simple per-cpuset metric of memory pressure, tracking the -rate-
that the tasks in a cpuset call try_to_free_pages(), the synchronous
(direct) memory reclaim code.

This enables batch managers monitoring jobs running in dedicated cpusets to
efficiently detect what level of memory pressure that job is causing.

This is useful both on tightly managed systems running a wide mix of
submitted jobs, which may choose to terminate or reprioritize jobs that are
trying to use more memory than allowed on the nodes assigned them, and with
tightly coupled, long running, massively parallel scientific computing jobs
that will dramatically fail to meet required performance goals if they
start to use more memory than allowed to them.

This patch just provides a very economical way for the batch manager to
monitor a cpuset for signs of memory pressure.  It's up to the batch
manager or other user code to decide what to do about it and take action.

==> Unless this feature is enabled by writing "1" to the special file
    /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
    code of __alloc_pages() for this metric reduces to simply noticing
    that the cpuset_memory_pressure_enabled flag is zero.  So only
    systems that enable this feature will compute the metric.

Why a per-cpuset, running average:

    Because this meter is per-cpuset, rather than per-task or mm, the
    system load imposed by a batch scheduler monitoring this metric is
    sharply reduced on large systems, because a scan of the tasklist can be
    avoided on each set of queries.

    Because this meter is a running average, instead of an accumulating
    counter, a batch scheduler can detect memory pressure with a single
    read, instead of having to read and accumulate results for a period of
    time.

    Because this meter is per-cpuset rather than per-task or mm, the
    batch scheduler can obtain the key information, memory pressure in a
    cpuset, with a single read, rather than having to query and accumulate
    results over all the (dynamically changing) set of tasks in the cpuset.

A per-cpuset simple digital filter (requires a spinlock and 3 words of data
per-cpuset) is kept, and updated by any task attached to that cpuset, if it
enters the synchronous (direct) page reclaim code.

A per-cpuset file provides an integer number representing the recent
(half-life of 10 seconds) rate of direct page reclaims caused by the tasks
in the cpuset, in units of reclaims attempted per second, times 1000.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:42 -08:00
..
irq [PATCH] Alpha: convert to generic irq framework (generic part) 2006-01-06 08:33:40 -08:00
power [PATCH] swsusp: save image header first 2006-01-06 08:33:43 -08:00
.gitignore gitignore: ignore more generated files 2006-01-03 11:35:26 +01:00
Kconfig.hz [PATCH] i386: Selectable Frequency of the Timer Interrupt 2005-06-23 09:45:10 -07:00
Kconfig.preempt [PATCH] sched: voluntary kernel preemption 2005-06-25 16:24:45 -07:00
Makefile [PATCH] RCU torture-testing kernel module 2005-10-30 17:37:27 -08:00
acct.c [PATCH] s390: cputime_t fixes 2006-01-06 08:33:49 -08:00
audit.c [PATCH] Add try_to_freeze to kauditd 2005-12-12 08:57:43 -08:00
auditsc.c [PATCH] gfp_t: kernel/* 2005-10-28 08:16:49 -07:00
capability.c [PATCH] kernel/capability.c: add kerneldoc 2005-07-27 16:26:06 -07:00
compat.c [PATCH] kernel: fix-up schedule_timeout() usage 2005-09-10 10:06:37 -07:00
configs.c update the email address of Randy Dunlap 2006-01-03 13:37:51 +01:00
cpu.c [PATCH] clean up lock_cpu_hotplug() in cpufreq 2005-11-28 14:42:23 -08:00
cpuset.c [PATCH] cpuset: memory pressure meter 2006-01-08 20:13:42 -08:00
crash_dump.c [PATCH] kernel/crash_dump.c: add kerneldoc 2005-07-27 16:26:06 -07:00
dma.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
exec_domain.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
exit.c [PATCH] RCU signal handling 2006-01-08 20:13:40 -08:00
extable.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
fork.c [PATCH] RCU signal handling 2006-01-08 20:13:40 -08:00
futex.c [PATCH] FRV: Make futex code compilable on nommu [try #2] 2006-01-06 08:33:33 -08:00
intermodule.c [PATCH] introduce and use kzalloc 2005-09-07 16:57:45 -07:00
itimer.c [PATCH] itimer fixes 2005-07-27 16:25:51 -07:00
kallsyms.c [PATCH] fix missing includes 2005-10-30 17:37:32 -08:00
kexec.c [PATCH] mm: split page table lock 2005-10-29 21:40:42 -07:00
kfifo.c [PATCH] gfp flags annotations - part 1 2005-10-08 15:00:57 -07:00
kmod.c [PATCH] Keys: Get rid of warning in kmod.c if keys disabled 2005-10-30 17:37:23 -08:00
kprobes.c [PATCH] kprobes: increment kprobe missed count for multiprobes 2005-12-12 08:57:45 -08:00
ksysfs.c [PATCH] kobject_uevent CONFIG_NET=n fix 2006-01-04 16:18:08 -08:00
kthread.c [PATCH] Add kthread_stop_sem() 2005-10-30 17:37:17 -08:00
module.c [PATCH] kernel/module.c: removed dead code 2006-01-06 08:33:59 -08:00
panic.c [PATCH] s390: cleanup Kconfig 2006-01-06 08:33:53 -08:00
params.c [PATCH] kernel/params.c: fix sysfs access with CONFIG_MODULES=n 2005-12-20 10:31:33 -08:00
pid.c [PATCH] RCU signal handling 2006-01-08 20:13:40 -08:00
posix-cpu-timers.c [PATCH] Fix posix-cpu-timers sched_time accumulation 2006-01-06 20:23:04 -08:00
posix-timers.c [PATCH] timespec: normalize off by one errors 2005-11-13 18:14:17 -08:00
printk.c [PATCH] Fix crash in unregister_console() 2005-11-23 16:08:39 -08:00
profile.c [PATCH] mostly_read data section 2005-07-07 18:23:46 -07:00
ptrace.c [PATCH] Fix crash when ptrace poking hugepage areas 2005-11-29 19:47:03 -08:00
rcupdate.c [PATCH] RCU signal handling 2006-01-08 20:13:40 -08:00
rcutorture.c [PATCH] Fix bug in RCU torture test 2005-12-12 08:57:42 -08:00
resource.c [PATCH] introduce and use kzalloc 2005-09-07 16:57:45 -07:00
sched.c [PATCH] RCU signal handling 2006-01-08 20:13:40 -08:00
seccomp.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
signal.c [PATCH] Simpler signal-exit concurrency handling 2006-01-08 20:13:40 -08:00
softirq.c [PATCH] cpu hoptlug: avoid usage of smp_processor_id() in preemptible code 2005-11-07 07:53:29 -08:00
softlockup.c [PATCH] quieten softlockup at boot 2005-11-09 07:55:50 -08:00
spinlock.c [PATCH] spinlock consolidation 2005-09-10 10:06:21 -07:00
stop_machine.c [PATCH] stop_machine() vs. synchronous IPI send deadlock 2005-11-13 18:14:16 -08:00
sys.c [PATCH] kprobes: no probes on critical path 2005-12-12 08:57:45 -08:00
sys_ni.c [PATCH] Swap Migration V5: sys_migrate_pages interface 2006-01-08 20:12:42 -08:00
sysctl.c [PATCH] Make high and batch sizes of per_cpu_pagelists configurable 2006-01-08 20:12:40 -08:00
time.c [PATCH] Add getnstimestamp function 2005-12-12 08:57:42 -08:00
timer.c [PATCH] jiffies_64 cleanup 2005-10-30 17:37:25 -08:00
uid16.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
user.c [PATCH] inotify 2005-07-12 20:38:38 -07:00
wait.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
workqueue.c [PATCH] add schedule_on_each_cpu() 2006-01-08 20:12:40 -08:00