1
0
Fork 0
Commit Graph

1098 Commits (9aa55fbd01779a0b476d87cd9b5170fd5bebab1d)

Author SHA1 Message Date
Ingo Molnar 9aa55fbd01 Merge branches 'sched/domains' and 'sched/clock' into sched/core
Merge reason: both topics are ready now, and we want to merge dependent
              changes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:08:47 +02:00
Arjan van de Ven 8f0dfc34e9 sched: Provide iowait counters
For counting how long an application has been waiting for
(disk) IO, there currently is only the HZ sample driven
information available, while for all other counters in this
class, a high resolution version is available via
CONFIG_SCHEDSTATS.

In order to make an improved bootchart tool possible, we also
need a higher resolution version of the iowait time.

This patch below adds this scheduler statistic to the kernel.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A64B813.1080506@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02 08:44:08 +02:00
Anirban Sinha 84e9dabf6e sched: Rename init_cfs_rq => init_tg_cfs_rq
... so that it does not share a common name with a function
within the same scope.

Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
LKML-Reference: <DDFD17CC94A9BD49A82147DDF7D545C501EA98A6@exchange.ZeugmaSystems.local>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29 15:51:16 +02:00
Peter Zijlstra 34d76c4155 sched: Fix division by zero - really
When re-computing the shares for each task group's cpu
representation we need the ratio of weight on each cpu vs the
total weight of the sched domain.

Since load-balancing is loosely (read not) synchronized, the
weight of individual cpus can change between doing the sum and
calculating the ratio.

The previous patch dealt with only one of the race scenarios,
this patch side steps them all by saving a snapshot of all the
individual cpu weights, thereby always working on a consistent
set.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: torvalds@linux-foundation.org
Cc: jes@sgi.com
Cc: jens.axboe@oracle.com
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1251371336.18584.77.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-28 08:26:49 +02:00
Peter Zijlstra a8af7246c1 sched: Avoid division by zero
Patch a5004278f0 (sched: Fix
cgroup smp fairness) introduced the possibility of a
divide-by-zero because load-balancing is not synchronized
between sched_domains.

This can cause the state of cpus to change between the first
and second loop over the sched domain in tg_shares_up().

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1250855934.7538.30.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-21 14:15:10 +02:00
Hiroshi Shimamoto cde7e5ca4e sched: Use for_each_class macro in move_one_task()
Replace for loop with the macro for_each_class to cleanup.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
LKML-Reference: <4A8A277D.4090304@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-20 13:00:30 +02:00
Andreas Herrmann 294b0c9619 sched: Consolidate definition of variable sd in __build_sched_domains
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818110229.GM29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:45 +02:00
Andreas Herrmann 0601a88d8f sched: Separate out build of NUMA sched groups from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818110111.GL29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:44 +02:00
Andreas Herrmann de616e36c7 sched: Separate out build of ALLNODES sched groups from __build_sched_domains
For the sake of completeness.
Now all calls to init_sched_build_groups() are contained in
build_sched_groups().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818110013.GK29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:44 +02:00
Andreas Herrmann 86548096f2 sched: Separate out build of CPU sched groups from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105928.GJ29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:43 +02:00
Andreas Herrmann a2af04cdbb sched: Separate out build of MC sched groups from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105838.GI29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:43 +02:00
Andreas Herrmann 0e8e85c941 sched: Separate out build of SMT sched groups from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105751.GH29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:42 +02:00
Andreas Herrmann d817353555 sched: Separate out build of SMT sched domain from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105703.GG29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:42 +02:00
Andreas Herrmann 410c408108 sched: Separate out build of MC sched domain from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105614.GF29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:41 +02:00
Andreas Herrmann 87cce6622c sched: Separate out build of CPU sched domain from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105455.GE29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:40 +02:00
Andreas Herrmann 7f4588f3aa sched: Separate out build of NUMA sched domain from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105406.GD29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:40 +02:00
Andreas Herrmann 2109b99ee1 sched: Separate out allocation/free/goto-hell from __build_sched_domains
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105300.GC29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:39 +02:00
Andreas Herrmann 49a02c514d sched: Use structure to store local data in __build_sched_domains
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105152.GB29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:39 +02:00
Peter Zijlstra 693525e3be sched: Ensure the migration task doesn't go away during use
Like sched_migrate_task(), set_cpus_allowed_ptr() should hold
onto the migration thread too.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:15 +02:00
Gregory Haskins 00aec93d10 sched: Fully integrate cpus_active_map and root-domain code
Reflect "active" cpus in the rq->rd->online field, instead of
the online_map.

The motivation is that things that use the root-domain code
(such as cpupri) only care about cpus classified as "active"
anyway. By synchronizing the root-domain state with the active
map, we allow several optimizations.

For instance, we can remove an extra cpumask_and from the
scheduler hotpath by utilizing rq->rd->online (since it is now
a cached version of cpu_active_map & rq->rd->span).

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:12 +02:00
Gregory Haskins 3f029d3c6d sched: Enhance the pre/post scheduling logic
We currently have an explicit "needs_post" vtable method which
returns a stack variable for whether we should later run
post-schedule.  This leads to an awkward exchange of the
variable as it bubbles back up out of the context switch. Peter
Zijlstra observed that this information could be stored in the
run-queue itself instead of handled on the stack.

Therefore, we revert to the method of having context_switch
return void, and update an internal rq->post_schedule variable
when we require further processing.

In addition, we fix a race condition where we try to access
current->sched_class without holding the rq->lock.  This is
technically racy, as the sched-class could change out from
under us.  Instead, we reference the per-rq post_schedule
variable with the runqueue unlocked, but with preemption
disabled to see if we need to reacquire the rq->lock.

Finally, we clean the code up slightly by removing the #ifdef
CONFIG_SMP conditionals from the schedule() call, and implement
some inline helper functions instead.

This patch passes checkpatch, and rt-migrate.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:10 +02:00
Steven Rostedt da19ab5103 sched: Check for pushing rt tasks after all scheduling
The current method for pushing RT tasks after scheduling only
happens after a context switch. But we found cases where a task
is set up on a run queue to be pushed but the push never
happens because the schedule chooses the same task.

This bug was found with the help of Gregory Haskins and the use
of ftrace (trace_printk). It tooks several days for both of us
analyzing the code and the trace output to find this.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090729042526.205923666@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:08 +02:00
Peter Zijlstra e709715915 sched: Optimize unused cgroup configuration
When cgroup group scheduling is built in, skip some code paths
if we don't have any (but the root) cgroups configured.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:07 +02:00
Peter Zijlstra a5004278f0 sched: Fix cgroup smp fairness
Commit ec4e0e2fe0 ("fix
inconsistency when redistribute per-cpu tg->cfs_rq shares")
broke cgroup smp fairness.

In order to avoid starvation of newly placed tasks, we never
quite set the share of an empty cpu group-task to 0, but
instead we set it as if there's a single NICE-0 task present.

If however we actually set this in cfs_rq[cpu]->shares, that
means the total shares for that group will be slightly inflated
every time we balance, causing the observed unfairness.

Fix this by setting cfs_rq[cpu]->shares to 0 but actually
setting the effective weight of the related se to the inflated
number.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1248696557.6987.1615.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:06 +02:00
Ingo Molnar 8e9ed8b024 Merge branch 'sched/urgent' into sched/core
Merge reason: avoid upcoming patch conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:23:57 +02:00
Thomas Gleixner a004cd4218 sched: Fix return value of migration_init()
migration_init() returns the return value of the hotplug notifier. In
the success case this is NOTIFY_OK which is 1. initcall_debug
evaluates that as an error code because init calls are expected to
return 0 on success.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-24 09:41:35 +02:00
Frederic Weisbecker 613afbf832 sched: Pull up the might_sleep() check into cond_resched()
might_sleep() is called late-ish in cond_resched(), after the
need_resched()/preempt enabled/system running tests are
checked.

It's better to check the sleeps while atomic earlier and not
depend on some environment datas that reduce the chances to
detect a problem.

Also define cond_resched_*() helpers as macros, so that the
FILE/LINE reported in the sleeping while atomic warning
displays the real origin and not sched.h

Changes in v2:

 - Call __might_sleep() directly instead of might_sleep() which
   may call cond_resched()

 - Turn cond_resched() into a macro so that the file:line
   couple reported refers to the caller of cond_resched() and
   not __cond_resched() itself.

Changes in v3:

 - Also propagate this __might_sleep() pull up to
   cond_resched_lock() and cond_resched_softirq()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-6-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:44 +02:00
Frederic Weisbecker e4aafea2d4 sched: Add a preempt count base offset to __might_sleep()
Add a preempt count base offset to compare against the current
preempt level count. It prepares to pull up the might_sleep
check from cond_resched() to cond_resched_lock() and
cond_resched_bh().

For these two helpers, we need to respectively ensure that once
we'll unlock the given spinlock / reenable local softirqs, we
will reach a sleepable state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Move and rename preempt_count_equals() ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:42 +02:00
Frederic Weisbecker e09758fae8 sched: Cover the CONFIG_DEBUG_SPINLOCK_SLEEP off-case for __might_sleep()
Cover the off case for __might_sleep(), so that we avoid
#ifdefs in files that make use of it. Especially, this prepares
for the __might_sleep() pull up on cond_resched().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:41 +02:00
Frederic Weisbecker 4b2155678d sched: Remove obsolete comment in __cond_resched()
Remove the outdated comment from __cond_resched() related to
the now removed Big Kernel Semaphore.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:39 +02:00
Frederic Weisbecker e7aaaa6934 sched: Drop the need_resched() loop from cond_resched()
The schedule() function is a loop that reschedules the current
task while the TIF_NEED_RESCHED flag is set:

void schedule(void)
{
need_resched:
	/* schedule code */
	if (need_resched())
		goto need_resched;
}

And cond_resched() repeat this loop:

do {
	add_preempt_count(PREEMPT_ACTIVE);
	schedule();
	sub_preempt_count(PREEMPT_ACTIVE);
} while(need_resched());

This loop is needless because schedule() already did the check
and nothing can set TIF_NEED_RESCHED between schedule() exit
and the loop check in need_resched().

Then remove this needless loop.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:38 +02:00
Ingo Molnar 5304d5fc74 Merge branch 'linus' into sched/core
Merge reason: branch had an old upstream base (-rc1-ish), but also
              merge to avoid a conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:50:40 +02:00
Thomas Gleixner a468d38934 sched: fix load average accounting vs. cpu hotplug
The new load average code clears rq->calc_load_active on
CPU_ONLINE. That's wrong as the new onlined CPU might have got a
scheduler tick already and accounted the delta to the stale value of
the time we offlined the CPU.

Clear the value when we cleanup the dead CPU instead. 

Also move the update of the calc_load_update time for the newly online
CPU to CPU_UP_PREPARE to avoid that the CPU plays catch up with the
stale update time value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-18 14:19:52 +02:00
Linus Torvalds 4b0a84043e Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-sched
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-sched:
  sched: Fix bug in SCHED_IDLE interaction with group scheduling
  sched: Fix rt_rq->pushable_tasks initialization in init_rt_rq()
  sched: Reset sched stats on fork()
  sched_rt: Fix overload bug on rt group scheduling
  sched: Documentation/sched-rt-group: Fix style issues & bump version
2009-07-16 10:18:29 -07:00
Peter Zijlstra d86ee4809d sched: optimize cond_resched()
Optimize cond_resched() by removing one conditional.

Currently cond_resched() checks system_state ==
SYSTEM_RUNNING in order to avoid scheduling before the
scheduler is running.

We can however, as per suggestion of Matt, use
PREEMPT_ACTIVE to accomplish that very same.

Suggested-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-10 14:24:05 -07:00
Fabio Checconi c20b08e398 sched: Fix rt_rq->pushable_tasks initialization in init_rt_rq()
init_rt_rq() initializes only rq->rt.pushable_tasks, and not the
pushable_tasks field of the passed rt_rq.  The plist is not used
uninitialized since the only pushable_tasks plists used are the
ones of root rt_rqs; anyway reinitializing the list on every group
creation corrupts the root plist, losing its previous contents.

Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090615185638.GK21741@gandalf.sssup.it>
CC: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 10:43:30 +02:00
Lucas De Marchi 7793527b90 sched: Reset sched stats on fork()
The sched_stat fields are currently not reset upon fork.
Ingo's recent commit 6c594c21fc
did reset nr_migrations, but it didn't reset any of the
others.

This patch resets all sched_stat fields on fork.

Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <193b0f820907090457s7a3662f4gcdecdc22fcae857b@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 10:43:29 +02:00
Peter Zijlstra a1ba4d8ba9 sched_rt: Fix overload bug on rt group scheduling
Fixes an easily triggerable BUG() when setting process affinities.

Make sure to count the number of migratable tasks in the same place:
the root rt_rq. Otherwise the number doesn't make sense and we'll hit
the BUG in set_cpus_allowed_rt().

Also, make sure we only count tasks, not groups (this is probably
already taken care of by the fact that rt_se->nr_cpus_allowed will be 0
for groups, but be more explicit)

Tested-by: Thomas Gleixner <tglx@linutronix.de>
CC: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Gregory Haskins <ghaskins@novell.com>
LKML-Reference: <1247067476.9777.57.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 10:43:29 +02:00
Hitoshi Mitake 54d35f29f4 sched: Hide runqueues from direct reference at source code level for __raw_get_cpu_var()
Hide __raw_get_cpu_var() as well - thus all the direct
references to runqueues will abstracted out.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
LKML-Reference: <20090629.144457.886429910353660979.mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 09:19:27 +02:00
Ingo Molnar 348b346b23 Merge branch 'linus' into sched/core
Merge reason: we will merge a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 09:16:25 +02:00
Linus Torvalds 12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00
Linus Torvalds 1eb51c33b2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix out of scope variable access in sched_slice()
  sched: Hide runqueues from direct refer at source code level
  sched: Remove unneeded __ref tag
  sched, x86: Fix cpufreq + sched_clock() TSC scaling
2009-06-20 10:57:40 -07:00
Peter Zijlstra e5289d4a18 perf_counter: Simplify and fix task migration counting
The task migrations counter was causing rare and hard to decypher
memory corruptions under load. After a day of debugging and bisection
we found that the problem was introduced with:

  3f731ca: perf_counter: Fix cpu migration counter

Turning them off fixes the crashes. Incidentally, the whole
perf_counter_task_migration() logic can be done simpler as well,
by injecting a proper sw-counter event.

This cleanup also fixed the crashes. The precise failure mode is
not completely clear yet, but we are clearly not unhappy about
having a fix ;-)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-19 13:43:12 +02:00
Oleg Nesterov 371cbb387e kthreads: simplify migration_thread() exit path
Now that kthread_stop() can be used even if the task has already exited,
we can kill the "wait_to_die:" loop in migration_thread().  But we must
pin rq->migration_thread after creation.

Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for
->migration_thread exit.  Perhaps we can simplify this code a bit more.
migration_call() can set ->should_stop and forget about this thread.  But
we need a new helper in kthred.c for that.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Vitaliy Gusev <vgusev@openvz.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:54 -07:00
Mike Galbraith 6c697bdf08 sched: Add SCHED_RESET_ON_FORK functionality for nice < 0 tasks
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Lennart Poettering <mzxreary@0pointer.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245228482.27326.1.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 18:34:18 +02:00
Mike Galbraith b9dc29e72f sched: Clean up SCHED_RESET_ON_FORK
Make SCHED_RESET_ON_FORK sched_fork() bits a self-contained unlikely code path.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Lennart Poettering <mzxreary@0pointer.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245228361.18329.6.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 18:34:17 +02:00
Li Zefan fd5e1b5dba sched: Remove unneeded __ref tag
Those two functions no longer call alloc_bootmmem_cpumask_var(),
so no need to tag them with __init_refok.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <4A35DD5B.9050106@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 16:08:04 +02:00
Linus Torvalds 19035e5b5d Merge branch 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers: Logic to move non pinned timers
  timers: /proc/sys sysctl hook to enable timer migration
  timers: Identifying the existing pinned timers
  timers: Framework for identifying pinned timers
  timers: allow deferrable timers for intervals tv2-tv5 to be deferred

Fix up conflicts in kernel/sched.c and kernel/timer.c manually
2009-06-15 10:06:19 -07:00
Lennart Poettering ca94c44253 sched: Introduce SCHED_RESET_ON_FORK scheduling policy flag
This patch introduces a new flag SCHED_RESET_ON_FORK which can be passed
to the kernel via sched_setscheduler(), ORed in the policy parameter. If
set this will make sure that when the process forks a) the scheduling
priority is reset to DEFAULT_PRIO if it was higher and b) the scheduling
policy is reset to SCHED_NORMAL if it was either SCHED_FIFO or SCHED_RR.

Why have this?

Currently, if a process is real-time scheduled this will 'leak' to all
its child processes. For security reasons it is often (always?) a good
idea to make sure that if a process acquires RT scheduling this is
confined to this process and only this process. More specifically this
makes the per-process resource limit RLIMIT_RTTIME useful for security
purposes, because it makes it impossible to use a fork bomb to
circumvent the per-process RLIMIT_RTTIME accounting.

This feature is also useful for tools like 'renice' which can then
change the nice level of a process without having this spill to all its
child processes.

Why expose this via sched_setscheduler() and not other syscalls such as
prctl() or sched_setparam()?

prctl() does not take a pid parameter. Due to that it would be
impossible to modify this flag for other processes than the current one.

The struct passed to sched_setparam() can unfortunately not be extended
without breaking compatibility, since sched_setparam() lacks a size
parameter.

How to use this from userspace? In your RT program simply replace this:

  sched_setscheduler(pid, SCHED_FIFO, &param);

by this:

  sched_setscheduler(pid, SCHED_FIFO|SCHED_RESET_ON_FORK, &param);

Signed-off-by: Lennart Poettering <lennart@poettering.net>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090615152714.GA29092@tango.0pointer.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-15 17:31:59 +02:00
Rusty Russell b43e352139 sched: export kick_process
lguest needs kick_process: wake_up_process() does nothing if a process
is running, which isn't sufficient (we need it in the kernel).

And lguest support is usually modular.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-12 22:27:01 +09:30