1
0
Fork 0
Commit Graph

12795 Commits (62f6536a630affe3176deb48554d27ee58b65077)

Author SHA1 Message Date
Linus Torvalds 83c2f912b4 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  perf tools: Fix compile error on x86_64 Ubuntu
  perf report: Fix --stdio output alignment when --showcpuutilization used
  perf annotate: Get rid of field_sep check
  perf annotate: Fix usage string
  perf kmem: Fix a memory leak
  perf kmem: Add missing closedir() calls
  perf top: Add error message for EMFILE
  perf test: Change type of '-v' option to INCR
  perf script: Add missing closedir() calls
  tracing: Fix compile error when static ftrace is enabled
  recordmcount: Fix handling of elf64 big-endian objects.
  perf tools: Add const.h to MANIFEST to make perf-tar-src-pkg work again
  perf tools: Add support for guest/host-only profiling
  perf kvm: Do guest-only counting by default
  perf top: Don't update total_period on process_sample
  perf hists: Stop using 'self' for struct hist_entry
  perf hists: Rename total_session to total_period
  x86: Add counter when debug stack is used with interrupts enabled
  x86: Allow NMIs to hit breakpoints in i386
  x86: Keep current stack in NMI breakpoints
  ...
2012-01-15 11:26:35 -08:00
Linus Torvalds c49c41a413 Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-security
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
  capabilities: remove __cap_full_set definition
  security: remove the security_netlink_recv hook as it is equivalent to capable()
  ptrace: do not audit capability check when outputing /proc/pid/stat
  capabilities: remove task_ns_* functions
  capabitlies: ns_capable can use the cap helpers rather than lsm call
  capabilities: style only - move capable below ns_capable
  capabilites: introduce new has_ns_capabilities_noaudit
  capabilities: call has_ns_capability from has_capability
  capabilities: remove all _real_ interfaces
  capabilities: introduce security_capable_noaudit
  capabilities: reverse arguments to security_capable
  capabilities: remove the task from capable LSM hook entirely
  selinux: sparse fix: fix several warnings in the security server cod
  selinux: sparse fix: fix warnings in netlink code
  selinux: sparse fix: eliminate warnings for selinuxfs
  selinux: sparse fix: declare selinux_disable() in security.h
  selinux: sparse fix: move selinux_complete_init
  selinux: sparse fix: make selinux_secmark_refcount static
  SELinux: Fix RCU deref check warning in sel_netport_insert()

Manually fix up a semantic mis-merge wrt security_netlink_recv():

 - the interface was removed in commit fd77846152 ("security: remove
   the security_netlink_recv hook as it is equivalent to capable()")

 - a new user of it appeared in commit a38f7907b9 ("crypto: Add
   userspace configuration API")

causing no automatic merge conflict, but Eric Paris pointed out the
issue.
2012-01-14 18:36:33 -08:00
Linus Torvalds 0a80939b3e Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPD2aFAAoJENkgDmzRrbjxNzsQAIeYbbrXYLjr6kQzUSngj/eC
 FzjaTEfYTQIeuQCFJHcHthyc5lXV4sQbo3jOezW+Bp5yuDJL2aWIHesSfWZe7imu
 zQdM4VshOYdAmUR9Q0AW5zhB8Smbs7/AyABiF2jm4p0ZPOuyMDSlei9sjvE9Vjvt
 B7g5ht7L6kz0JbDnwwy0u5gs+tEitwpXYId9Y4ysZIBzIbL0qkPX8veOddGTMy0N
 8xhWXaKtufpjvxFD2ORLDsw3AkoF1xXSNuFd/5nzCNpbeE7TW931jfkPoqJumuAO
 7GLxcU9kKYl+IICobC6wBtsj/RrB7w+cBXMvPGwdBliam1qaRhUcJZi5FLM/Ha5d
 2A9QDYNUpoXiO8JbPXrV9Z+Y0+Co8RilsQj7R/rjZh6AbbYCWt9nxzx2Svl/RfTr
 xfiimHuB2P3rHjOvpCXULwOOuE5c8MzPuWncpdjiD3uGXOY/aY+X1m+if/quJw9D
 grPlKL0+YiRakEYUeGG4M77KCqyKFZaF7L7UQPbqfZcj8V/9AW3/7U5I/B9RlAjs
 idsr4fcf5s0N+oKUyTCW1ncpUDQNiwbU2NyJQqeu1ZxaRGj72AgyvsaNeyIPDyK+
 f6x95Bi7i8KLjXc9Z1KvJwh2Nxt25gNUiTYVha/9H2NpJGd1cfI15kTOGXrgddVv
 1pvuGcJDZwYiwfiXr3FL
 =HHrh
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://github.com/rustyrussell/linux

Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999  BFCB D920 0E6C D1AD B8F1

* tag 'for-linus' of git://github.com/rustyrussell/linux:
  module_param: check that bool parameters really are bool.
  intelfbdrv.c: bailearly is an int module_param
  paride/pcd: fix bool verbose module parameter.
  module_param: make bool parameters really bool (drivers & misc)
  module_param: make bool parameters really bool (arch)
  module_param: make bool parameters really bool (core code)
  kernel/async: remove redundant declaration.
  printk: fix unnecessary module_param_name.
  lirc_parallel: fix module parameter description.
  module_param: avoid bool abuse, add bint for special cases.
  module_param: check type correctness for module_param_array
  modpost: use linker section to generate table.
  modpost: use a table rather than a giant if/else statement.
  modules: sysfs - export: taint, coresize, initsize
  kernel/params: replace DEBUGP with pr_debug
  module: replace DEBUGP with pr_debug
  module: struct module_ref should contains long fields
  module: Fix performance regression on modules with large symbol tables
  module: Add comments describing how the "strmap" logic works

Fix up conflicts in scripts/mod/file2alias.c due to the new linker-
generated table approach to adding __mod_*_device_table entries.  The
ARM sa11x0 mcp bus needed to be converted to that too.
2012-01-14 12:32:16 -08:00
Barry Song ee34a37049 PM / Hibernate: Drop the check of swap space size for compressed image
For compressed image, the space required is not known until
we finish compressing and writing all pages.
This patch drops the check, and if swap space is not enough
finally, system can still restore to normal after writing
swap fails for compressed images.

Signed-off-by: Barry Song <Baohua.Song@csr.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-01-14 00:41:37 +01:00
Rafael J. Wysocki dae5cbc244 PM: Make sysrq-o be available for CONFIG_PM unset
After commit 1eb208aea3, "PM: Make
CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME)", the
files under kernel/power are not built unless CONFIG_PM_SLEEP or
CONFIG_PM_RUNTIME is set.  In particular, this causes
kernel/power/poweroff.c to be omitted, even though it should be
compiled, because CONFIG_MAGIC_SYSRQ is set.

Fix the problem by causing kernel/power/Makefile to be processed
for CONFIG_PM unset too.

Reported-and-tested-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-01-14 00:33:03 +01:00
Cyrill Gorcunov 028ee4be34 c/r: prctl: add PR_SET_MM codes to set up mm_struct entries
When we restore a task we need to set up text, data and data heap sizes
from userspace to the values a task had at checkpoint time.  This patch
adds auxilary prctl codes for that.

While most of them have a statistical nature (their values are involved
into calculation of /proc/<pid>/statm output) the start_brk and brk values
are used to compute an allowed size of program data segment expansion.
Which means an arbitrary changes of this values might be dangerous
operation.  So to restrict access the following requirements applied to
prctl calls:

 - The process has to have CAP_SYS_ADMIN capability granted.
 - For all opcodes except start_brk/brk members an appropriate
   VMA area must exist and should fit certain VMA flags,
   such as:
   - code segment must be executable but not writable;
   - data segment must not be executable.

start_brk/brk values must not intersect with data segment and must not
exceed RLIMIT_DATA resource limit.

Still the main guard is CAP_SYS_ADMIN capability check.

Note the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE support
otherwise these prctl calls will return -EINVAL.

[akpm@linux-foundation.org: cache current->mm in a local, saving 200 bytes text]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:13 -08:00
Andi Kleen 6e6f0a1f0f panic: don't print redundant backtraces on oops
When an oops causes a panic and panic prints another backtrace it's pretty
common to have the original oops data be scrolled away on a 80x50 screen.

The second backtrace is quite redundant and not needed anyways.

So don't print the panic backtrace when oops_in_progress is true.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Pavel Emelyanov b8f566b04d sysctl: add the kernel.ns_last_pid control
The sysctl works on the current task's pid namespace, getting and setting
its last_pid field.

Writing is allowed for CAP_SYS_ADMIN-capable tasks thus making it possible
to create a task with desired pid value.  This ability is required badly
for the checkpoint/restore in userspace.

This approach suits all the parties for now.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Michael Holzheu 93e13a360b kdump: fix crash_kexec()/smp_send_stop() race in panic()
When two CPUs call panic at the same time there is a possible race
condition that can stop kdump.  The first CPU calls crash_kexec() and the
second CPU calls smp_send_stop() in panic() before crash_kexec() finished
on the first CPU.  So the second CPU stops the first CPU and therefore
kdump fails:

1st CPU:
  panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump

2nd CPU:
  panic()->crash_kexec()->kexec_mutex already held by 1st CPU
       ->smp_send_stop()-> stop 1st CPU (stop kdump)

This patch fixes the problem by introducing a spinlock in panic that
allows only one CPU to process crash_kexec() and the subsequent panic
code.

All other CPUs call the weak function panic_smp_self_stop() that stops the
CPU itself.  This function can be overloaded by architecture code.  For
example "tile" can use their lower-power "nap" instruction for that.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Michael Holzheu bec013c40b kdump: crashk_res init check for /sys/kernel/kexec_crash_size
Currently it is possible to set the crash_size via the sysfs
/sys/kernel/kexec_crash_size even if no crash kernel memory has been
defined with the "crashkernel" parameter.  In this case "crashk_res" is
not initialized and crashk_res.start = crashk_res.end = 0.  Unfortunately
resource_size(&crashk_res) returns 1 in this case.  This breaks the s390
implementation of crash_(un)map_reserved_pages().

To fix the problem the correct "old_size" is now calculated in
crash_shrink_memory().  "old_size is set to "0" if crashk_res is not
initialized.  With this change crash_shrink_memory() will do nothing, when
"crashk_res" is not initialized.  It will return "0" for "echo 0 >
/sys/kernel/kexec_crash_size" and -EINVAL for "echo [not zero] >
/sys/kernel/kexec_crash_size".

In addition to that this patch also simplifies the "ret = -EINVAL" vs.
"ret = 0" logic as suggested by Simon Horman.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Dave Young <dyoung@redhat.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Michael Holzheu 6480e5a092 kdump: add missing RAM resource in crash_shrink_memory()
When shrinking crashkernel memory using /sys/kernel/kexec_crash_size for
the newly added memory no RAM resource is created at the moment.

Example:

  $ cat /proc/iomem
  00000000-bfffffff : System RAM
    00000000-005b7ac3 : Kernel code
    005b7ac4-009743bf : Kernel data
    009bb000-00a85c33 : Kernel bss
  c0000000-cfffffff : Crash kernel
  d0000000-ffffffff : System RAM

  $ echo 0 > /sys/kernel/kexec_crash_size
  $ cat /proc/iomem
  00000000-bfffffff : System RAM
    00000000-005b7ac3 : Kernel code
    005b7ac4-009743bf : Kernel data
    009bb000-00a85c33 : Kernel bss
                                   <<-- here is System RAM missing
  d0000000-ffffffff : System RAM

One result of this bug is that the memory chunk can never be set offline
using memory hotplug.  With this patch I insert a new "System RAM"
resource for the released memory.  Then the upper example looks like the
following:

  $ echo 0 > /sys/kernel/kexec_crash_size
  $ cat /proc/iomem
  00000000-bfffffff : System RAM
    00000000-005b7ac3 : Kernel code
    005b7ac4-009743bf : Kernel data
    009bb000-00a85c33 : Kernel bss
  c0000000-cfffffff : System RAM   <<-- new rescoure
  d0000000-ffffffff : System RAM

And now I can set chunk c0000000-cfffffff offline.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
WANG Cong a3dd332305 kexec: remove KMSG_DUMP_KEXEC
KMSG_DUMP_KEXEC is useless because we already save kernel messages inside
/proc/vmcore, and it is unsafe to allow modules to do other stuffs in a
crash dump scenario.

[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Joe Perches 9402c95f34 treewide: remove useless NORET_TYPE macro and uses
It's a very old and now unused prototype marking so just delete it.

Neaten panic pointer argument style to keep checkpatch quiet.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:03 -08:00
Stephen Boyd efeb156e72 kprobes: silence DEBUG_STRICT_USER_COPY_CHECKS=y warning
Enabling DEBUG_STRICT_USER_COPY_CHECKS causes the following warning:

  In file included from arch/x86/include/asm/uaccess.h:573,
                   from kernel/kprobes.c:55:
  In function 'copy_from_user',
      inlined from 'write_enabled_file_bool' at
      kernel/kprobes.c:2191:
  arch/x86/include/asm/uaccess_64.h:65:
  warning: call to 'copy_from_user_overflow' declared with attribute warning: copy_from_user() buffer size is not provably correct

presumably due to buf_size being signed causing GCC to fail to see that
buf_size can't become negative.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:02 -08:00
Rusty Russell 2329abfa34 module_param: make bool parameters really bool (core code)
module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:18 +10:30
Rusty Russell 6d6a55ec08 kernel/async: remove redundant declaration.
It's in linux/init.h, and I'm about to change it to a bool.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:18 +10:30
Rusty Russell 29d4d6df10 printk: fix unnecessary module_param_name.
You don't need module_param_name if the name is the same!

Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:17 +10:30
Rusty Russell 69116f279a module_param: avoid bool abuse, add bint for special cases.
For historical reasons, we allow module_param(bool) to take an int (or
an unsigned int).  That's going away.

A few drivers really want an int: they set it to -1 and a parameter
will set it to 0 or 1.  This sucks: reading them from sysfs will give
'Y' for both -1 and 1, but if we change it to an int, then the users
might be broken (if they did "param" instead of "param=1").

Use a new 'bint' parser for them.

(ntfs has a different problem: it needs an int for debug_msgs because
it's also exposed via sysctl.)

Cc: Steve Glendinning <steve.glendinning@smsc.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux390@de.ibm.com
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: lm-sensors@lm-sensors.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: alsa-devel@alsa-project.org
Acked-by: Takashi Iwai <tiwai@suse.de> (For the sound part)
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> (For the hwmon driver)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:17 +10:30
Kay Sievers cca3e70730 modules: sysfs - export: taint, coresize, initsize
Recent tools do not want to use /proc to retrieve module information. A few
values are currently missing from sysfs to replace the information available
in /proc/modules.

This adds /sys/module/*/{coresize,initsize,taint} attributes.

TAINT_PROPRIETARY_MODULE (P) and TAINT_OOT_MODULE (O) flags are both always
shown now, and do no longer exclude each other, also in /proc/modules.

Replace the open-coded sysfs attribute initializers with the __ATTR() macro.

Add the new attributes to Documentation/ABI.

Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:15 +10:30
Jim Cromie 8487bfd954 kernel/params: replace DEBUGP with pr_debug
Use more flexible pr_debug.  This allows:

  echo "module params +p" > /dbg/dynamic_debug/control

to turn on debug messages when needed.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:15 +10:30
Jim Cromie 5e12416927 module: replace DEBUGP with pr_debug
Use more flexible pr_debug.  This allows:

  echo "module module +p" > /dbg/dynamic_debug/control

to turn on debug messages when needed.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:15 +10:30
Eric Dumazet bd77c04772 module: struct module_ref should contains long fields
module_ref contains two "unsigned int" fields.

Thats now too small, since some machines can open more than 2^32 files.

Check commit 518de9b39e (fs: allow for more than 2^31 files) for
reference.

We can add an aligned(2 * sizeof(unsigned long)) attribute to force
alloc_percpu() allocating module_ref areas in single cache lines.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Tejun Heo <tj@kernel.org>
CC: Robin Holt <holt@sgi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:14 +10:30
Kevin Cernekee 48fd11880b module: Fix performance regression on modules with large symbol tables
Looking at /proc/kallsyms, one starts to ponder whether all of the extra
strtab-related complexity in module.c is worth the memory savings.

Instead of making the add_kallsyms() loop even more complex, I tried the
other route of deleting the strmap logic and naively copying each string
into core_strtab with no consideration for consolidating duplicates.

Performance on an "already exists" insmod of nvidia.ko (runs
add_kallsyms() but does not actually initialize the module):

	Original scheme: 1.230s
	With naive copying: 0.058s

Extra space used: 35k (of a 408k module).

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <73defb5e4bca04a6431392cc341112b1@localhost>
2012-01-13 09:32:14 +10:30
Kevin Cernekee 70b1e9161e module: Add comments describing how the "strmap" logic works
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:13 +10:30
Linus Torvalds b8bf17d311 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix lockup by limiting load-balance retries on lock-break
  sched: Fix CONFIG_CGROUP_SCHED dependency
  sched: Remove empty #ifdefs
2012-01-11 22:52:48 -08:00
Linus Torvalds 541048a1d3 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, reboot: Fix typo in nmi reboot path
  x86, NMI: Add to_cpumask() to silence compile warning
  x86, NMI: NMI selftest depends on the local apic
  x86: Add stack top margin for stack overflow checking
  x86, NMI: NMI-selftest should handle the UP case properly
  x86: Fix the 32-bit stackoverflow-debug build
  x86, NMI: Add knob to disable using NMI IPIs to stop cpus
  x86, NMI: Add NMI IPI selftest
  x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus
  x86: Clean up the range of stack overflow checking
  x86: Panic on detection of stack overflow
  x86: Check stack overflow in detail
2012-01-11 19:13:04 -08:00
Peter Zijlstra bced76aeac sched: Fix lockup by limiting load-balance retries on lock-break
Eric and David reported dead machines and traced it to commit
a195f004 ("sched: Fix load-balance lock-breaking"), it turns out
there's still a scenario where we can end up re-trying forever.

Since there is no strict forward progress guarantee in the
load-balance iteration we can get stuck re-retrying the same
task-set over and over.

Creating a forward progress guarantee with the existing
structure is somewhat non-trivial, for now simply terminate the
retry loop after a few tries.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: David Ahern <dsahern@gmail.com>
[ logic cleanup as suggested by Eric ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1326297936.2442.157.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-11 17:15:12 +01:00
Linus Torvalds 001a541ea9 Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
  writeback: balanced_rate cannot exceed write bandwidth
  writeback: do strict bdi dirty_exceeded
  writeback: avoid tiny dirty poll intervals
  writeback: max, min and target dirty pause time
  writeback: dirty ratelimit - think time compensation
  btrfs: fix dirtied pages accounting on sub-page writes
  writeback: fix dirtied pages accounting on redirty
  writeback: fix dirtied pages accounting on sub-page writes
  writeback: charge leaked page dirties to active tasks
  writeback: Include all dirty inodes in background writeback
2012-01-10 16:59:59 -08:00
Linus Torvalds 40ba587923 Merge branch 'akpm' (aka "Andrew's patch-bomb")
Andrew elucidates:
 - First installmeant of MM.  We have a HUGE number of MM patches this
   time.  It's crazy.
 - MAINTAINERS updates
 - backlight updates
 - leds
 - checkpatch updates
 - misc ELF stuff
 - rtc updates
 - reiserfs
 - procfs
 - some misc other bits

* akpm: (124 commits)
  user namespace: make signal.c respect user namespaces
  workqueue: make alloc_workqueue() take printf fmt and args for name
  procfs: add hidepid= and gid= mount options
  procfs: parse mount options
  procfs: introduce the /proc/<pid>/map_files/ directory
  procfs: make proc_get_link to use dentry instead of inode
  signal: add block_sigmask() for adding sigmask to current->blocked
  sparc: make SA_NOMASK a synonym of SA_NODEFER
  reiserfs: don't lock root inode searching
  reiserfs: don't lock journal_init()
  reiserfs: delay reiserfs lock until journal initialization
  reiserfs: delete comments referring to the BKL
  drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range
  drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030
  drivers/rtc/: remove redundant spi driver bus initialization
  drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static
  drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static
  rtc: convert drivers/rtc/* to use module_platform_driver()
  drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc()
  drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
  ...
2012-01-10 16:42:48 -08:00
Serge E. Hallyn 6b550f9495 user namespace: make signal.c respect user namespaces
ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
user namespace. (new, thanks Oleg)

__send_signal: convert current's uid to the recipient's user namespace
for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
again :)

do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
user namespace

ptrace_signal maps parent's uid into current's user namespace before
including in signal to current.  IIUC Oleg has argued that this shouldn't
matter as the debugger will play with it, but it seems like not converting
the value currently being set is misleading.

Changelog:
Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
	simplify callers and help make clear what we are translating
        (which uid into which namespace).  Passing the target task would
	make callers even easier to read, but we pass in user_ns because
	current_user_ns() != task_cred_xxx(current, user_ns).
Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
	in ptrace_signal().
Sep 23: In send_signal(), detect when (user) signal is coming from an
	ancestor or unrelated user namespace.  Pass that on to __send_signal,
	which sets si_uid to 0 or overflowuid if needed.
Oct 12: Base on Oleg's fixup_uid() patch.  On top of that, handle all
	SI_FROMKERNEL cases at callers, because we can't assume sender is
	current in those cases.
Nov 10: (mhelsley) rename fixup_uid to more meaningful usern_fixup_signal_uid
Nov 10: (akpm) make the !CONFIG_USER_NS case clearer

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Serge Hallyn <serge.hallyn@canonical.com>
Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)

Eric Biederman pointed out that passing info is a bug and could lead to a
NULL pointer deref to boot.

A collection of signal, securebits, filecaps, cap_bounds, and a few other
ltp tests passed with this kernel.

Changelog:
    Nov 18: previous patch missed a leading '&'

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Dan Carpenter <dan.carpenter@oracle.com>
Subject: ipc/mqueue: lock() => unlock() typo

There was a double lock typo introduced in b085f4bd6b21 "user namespace:
make signal.c respect user namespaces"

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:54 -08:00
Tejun Heo b196be89cd workqueue: make alloc_workqueue() take printf fmt and args for name
alloc_workqueue() currently expects the passed in @name pointer to remain
accessible.  This is inconvenient and a bit silly given that the whole wq
is being dynamically allocated.  This patch updates alloc_workqueue() and
friends to take printf format string instead of opaque string and matching
varargs at the end.  The name is allocated together with the wq and
formatted.

alloc_ordered_workqueue() is converted to a macro to unify varargs
handling with alloc_workqueue(), and, while at it, add comment to
alloc_workqueue().

None of the current in-kernel users pass in string with '%' as constant
name and this change shouldn't cause any problem.

[akpm@linux-foundation.org: use __printf]
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:54 -08:00
Matt Fleming 5e6292c0f2 signal: add block_sigmask() for adding sigmask to current->blocked
Abstract the code sequence for adding a signal handler's sa_mask to
current->blocked because the sequence is identical for all architectures.
Furthermore, in the past some architectures actually got this code wrong,
so introduce a wrapper that all architectures can use.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:54 -08:00
KAMEZAWA Hiroyuki 43d2b11324 tracepoint: add tracepoints for debugging oom_score_adj
oom_score_adj is used for guarding processes from OOM-Killer.  One of
problem is that it's inherited at fork().  When a daemon set oom_score_adj
and make children, it's hard to know where the value is set.

This patch adds some tracepoints useful for debugging. This patch adds
3 trace points.
  - creating new task
  - renaming a task (exec)
  - set oom_score_adj

To debug, users need to enable some trace pointer. Maybe filtering is useful as

# EVENT=/sys/kernel/debug/tracing/events/task/
# echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
# echo "oom_score_adj != 0" > $EVENT/task_rename/filter
# echo 1 > $EVENT/enable
# EVENT=/sys/kernel/debug/tracing/events/oom/
# echo 1 > $EVENT/enable

output will be like this.
# grep oom /sys/kernel/debug/tracing/trace
bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:44 -08:00
Stanislaw Gruszka c6968e73b9 PM/Hibernate: do not count debug pages as savable
When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder >
0, we have lot of free pages that are not marked so.  Snapshot code
account them as savable, what cause hibernate memory preallocation
failure.

It is pretty hard to make hibernate allocation succeed with
debug_guardpage_minorder=1.  This change at least make it possible when
system has relatively big amount of RAM.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:42 -08:00
Linus Torvalds 3dcf6c1b6b Merge branch 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (74 commits)
  KVM: PPC: Whitespace fix for kvm.h
  KVM: Fix whitespace in kvm_para.h
  KVM: PPC: annotate kvm_rma_init as __init
  KVM: x86 emulator: implement RDPMC (0F 33)
  KVM: x86 emulator: fix RDPMC privilege check
  KVM: Expose the architectural performance monitoring CPUID leaf
  KVM: VMX: Intercept RDPMC
  KVM: SVM: Intercept RDPMC
  KVM: Add generic RDPMC support
  KVM: Expose a version 2 architectural PMU to a guests
  KVM: Expose kvm_lapic_local_deliver()
  KVM: x86 emulator: Use opcode::execute for Group 9 instruction
  KVM: x86 emulator: Use opcode::execute for Group 4/5 instructions
  KVM: x86 emulator: Use opcode::execute for Group 1A instruction
  KVM: ensure that debugfs entries have been created
  KVM: drop bsp_vcpu pointer from kvm struct
  KVM: x86: Consolidate PIT legacy test
  KVM: x86: Do not rely on implicit inclusions
  KVM: Make KVM_INTEL depend on CPU_SUP_INTEL
  KVM: Use memdup_user instead of kmalloc/copy_from_user
  ...
2012-01-10 09:57:11 -08:00
Hiroshi Shimamoto 9b9fb610f6 sched: Remove empty #ifdefs
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F0B8525.8070901@ct.jp.nec.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-10 09:57:41 +01:00
Linus Torvalds db0c2bf69a Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
* 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cgroup: fix to allow mounting a hierarchy by name
  cgroup: move assignement out of condition in cgroup_attach_proc()
  cgroup: Remove task_lock() from cgroup_post_fork()
  cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
  cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
  cgroup: only need to check oldcgrp==newgrp once
  cgroup: remove redundant get/put of task struct
  cgroup: remove redundant get/put of old css_set from migrate
  cgroup: Remove unnecessary task_lock before fetching css_set on migration
  cgroup: Drop task_lock(parent) on cgroup_fork()
  cgroups: remove redundant get/put of css_set from css_set_check_fetched()
  resource cgroups: remove bogus cast
  cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
  cgroup, cpuset: don't use ss->pre_attach()
  cgroup: don't use subsys->can_attach_task() or ->attach_task()
  cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
  cgroup: improve old cgroup handling in cgroup_attach_proc()
  cgroup: always lock threadgroup during migration
  threadgroup: extend threadgroup_lock() to cover exit and exec
  threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
  ...

Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
fix a css_set not found bug in cgroup_attach_proc" that already
mentioned that the bug is fixed (differently) in Tejun's cgroup
patchset. This one, in other words.
2012-01-09 12:59:24 -08:00
Jesper Juhl a0e86bd425 audit: always follow va_copy() with va_end()
A call to va_copy() should always be followed by a call to va_end() in
the same function.  In kernel/autit.c::audit_log_vformat() this is not
always done.  This patch makes sure va_end() is always called.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-08 14:15:21 -08:00
Linus Torvalds 98793265b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
  Kconfig: acpi: Fix typo in comment.
  misc latin1 to utf8 conversions
  devres: Fix a typo in devm_kfree comment
  btrfs: free-space-cache.c: remove extra semicolon.
  fat: Spelling s/obsolate/obsolete/g
  SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
  tools/power turbostat: update fields in manpage
  mac80211: drop spelling fix
  types.h: fix comment spelling for 'architectures'
  typo fixes: aera -> area, exntension -> extension
  devices.txt: Fix typo of 'VMware'.
  sis900: Fix enum typo 'sis900_rx_bufer_status'
  decompress_bunzip2: remove invalid vi modeline
  treewide: Fix comment and string typo 'bufer'
  hyper-v: Update MAINTAINERS
  treewide: Fix typos in various parts of the kernel, and fix some comments.
  clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
  gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
  leds: Kconfig: Fix typo 'D2NET_V2'
  sound: Kconfig: drop unknown symbol ARCH_CLPS7500
  ...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
2012-01-08 13:21:22 -08:00
Linus Torvalds eb59c505f8 Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
  PM / Hibernate: Implement compat_ioctl for /dev/snapshot
  PM / Freezer: fix return value of freezable_schedule_timeout_killable()
  PM / shmobile: Allow the A4R domain to be turned off at run time
  PM / input / touchscreen: Make st1232 use device PM QoS constraints
  PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
  PM / shmobile: Remove the stay_on flag from SH7372's PM domains
  PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
  PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
  PM: Drop generic_subsys_pm_ops
  PM / Sleep: Remove forward-only callbacks from AMBA bus type
  PM / Sleep: Remove forward-only callbacks from platform bus type
  PM: Run the driver callback directly if the subsystem one is not there
  PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
  PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
  PM / Sleep: Merge internal functions in generic_ops.c
  PM / Sleep: Simplify generic system suspend callbacks
  PM / Hibernate: Remove deprecated hibernation snapshot ioctls
  PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
  ARM: S3C64XX: Implement basic power domain support
  PM / shmobile: Use common always on power domain governor
  ...

Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit
2012-01-08 13:10:57 -08:00
Linus Torvalds 972b2c7199 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
  reiserfs: Properly display mount options in /proc/mounts
  vfs: prevent remount read-only if pending removes
  vfs: count unlinked inodes
  vfs: protect remounting superblock read-only
  vfs: keep list of mounts for each superblock
  vfs: switch ->show_options() to struct dentry *
  vfs: switch ->show_path() to struct dentry *
  vfs: switch ->show_devname() to struct dentry *
  vfs: switch ->show_stats to struct dentry *
  switch security_path_chmod() to struct path *
  vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
  vfs: trim includes a bit
  switch mnt_namespace ->root to struct mount
  vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
  vfs: opencode mntget() mnt_set_mountpoint()
  vfs: spread struct mount - remaining argument of next_mnt()
  vfs: move fsnotify junk to struct mount
  vfs: move mnt_devname
  vfs: move mnt_list to struct mount
  vfs: switch pnode.h macros to struct mount *
  ...
2012-01-08 12:19:57 -08:00
Ingo Molnar 636f0c70f2 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core 2012-01-08 12:31:24 +01:00
Ingo Molnar 675eef66e3 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core 2012-01-08 09:51:24 +01:00
Linus Torvalds 02550d61f4 devicetree/next changes queued for v3.3 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPBUxHAAoJEEFnBt12D9kBfB8QAIj/sGqC4WYpQBU9PyGiSAfq
 Xz5mBqAoILAjLD5SK1eR/N6+CHFt2scsyRkOubfg/4bn0I9Kr37oSXPcVqY24EKw
 zn2tEcDDJdCMzW/CFBEcOfcKfcNl1dgEXodhoOrSAeyaZeL24V5O1SvrwKGINQIp
 NPO/nHeEf7tRncscfeCFSK6h98dIhRZt+xpthIIZxYym5jd9vzRjk21dG6egcHOo
 RTfW2CwA0cu3h2EtR1DCCMwfL+Ze1na4iYcXcTT/5xEUjmkJrM/I44djb9blfWLR
 X0gPuj7q6t+eBKDZqnJufDbY79oOpQN8mgAeCrBwF5I27zAzn6FIRQ7Wobmv0DXr
 NrqsduI+nImAyVR1jeWBGAnowNPXfxd9YboATkph5LTpelQ1cZ/tikxVDteBtDSb
 voH68FKFmFZl3QlpFjJnIsodn57ETmUuezA4Xq+iJTHBMpLpcLCZ4RyAoiJRHca6
 lGwBArLo+wpuRdd/BB+rI+CV6ACbJqwb8Yd6ZTw5lW0eaMPTT93ffteChfEQPMq0
 mGVTYY6kVM9ReFPEkcxnQ3RyPF0XqP2TTp37gXC4/Y2LVMqg0UrrWxtwaU2har/Z
 bUF1JOKiZ6+CWJJwMw7coP7rBCPUz6FDApuDmQo9+I7MjxjFkzYzPtisUs9y7wjx
 /XA9z93mB/O+wcbuvqCo
 =CI17
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6

devicetree/next changes queued for v3.3 merge window

* tag 'devicetree-for-linus-20120104' of git://git.secretlab.ca/git/linux-2.6:
  ARM: prom.h: Fix build error by removing unneeded header file
  irq: check domain hwirq range for DT translate
  dt: add empty of_get_node/of_put_node functions
  of/pdt: fix section mismatch warning
  i2c-designware: add OF binding support
  dt/i2c: Enumerate some of the known trivial i2c devices
  dt: reform for_each_property to for_each_property_of_node
  ARM/of: allow *machine_desc.dt_compat to be const
  of/base: Take NULL string into account for property with multiple strings
  OF/device-tree: Add some entries to vendor-prefixes.txt

Fix up trivial add-add conflicts in include/linux/of.h
2012-01-07 12:18:52 -08:00
Linus Torvalds 7affca3537 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
  arm: fix up some samsung merge sysdev conversion problems
  firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
  Drivers:hv: Fix a bug in vmbus_driver_unregister()
  driver core: remove __must_check from device_create_file
  debugfs: add missing #ifdef HAS_IOMEM
  arm: time.h: remove device.h #include
  driver-core: remove sysdev.h usage.
  clockevents: remove sysdev.h
  arm: convert sysdev_class to a regular subsystem
  arm: leds: convert sysdev_class to a regular subsystem
  kobject: remove kset_find_obj_hinted()
  m86k: gpio - convert sysdev_class to a regular subsystem
  mips: txx9_sram - convert sysdev_class to a regular subsystem
  mips: 7segled - convert sysdev_class to a regular subsystem
  sh: dma - convert sysdev_class to a regular subsystem
  sh: intc - convert sysdev_class to a regular subsystem
  power: suspend - convert sysdev_class to a regular subsystem
  power: qe_ic - convert sysdev_class to a regular subsystem
  power: cmm - convert sysdev_class to a regular subsystem
  s390: time - convert sysdev_class to a regular subsystem
  ...

Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
 - arch/arm/mach-exynos/cpu.c
 - arch/arm/mach-exynos/irq-eint.c
 - arch/arm/mach-s3c64xx/common.c
 - arch/arm/mach-s3c64xx/cpu.c
 - arch/arm/mach-s5p64x0/cpu.c
 - arch/arm/mach-s5pv210/common.c
 - arch/arm/plat-samsung/include/plat/cpu.h
 - arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07 12:03:30 -08:00
Al Viro 34c80b1d93 vfs: switch ->show_options() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:19:54 -05:00
Al Viro d8c9584ea2 vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:53 -05:00
Al Viro ece2ccb668 Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z 2012-01-06 23:15:54 -05:00
Linus Torvalds 770e1b035d Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm
* 'for-linus' of git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm: (207 commits)
  ARM: 7267/1: Remove BUILD_BUG_ON from asm/bug.h
  ARM: 7269/1: mach-sa1100: fix sched_clock breakage
  ARM: 7198/1: arm/imx6: add restart support for imx6q
  ARM: restart: remove the now empty arch_reset()
  ARM: restart: remove comments about adding code to arch_reset()
  ARM: restart: lpc32xx & u300: remove unnecessary printk
  ARM: restart: plat-samsung: remove plat/reset.h and s5p_reset_hook
  ARM: restart: w90x900: use new restart hook
  ARM: restart: Versatile Express: use new restart hook
  ARM: restart: versatile: use new restart hook
  ARM: restart: u300: use new restart hook
  ARM: restart: tegra: use new restart hook
  ARM: restart: spear: use new restart hook
  ARM: restart: shark: use new restart hook
  ARM: restart: sa1100: use new restart hook
  ARM: 7252/1: restart: S5PV210: use new restart hook
  ARM: 7251/1: restart: S5PC100: use new restart hook
  ARM: 7250/1: restart: S5P64X0: use new restart hook
  ARM: 7266/1: restart: S3C64XX: use new restart hook
  ARM: 7265/1: restart: S3C24XX: use new restart hook
  ...

Fix up trivial conflict in arch/arm/mm/init.c due to removal of
memblock_init() clashing with the movement of the sorting of the meminfo
array.
2012-01-06 18:15:25 -08:00
Linus Torvalds 376613e81d Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, tsc: Skip TSC synchronization checks for tsc=reliable
  clocksource: Convert tcb_clksrc to use clocksource_register_hz/khz
  clocksource: cris: Convert to clocksource_register_khz
  clocksource: xtensa: Convert to clocksource_register_hz/khz
  clocksource: um: Convert to clocksource_register_hz/khz
  clocksource: parisc: Convert to clocksource_register_hz/khz
  clocksource: m86k: Convert to clocksource_register_hz/khz
  time: x86: Replace LATCH with PIT_LATCH in i8253 clocksource driver
  time: x86: Remove CLOCK_TICK_RATE from acpi_pm clocksource driver
  time: x86: Remove CLOCK_TICK_RATE from mach_timer.h
  time: x86: Remove CLOCK_TICK_RATE from tsc code
  time: Fix spelling mistakes in new comments
  time: fix bogus comment in timekeeping_get_ns_raw
2012-01-06 13:57:44 -08:00
Greg Kroah-Hartman ff4b8a57f0 Merge branch 'driver-core-next' into Linux 3.2
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.

The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 11:42:52 -08:00
Linus Torvalds 0db49b72bc Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  sched/tracing: Add a new tracepoint for sleeptime
  sched: Disable scheduler warnings during oopses
  sched: Fix cgroup movement of waking process
  sched: Fix cgroup movement of newly created process
  sched: Fix cgroup movement of forking process
  sched: Remove cfs bandwidth period check in tg_set_cfs_period()
  sched: Fix load-balance lock-breaking
  sched: Replace all_pinned with a generic flags field
  sched: Only queue remote wakeups when crossing cache boundaries
  sched: Add missing rcu_dereference() around ->real_parent usage
  [S390] fix cputime overflow in uptime_proc_show
  [S390] cputime: add sparse checking and cleanup
  sched: Mark parent and real_parent as __rcu
  sched, nohz: Fix missing RCU read lock
  sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
  sched, nohz: Fix the idle cpu check in nohz_idle_balance
  sched: Use jump_labels for sched_feat
  sched/accounting: Fix parameter passing in task_group_account_field
  sched/accounting: Fix user/system tick double accounting
  sched/accounting: Re-use scheduler statistics for the root cgroup
  ...

Fix up conflicts in
 - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
	usecs_to_cputime64() vs the sparse cleanups
 - kernel/sched/fair.c, kernel/time/tick-sched.c
	scheduler changes in multiple branches
2012-01-06 08:44:54 -08:00
Linus Torvalds 35b740e466 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits)
  perf kvm: Fix copy & paste error in description
  perf script: Kill script_spec__delete
  perf top: Fix a memory leak
  perf stat: Introduce get_ratio_color() helper
  perf session: Remove impossible condition check
  perf tools: Fix feature-bits rework fallout, remove unused variable
  perf script: Add generic perl handler to process events
  perf tools: Use for_each_set_bit() to iterate over feature flags
  perf tools: Unify handling of features when writing feature section
  perf report: Accept fifos as input file
  perf tools: Moving code in some files
  perf tools: Fix out-of-bound access to struct perf_session
  perf tools: Continue processing header on unknown features
  perf tools: Improve macros for struct feature_ops
  perf: builtin-record: Document and check that mmap_pages must be a power of two.
  perf: builtin-record: Provide advice if mmap'ing fails with EPERM.
  perf tools: Fix truncated annotation
  perf script: look up thread using tid instead of pid
  perf tools: Look up thread names for system wide profiling
  perf tools: Fix comm for processes with named threads
  ...
2012-01-06 08:02:58 -08:00
Linus Torvalds 423d091dfe Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  cpu: Export cpu_up()
  rcu: Apply ACCESS_ONCE() to rcu_boost() return value
  Revert "rcu: Permit rt_mutex_unlock() with irqs disabled"
  docs: Additional LWN links to RCU API
  rcu: Augment rcu_batch_end tracing for idle and callback state
  rcu: Add rcutorture tests for srcu_read_lock_raw()
  rcu: Make rcutorture test for hotpluggability before offlining CPUs
  driver-core/cpu: Expose hotpluggability to the rest of the kernel
  rcu: Remove redundant rcu_cpu_stall_suppress declaration
  rcu: Adaptive dyntick-idle preparation
  rcu: Keep invoking callbacks if CPU otherwise idle
  rcu: Irq nesting is always 0 on rcu_enter_idle_common
  rcu: Don't check irq nesting from rcu idle entry/exit
  rcu: Permit dyntick-idle with callbacks pending
  rcu: Document same-context read-side constraints
  rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass
  rcu: Remove dynticks false positives and RCU failures
  rcu: Reduce latency of rcu_prepare_for_idle()
  rcu: Eliminate RCU_FAST_NO_HZ grace-period hang
  rcu: Avoid needlessly IPIing CPUs at GP end
  ...
2012-01-06 08:02:40 -08:00
Linus Torvalds 1483b38235 Merge branch 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  printk, lockdep: Switch to tracked irq ops
  printk, lockdep: Remove superfluous preempt_disable()
  printk, lockdep: Disable lock debugging on zap_locks()
2012-01-06 08:01:59 -08:00
Linus Torvalds 4a2164a7db Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  memblock: Reimplement memblock allocation using reverse free area iterator
  memblock: Kill early_node_map[]
  score: Use HAVE_MEMBLOCK_NODE_MAP
  s390: Use HAVE_MEMBLOCK_NODE_MAP
  mips: Use HAVE_MEMBLOCK_NODE_MAP
  ia64: Use HAVE_MEMBLOCK_NODE_MAP
  SuperH: Use HAVE_MEMBLOCK_NODE_MAP
  sparc: Use HAVE_MEMBLOCK_NODE_MAP
  powerpc: Use HAVE_MEMBLOCK_NODE_MAP
  memblock: Implement memblock_add_node()
  memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users
  memblock: Track total size of regions automatically
  powerpc: Cleanup memblock usage
  memblock: Reimplement memblock_enforce_memory_limit() using __memblock_remove()
  memblock: Make memblock functions handle overflowing range @size
  memblock: Reimplement __memblock_remove() using memblock_isolate_range()
  memblock: Separate out memblock_isolate_range() from memblock_set_node()
  memblock: Kill memblock_init()
  memblock: Kill sentinel entries at the end of static region arrays
  memblock: Add __memblock_dump_all()
  ...
2012-01-06 07:54:53 -08:00
Linus Torvalds 15f043a65f Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep/waitqueues: Add better annotation
  lockdep, bug: Exclude TAINT_OOT_MODULE from disabling lock debugging
  lockdep: Print lock name in lockdep_init_error()
  init/main.c: Execute lockdep_init() as early as possible
  lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()
  lockdep, rtmutex, bug: Show taint flags on error
  lockdep, bug: Exclude TAINT_FIRMWARE_WORKAROUND from disabling lockdep
  lockdep: Always try to set ->class_cache in register_lock_class() lockdep_init_map()
2012-01-06 07:53:52 -08:00
Linus Torvalds 8c717b72de Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timer: Use debugobjects to catch deletion of uninitialized timers
  timer: Setup uninitialized timer with a stub callback
  debugobjects: Extend to assert that an object is initialized
  debugobjects: Be smarter about static objects
2012-01-06 07:53:34 -08:00
Eric Paris fd77846152 security: remove the security_netlink_recv hook as it is equivalent to capable()
Once upon a time netlink was not sync and we had to get the effective
capabilities from the skb that was being received.  Today we instead get
the capabilities from the current task.  This has rendered the entire
purpose of the hook moot as it is now functionally equivalent to the
capable() call.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-05 18:53:01 -05:00
Eric Paris 69f594a389 ptrace: do not audit capability check when outputing /proc/pid/stat
Reading /proc/pid/stat of another process checks if one has ptrace permissions
on that process.  If one does have permissions it outputs some data about the
process which might have security and attack implications.  If the current
task does not have ptrace permissions the read still works, but those fields
are filled with inocuous (0) values.  Since this check and a subsequent denial
is not a violation of the security policy we should not audit such denials.

This can be quite useful to removing ptrace broadly across a system without
flooding the logs when ps is run or something which harmlessly walks proc.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:53:00 -05:00
Eric Paris f1c84dae0e capabilities: remove task_ns_* functions
task_ in the front of a function, in the security subsystem anyway, means
to me at least, that we are operating with that task as the subject of the
security decision.  In this case what it means is that we are using current as
the subject but we use the task to get the right namespace.  Who in the world
would ever realize that's what task_ns_capability means just by the name?  This
patch eliminates the task_ns functions entirely and uses the has_ns_capability
function instead.  This means we explicitly open code the ns in question in
the caller.  I think it makes the caller a LOT more clear what is going on.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:52:59 -05:00
Eric Paris d2a7009f0b capabitlies: ns_capable can use the cap helpers rather than lsm call
Just to reduce the number of places to change if we every change the LSM
hook, use the capability helpers internally when possible.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:52:58 -05:00
Eric Paris 105ddf49cd capabilities: style only - move capable below ns_capable
Although the current code is fine for consistency this moves the capable
code below the function it calls in the c file.  It doesn't actually change
code.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:52:57 -05:00
Eric Paris 7b61d64849 capabilites: introduce new has_ns_capabilities_noaudit
For consistency in interfaces, introduce a new interface called
has_ns_capabilities_noaudit.  It checks if the given task has the given
capability in the given namespace.  Use this new function by
has_capabilities_noaudit.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:52:57 -05:00
Eric Paris 25e7570341 capabilities: call has_ns_capability from has_capability
Declare the more specific has_ns_capability first in the code and then call it
from has_capability.  The declaration reversal isn't stricty necessary since
they are both declared in header files, but it just makes sense to put more
specific functions first in the code.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:52:56 -05:00
Eric Paris 2920a8409d capabilities: remove all _real_ interfaces
The name security_real_capable and security_real_capable_noaudit just don't
make much sense to me.  Convert them to use security_capable and
security_capable_noaudit.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:52:55 -05:00
Eric Paris b7e724d303 capabilities: reverse arguments to security_capable
security_capable takes ns, cred, cap.  But the LSM capable() hook takes
cred, ns, cap.  The capability helper functions also take cred, ns, cap.
Rather than flip argument order just to flip it back, leave them alone.
Heck, this should be a little faster since argument will be in the right
place!

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-05 18:52:53 -05:00
Li Zefan 0d19ea8665 cgroup: fix to allow mounting a hierarchy by name
If we mount a hierarchy with a specified name, the name is unique,
and we can use it to mount the hierarchy without specifying its
set of subsystem names. This feature is documented is
Documentation/cgroups/cgroups.txt section 2.3

Here's an example:

	# mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
	# mount -t cgroup -o name=myhier xxx /cgroup2

But it was broken by commit 32a8cf235e
(cgroup: make the mount options parsing more accurate)

This fixes the regression.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
2012-01-05 09:59:46 -08:00
Russell King 2e0e943436 Merge branch 'devel-stable' into for-linus
Conflicts:
	arch/arm/kernel/setup.c
	arch/arm/mach-shmobile/board-kota2.c
2012-01-05 13:24:33 +00:00
Rafael J. Wysocki c233523b3d Merge branch 'pm-sleep' into pm-for-linus
* pm-sleep:
  PM / Hibernate: Implement compat_ioctl for /dev/snapshot
2012-01-05 00:06:24 +01:00
Ben Hutchings c336078bf6 PM / Hibernate: Implement compat_ioctl for /dev/snapshot
This allows uswsusp built for i386 to run on an x86_64 kernel (tested
with Debian package version 1.0+20110509-2).

References: http://bugs.debian.org/502816
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-01-05 00:05:55 +01:00
Oleg Nesterov 8a88951b58 ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach
This is the temporary simple fix for 3.2, we need more changes in this
area.

1. do_signal_stop() assumes that the running untraced thread in the
   stopped thread group is not possible. This was our goal but it is
   not yet achieved: a stopped-but-resumed tracee can clone the running
   thread which can initiate another group-stop.

   Remove WARN_ON_ONCE(!current->ptrace).

2. A new thread always starts with ->jobctl = 0. If it is auto-attached
   and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
   but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
   in do_jobctl_trap() if another debugger attaches.

   Change __ptrace_unlink() to set the artificial SIGSTOP for report.

   Alternatively we could change ptrace_init_task() to copy signr from
   current, but this means we can copy it for no reason and hide the
   possible similar problems.

Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>		[3.1]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-04 15:01:59 -08:00
Oleg Nesterov 50b8d25748 ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race
Test-case:

	int main(void)
	{
		int pid, status;

		pid = fork();
		if (!pid) {
			for (;;) {
				if (!fork())
					return 0;
				if (waitpid(-1, &status, 0) < 0) {
					printf("ERR!! wait: %m\n");
					return 0;
				}
			}
		}

		assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
		assert(waitpid(-1, NULL, 0) == pid);

		assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
					PTRACE_O_TRACEFORK) == 0);

		do {
			ptrace(PTRACE_CONT, pid, 0, 0);
			pid = waitpid(-1, NULL, 0);
		} while (pid > 0);

		return 1;
	}

It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().

The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.

This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.

I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.

Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com>
Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>		[3.0+]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-04 15:01:59 -08:00
Dan Carpenter 305f3c8b20 cgroup: move assignement out of condition in cgroup_attach_proc()
Gcc complains about this: "kernel/cgroup.c:2179:4: warning: suggest
parentheses around assignment used as truth value [-Wparentheses]"

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-01-04 07:37:48 -08:00
Al Viro 93d3a10ef4 auditsc: propage umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:17 -05:00
Al Viro 2570ebbd1f switch kern_ipc_perm to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:17 -05:00
Al Viro df0a42837b switch mq_open() to umode_t 2012-01-03 22:55:16 -05:00
Al Viro 36fcb589e7 sysctl: use umode_t for table permissions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:12 -05:00
Al Viro a5e7ed3287 cgroup: propagate mode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:03 -05:00
Al Viro f4ae40a6a5 switch debugfs to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:56 -05:00
Al Viro 18bb1db3e7 switch vfs_mkdir() and ->mkdir() to umode_t
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro ff01bb4832 fs: move code out of buffer.c
Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:07 -05:00
Al Viro 32dc730860 get rid of timer in kern/acct.c
... and clean it up a bit, while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:41 -05:00
Mandeep Singh Baines f9fab10bbd hung_task: fix false positive during vfork
vfork parent uninterruptibly and unkillably waits for its child to
exec/exit. This wait is of unbounded length. Ignore such waits
in the hung_task detector.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
LKML-Reference: <1325344394.28904.43.camel@lappy>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-03 16:14:32 -08:00
Al Viro d36b691077 misc latin1 to utf8 conversions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-01-02 13:04:55 +01:00
Hugh Dickins e6780f7243 futex: Fix uninterruptible loop due to gate_area
It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.

While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping.  And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?

In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.

But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-31 11:48:28 -08:00
Linus Torvalds 3b87487ac5 Revert "clockevents: Set noop handler in clockevents_exchange_device()"
This reverts commit de28f25e82.

It results in resume problems for various people. See for example

  http://thread.gmane.org/gmane.linux.kernel/1233033
  http://thread.gmane.org/gmane.linux.kernel/1233389
  http://thread.gmane.org/gmane.linux.kernel/1233159
  http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877

and the fedora and ubuntu bug reports

  https://bugzilla.redhat.com/show_bug.cgi?id=767248
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569

which got bisected down to the stable version of this commit.

Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Reported-by: Phil Miller <mille121@illinois.edu>
Reported-by: Philip Langdale <philipl@overt.org>
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <gregkh@suse.de>
Cc: stable@kernel.org    # for stable kernels that applied the original
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-30 13:24:40 -08:00
Rob Herring 93797d87d6 irq: check domain hwirq range for DT translate
A DT node may have more than 1 domain associated with it, so make sure
the hwirq number is within range when doing DT translation.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2011-12-28 08:26:06 -06:00
Frederic Weisbecker 7e3aa30ac8 cgroup: Remove task_lock() from cgroup_post_fork()
cgroup_post_fork() is protected between threadgroup_change_begin()
and threadgroup_change_end() against concurrent changes of the
child's css_set in cgroup_task_migrate(). Also the child can't
exit and call cgroup_exit() at this stage, this means it's css_set
can't be changed with init_css_set concurrently.

For these reasons, we don't need to hold task_lock() on the child
because it's css_set can only remain stable in this place.

Let's remove the lock there.

v2: Update comment to explain that we are safe against
cgroup_exit()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Containers <containers@lists.linux-foundation.org>
Cc: Cgroups <cgroups@vger.kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Mandeep Singh Baines <msb@chromium.org>
2011-12-27 09:05:20 -08:00
Kirill A. Shutemov c6ca57500c cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-12-27 08:51:47 -08:00
Kirill A. Shutemov 1c6c3fad81 cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-12-27 08:51:46 -08:00
Avi Kivity 9e31905f29 Merge remote-tracking branch 'tip/perf/core' into kvm-updates/3.3
* tip/perf/core: (66 commits)
  perf, x86: Expose perf capability to other modules
  perf, x86: Implement arch event mask as quirk
  x86, perf: Disable non available architectural events
  jump_label: Provide jump_label_key initializers
  jump_label, x86: Fix section mismatch
  perf, core: Rate limit perf_sched_events jump_label patching
  perf: Fix enable_on_exec for sibling events
  perf: Remove superfluous arguments
  perf, x86: Prefer fixed-purpose counters when scheduling
  perf, x86: Fix event scheduler for constraints with overlapping counters
  perf, x86: Implement event scheduler helper functions
  perf: Avoid a useless pmu_disable() in the perf-tick
  x86/tools: Add decoded instruction dump mode
  x86: Update instruction decoder to support new AVX formats
  x86/tools: Fix insn_sanity message outputs
  x86/tools: Fix instruction decoder message output
  x86: Fix instruction decoder to handle grouped AVX instructions
  x86/tools: Fix Makefile to build all test tools
  perf test: Soft errors shouldn't stop the "Validate PERF_RECORD_" test
  perf test: Validate PERF_RECORD_ events and perf_sample fields
  ...

Signed-off-by: Avi Kivity <avi@redhat.com>

* commit 'b3d9468a8bd218a695e3a0ff112cd4efd27b670a': (66 commits)
  perf, x86: Expose perf capability to other modules
  perf, x86: Implement arch event mask as quirk
  x86, perf: Disable non available architectural events
  jump_label: Provide jump_label_key initializers
  jump_label, x86: Fix section mismatch
  perf, core: Rate limit perf_sched_events jump_label patching
  perf: Fix enable_on_exec for sibling events
  perf: Remove superfluous arguments
  perf, x86: Prefer fixed-purpose counters when scheduling
  perf, x86: Fix event scheduler for constraints with overlapping counters
  perf, x86: Implement event scheduler helper functions
  perf: Avoid a useless pmu_disable() in the perf-tick
  x86/tools: Add decoded instruction dump mode
  x86: Update instruction decoder to support new AVX formats
  x86/tools: Fix insn_sanity message outputs
  x86/tools: Fix instruction decoder message output
  x86: Fix instruction decoder to handle grouped AVX instructions
  x86/tools: Fix Makefile to build all test tools
  perf test: Soft errors shouldn't stop the "Validate PERF_RECORD_" test
  perf test: Validate PERF_RECORD_ events and perf_sample fields
  ...
2011-12-27 11:22:24 +02:00
Xiao Guangrong a65cf5181a jump-label: export jump_label_inc/jump_label_dec
Export these two symbols, they will be used by KVM mmu audit

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-27 11:22:04 +02:00
Rafael J. Wysocki b7ba68c4a0 Merge branch 'pm-sleep' into pm-for-linus
* pm-sleep: (51 commits)
  PM: Drop generic_subsys_pm_ops
  PM / Sleep: Remove forward-only callbacks from AMBA bus type
  PM / Sleep: Remove forward-only callbacks from platform bus type
  PM: Run the driver callback directly if the subsystem one is not there
  PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
  PM / Sleep: Merge internal functions in generic_ops.c
  PM / Sleep: Simplify generic system suspend callbacks
  PM / Hibernate: Remove deprecated hibernation snapshot ioctls
  PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
  PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly
  PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep()
  PM / Sleep: Make [un]lock_system_sleep() generic
  PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs
  PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()
  PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else'
  Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
  PM / Sleep: Unify diagnostic messages from device suspend/resume
  ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR
  PM / Hibernate: Remove deprecated hibernation test modes
  PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path
  ...

Conflicts:
	kernel/kmod.c
2011-12-25 23:42:20 +01:00
Rafael J. Wysocki 3e68001bb1 Merge branch 'pm-misc' into pm-for-linus
* pm-misc:
  CPU: Add right qualifiers for alloc_frozen_cpus() and cpu_hotplug_pm_sync_init()
  PM / Usermodehelper: Cleanup remnants of usermodehelper_pm_callback()
2011-12-25 23:41:23 +01:00
Jamie Iles c87fb57346 ARM: 7235/1: irqdomain: export irq_domain_simple_ops for !CONFIG_OF
irqdomain support is used in interrupt controller drivers that may not
have device tree support but only need the basic HW->Linux irq
translation.  Rather than having each of these implement their own IRQ
domain, allow them to use the simple ops.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring <robherring2@gmail.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-12-23 22:33:58 +00:00
Arun Sharma 1ac9bc6943 sched/tracing: Add a new tracepoint for sleeptime
If CONFIG_SCHEDSTATS is defined, the kernel maintains
information about how long the task was sleeping or
in the case of iowait, blocking in the kernel before
getting woken up.

This will be useful for sleep time profiling.

Note: this information is only provided for sched_fair.
Other scheduling classes may choose to provide this in
the future.

Note: the delay includes the time spent on the runqueue
as well.

Signed-off-by: Arun Sharma <asharma@fb.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1324512940-32060-2-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-23 17:56:17 +01:00
Dave Jones 664dfa65e8 sched: Disable scheduler warnings during oopses
The panic-on-framebuffer code seems to cause a schedule
to occur during an oops. This causes a bunch of extra
spew as can be seen in:

   https://bugzilla.redhat.com/attachment.cgi?id=549230

Don't do scheduler debug checks when we are oopsing already.

Signed-off-by: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/20111222213929.GA4722@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-23 11:20:50 +01:00
Mandeep Singh Baines 892a2b90ba cgroup: only need to check oldcgrp==newgrp once
In cgroup_attach_proc it is now sufficient to only check that
oldcgrp==newcgrp once. Now that we are using threadgroup_lock()
during the migrations, oldcgrp will not change.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
2011-12-22 07:32:57 -08:00
Mandeep Singh Baines b07ef77411 cgroup: remove redundant get/put of task struct
threadgroup_lock() guarantees that the target threadgroup will
remain stable - no new task will be added, no new PF_EXITING
will be set and exec won't happen.

Changes in V2:
* https://lkml.org/lkml/2011/12/20/369 (Tejun Heo)
  * Undo incorrect removal of get/put from attach_task_by_pid()
* Author
  * Remove a comment which is made stale by this change

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
2011-12-22 07:32:57 -08:00