Commit graph

41650 commits

Author SHA1 Message Date
Tim Chen 333c5ae994 idle governor: Avoid lock acquisition to read pm_qos before entering idle
Thanks to the reviews and comments by Rafael, James, Mark and Andi.
Here's version 2 of the patch incorporating your comments and also some
update to my previous patch comments.

I noticed that before entering idle state, the menu idle governor will
look up the current pm_qos target value according to the list of qos
requests received.  This look up currently needs the acquisition of a
lock to access the list of qos requests to find the qos target value,
slowing down the entrance into idle state due to contention by multiple
cpus to access this list.  The contention is severe when there are a lot
of cpus waking and going into idle.  For example, for a simple workload
that has 32 pair of processes ping ponging messages to each other, where
64 cpu cores are active in test system, I see the following profile with
37.82% of cpu cycles spent in contention of pm_qos_lock:

-     37.82%          swapper  [kernel.kallsyms]          [k]
_raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
      - 95.65% pm_qos_request
           menu_select
           cpuidle_idle_call
         - cpu_idle
              99.98% start_secondary

A better approach will be to cache the updated pm_qos target value so
reading it does not require lock acquisition as in the patch below.
With this patch the contention for pm_qos_lock is removed and I saw a
2.2X increase in throughput for my message passing workload.

cc: stable@kernel.org
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Acked-by: mark gross <markgross@thegnar.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-29 00:50:59 -04:00
Linus Torvalds 5f40d42094 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: NFSROOT should default to "proto=udp"
  nfs4: remove duplicated #include
  NFSv4: nfs4_state_mark_reclaim_nograce() should be static
  NFSv4: Fix the setlk error handler
  NFSv4.1: Fix the handling of the SEQUENCE status bits
  NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
  NFSv4.1 reclaim complete must wait for completion
  NFSv4: remove duplicate clientid in struct nfs_client
  NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
  sunrpc: Propagate errors from xs_bind() through xs_create_sock()
  (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
  nfs: fix compilation warning
  nfs: add kmalloc return value check in decode_and_add_ds
  SUNRPC: Remove resource leak in svc_rdma_send_error()
  nfs: close NFSv4 COMMIT vs. CLOSE race
  SUNRPC: Close a race in __rpc_wait_for_completion_task()
2011-03-14 11:19:50 -07:00
Linus Torvalds eebea5d13d Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] target: Fix t_transport_aborted handling in LUN_RESET + active I/O shutdown
2011-03-13 16:00:28 -07:00
Trond Myklebust 0400a6b0cb NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
nfs4_schedule_state_recovery() should only be used when we need to force
the state manager to check the lease. If we just want to start the
state manager in order to handle a state recovery situation, we should be
using nfs4_schedule_state_manager().

This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
its use with a set of helper functions that do the right thing.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:18:22 -05:00
Andy Adamson 114f64b5f2 NFSv4: remove duplicate clientid in struct nfs_client
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:05:00 -05:00
Trond Myklebust bf294b41ce SUNRPC: Close a race in __rpc_wait_for_completion_task()
Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops->rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised for by
checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
reference count is 1: in those cases we drop trying to grab the spin lock,
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:52 -05:00
Linus Torvalds ab02a95405 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules
2011-03-09 16:45:02 -08:00
Stephen Rothwell 684adca4f8 sysctl: the include of rcupdate.h is only needed in the kernel
Fixes this build-check error:

  include/linux/sysctl.h:28: included file 'linux/rcupdate.h' is not exported

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-09 16:43:24 -08:00
Vasiliy Kulikov 8909c9ad8f net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules
Since a8f80e8ff9 any process with
CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
limited to /lib/modules/**.  However, CAP_NET_ADMIN capability shouldn't
allow anybody load any module not related to networking.

This patch restricts an ability of autoloading modules to netdev modules
with explicit aliases.  This fixes CVE-2011-1019.

Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior
of loading netdev modules by name (without any prefix) for processes
with CAP_SYS_MODULE to maintain the compatibility with network scripts
that use autoloading netdev modules by aliases like "eth0", "wlan0".

Currently there are only three users of the feature in the upstream
kernel: ipip, ip_gre and sit.

    root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	fffffff800001000
    CapEff:	fffffff800001000
    CapBnd:	fffffff800001000
    root@albatros:~# modprobe xfs
    FATAL: Error inserting xfs
    (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit
    sit: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit0
    sit0      Link encap:IPv6-in-IPv4
	      NOARP  MTU:1480  Metric:1

    root@albatros:~# lsmod | grep sit
    sit                    10457  0
    tunnel4                 2957  1 sit

For CAP_SYS_MODULE module loading is still relaxed:

    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	ffffffffffffffff
    CapEff:	ffffffffffffffff
    CapBnd:	ffffffffffffffff
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    xfs                   745319  0

Reference: https://lkml.org/lkml/2011/2/24/203

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2011-03-10 10:25:19 +11:00
Linus Torvalds 78833dd706 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  nd->inode is not set on the second attempt in path_walk()
  unfuck proc_sysctl ->d_compare()
  minimal fix for do_filp_open() race
2011-03-09 13:55:51 -08:00
Al Viro dfef6dcd35 unfuck proc_sysctl ->d_compare()
a) struct inode is not going to be freed under ->d_compare();
however, the thing PROC_I(inode)->sysctl points to just might.
Fortunately, it's enough to make freeing that sucker delayed,
provided that we don't step on its ->unregistering, clear
the pointer to it in PROC_I(inode) before dropping the reference
and check if it's NULL in ->d_compare().

b) I'm not sure that we *can* walk into NULL inode here (we recheck
dentry->seq between verifying that it's still hashed / fetching
dentry->d_inode and passing it to ->d_compare() and there's no
negative hashed dentries in /proc/sys/*), but if we can walk into
that, we really should not have ->d_compare() return 0 on it!
Said that, I really suspect that this check can be simply killed.
Nick?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-08 02:22:27 -05:00
Linus Torvalds fb62c00a6d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: no .snap inside of snapped namespace
  libceph: fix msgr standby handling
  libceph: fix msgr keepalive flag
  libceph: fix msgr backoff
  libceph: retry after authorization failure
  libceph: fix handling of short returns from get_user_pages
  ceph: do not clear I_COMPLETE from d_release
  ceph: do not set I_COMPLETE
  Revert "ceph: keep reference to parent inode on ceph_dentry"
2011-03-05 10:43:22 -08:00
Andi Kleen 236344d6b4 mm: add alloc_page_vma_node()
Add a alloc_page_vma_node that allows passing the "local" node in.  Used
in a followon patch.

Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 17:53:39 -08:00
Andi Kleen 2f5f9486f8 mm: change alloc_pages_vma to pass down the policy node for local policy
Currently alloc_pages_vma() always uses the local node as policy node for
the LOCAL policy.  Pass this node down as an argument instead.

No behaviour change from this patch, but will be needed for followons.

Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 17:53:39 -08:00
Sage Weil e76661d0a5 libceph: fix msgr keepalive flag
There was some broken keepalive code using a dead variable.  Shift to using
the proper bit flag.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 12:24:31 -08:00
Sage Weil 60bf8bf881 libceph: fix msgr backoff
With commit f363e45f we replaced a bunch of hacky workqueue mutual
exclusion logic with the WQ_NON_REENTRANT flag.  One pieces of fallout is
that the exponential backoff breaks in certain cases:

 * con_work attempts to connect.
 * we get an immediate failure, and the socket state change handler queues
   immediate work.
 * con_work calls con_fault, we decide to back off, but can't queue delayed
   work.

In this case, we add a BACKOFF bit to make con_work reschedule delayed work
next time it runs (which should be immediately).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 12:24:28 -08:00
Linus Torvalds e3e89cc535 Mark ptrace_{traceme,attach,detach} static
They are only used inside kernel/ptrace.c, and have been for a long
time.  We don't want to go back to the bad-old-days when architectures
did things on their own, so make them static and private.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 09:23:30 -08:00
Linus Torvalds 4438a02fc4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  MAINTAINERS: Add Andy Gospodarek as co-maintainer.
  r8169: disable ASPM
  RxRPC: Fix v1 keys
  AF_RXRPC: Handle receiving ACKALL packets
  cnic: Fix lost interrupt on bnx2x
  cnic: Prevent status block race conditions with hardware
  net: dcbnl: check correct ops in dcbnl_ieee_set()
  e1000e: disable broken PHY wakeup for ICH10 LOMs, use MAC wakeup instead
  igb: fix sparse warning
  e1000: fix sparse warning
  netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values
  dccp: fix oops on Reset after close
  ipvs: fix dst_lock locking on dest update
  davinci_emac: Add Carrier Link OK check in Davinci RX Handler
  bnx2x: update driver version to 1.62.00-6
  bnx2x: properly calculate lro_mss
  bnx2x: perform statistics "action" before state transition.
  bnx2x: properly configure coefficients for MinBW algorithm (NPAR mode).
  bnx2x: Fix ethtool -t link test for MF (non-pmf) devices.
  bnx2x: Fix nvram test for single port devices.
  ...
2011-03-03 15:43:15 -08:00
Linus Torvalds fb4b10ab5f Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: kill loop_mutex
  blktrace: Remove blk_fill_rwbs_rq.
  block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()
  block: add @force_kblockd to __blk_run_queue()
  block: fix kernel-doc format for blkdev_issue_zeroout
  blk-throttle: Do not use kblockd workqueue for throtl work
2011-03-03 15:42:35 -08:00
Tao Ma 2d3a8497f8 blktrace: Remove blk_fill_rwbs_rq.
If we enable trace events to trace block actions, We use
blk_fill_rwbs_rq to analyze the corresponding actions
in request's cmd_flags, but we only choose the minor 2 bits
from it, so most of other flags(e.g, REQ_SYNC) are missing.
For example, with a sync write we get:
write_test-2409  [001]   160.013869: block_rq_insert: 3,64 W 0 () 258135 + =
8 [write_test]

Since now we have integrated the flags of both bio and request,
it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
blk_fill_rwbs_rq isn't needed any more.

With this patch, after a sync write we get:
write_test-2417  [000]   226.603878: block_rq_insert: 3,64 WS 0 () 258135 +=
 8 [write_test]

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-03 10:53:20 -05:00
Anton Blanchard f009918a1c RxRPC: Fix v1 keys
commit 339412841d (RxRPC: Allow key payloads to be passed in XDR form)
broke klog for me. I notice the v1 key struct had a kif_version field
added:

-struct rxkad_key {
-       u16     security_index;         /* RxRPC header security index */
-       u16     ticket_len;             /* length of ticket[] */
-       u32     expiry;                 /* time at which expires */
-       u32     kvno;                   /* key version number */
-       u8      session_key[8];         /* DES session key */
-       u8      ticket[0];              /* the encrypted ticket */
-};

+struct rxrpc_key_data_v1 {
+       u32             kif_version;            /* 1 */
+       u16             security_index;
+       u16             ticket_length;
+       u32             expiry;                 /* time_t */
+       u32             kvno;
+       u8              session_key[8];
+       u8              ticket[0];
+};

However the code in rxrpc_instantiate strips it away:

	data += sizeof(kver);
	datalen -= sizeof(kver);

Removing kif_version fixes my problem.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 22:18:53 -08:00
Tejun Heo 1654e7411a block: add @force_kblockd to __blk_run_queue()
__blk_run_queue() automatically either calls q->request_fn() directly
or schedules kblockd depending on whether the function is recursed.
blk-flush implementation needs to be able to explicitly choose
kblockd.  Add @force_kblockd.

All the current users are converted to specify %false for the
parameter and this patch doesn't introduce any behavior change.

stable: This is prerequisite for fixing ide oops caused by the new
        blk-flush implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02 08:48:05 -05:00
Mark Brown 77bd70e900 mfd: Don't suspend WM8994 if the CODEC is not suspended
ASoC supports keeping the audio subsysetm active over suspend in order
to support use cases such as audio passthrough from a cellular modem
with the main CPU suspended. Ensure that we don't power down the CODEC
when this is happening by checking to see if VMID is up and skipping
suspend and resume when it is. If the CODEC has suspended then it'll
turn VMID off before the core suspend() gets called.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-02 10:57:50 +01:00
Vivek Goyal 450adcbe51 blk-throttle: Do not use kblockd workqueue for throtl work
o Dominik Klein reported a system hang issue while doing some blkio
  throttling testing.

  https://lkml.org/lkml/2011/2/24/173

o Some tracing revealed that CFQ was not dispatching any more jobs as
  queue unplug was not happening. And queue unplug was not happening
  because unplug work was not being called as there was one throttling
  work on same cpu which as not finished yet. And throttling work had not
  finished as it was tyring to dispatch a bio to CFQ but all the request
  descriptors were consume to it was put to sleep.

o So basically it is a cyclic dependecny between CFQ unplug work and
  throtl dispatch work. Tejun suggested that use separate workqueue for
  such cases.

o This patch uses a separate workqueue for throttle related work and
  does not rely on kblockd workqueue anymore.

Cc: stable@kernel.org
Reported-by: Dominik Klein <dk@in-telegence.net>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-01 13:41:53 -05:00
Rafael J. Wysocki af06216a8e ACPI: Fix build for CONFIG_NET unset
Several ACPI drivers fail to build if CONFIG_NET is unset, because
they refer to things depending on CONFIG_THERMAL that in turn depends
on CONFIG_NET.  However, CONFIG_THERMAL doesn't really need to depend
on CONFIG_NET, because the only part of it requiring CONFIG_NET is
the netlink interface in thermal_sys.c.

Put the netlink interface in thermal_sys.c under #ifdef CONFIG_NET
and remove the dependency of CONFIG_THERMAL on CONFIG_NET from
drivers/thermal/Kconfig.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Len Brown <lenb@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Luming Yu <luming.yu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-28 18:00:31 -08:00
Linus Torvalds dbc39ec4b6 Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm: fix unsigned vs signed comparison issue in modeset ctl ioctl.
  drm/nv50-nvc0: make sure vma is definitely unmapped when destroying bo
2011-02-28 17:58:09 -08:00
Ben Hutchings fbd7184485 mm: <asm-generic/pgtable.h> must include <linux/mm_types.h>
Commit e2cda32264 ("thp: add pmd mangling generic functions") replaced
some macros in <asm-generic/pgtable.h> with inline functions.

If the functions are to be defined (not all architectures need them)
then struct vm_area_struct must be defined first.  So include
<linux/mm_types.h>.

Fixes a build failure seen in Debian:

    CC [M]  drivers/media/dvb/mantis/mantis_pci.o
  In file included from arch/arm/include/asm/pgtable.h:460,
                   from drivers/media/dvb/mantis/mantis_pci.c:25:
  include/asm-generic/pgtable.h: In function 'ptep_test_and_clear_young':
  include/asm-generic/pgtable.h:29: error: dereferencing pointer to incomplete type

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-28 17:46:49 -08:00
Nicholas Bellinger 52208ae3fc [SCSI] target: Fix t_transport_aborted handling in LUN_RESET + active I/O shutdown
This patch addresses two outstanding bugs related to
T_TASK(cmd)->t_transport_aborted handling during TMR LUN_RESET and
active I/O shutdown.

This first involves adding two explict t_transport_aborted=1
assignments in core_tmr_lun_reset() in order to signal the task has
been aborted, and updating transport_generic_wait_for_tasks() to skip
sleeping when t_transport_aborted=1 has been set.  This fixes an issue
where transport_generic_wait_for_tasks() would end up sleeping
indefinately when called from fabric module context while TMR
LUN_RESET was happening with long outstanding backend struct se_task
not yet being completed.

The second adds a missing call to
transport_remove_task_from_execute_queue() when
task->task_execute_queue=1 is set in order to fix an OOPs when
task->t_execute_list has not been dropped.  It also fixes the same
case in transport_processing_shutdown() to prevent the issue from
happening during active I/O struct se_device shutdown.

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-28 11:23:32 -06:00
Dave Airlie 1922756124 drm: fix unsigned vs signed comparison issue in modeset ctl ioctl.
This fixes CVE-2011-1013.

Reported-by: Matthiew Herrb (OpenBSD X.org team)
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-02-28 15:24:35 +10:00
Linus Torvalds 493f3358cb Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Make ACPI wakeup from S5 work again when CONFIG_PM_SLEEP is unset
2011-02-25 15:15:17 -08:00
Alexandre Bounine fe41947e1a rapidio: fix sysfs config attribute to access 16MB of maint space
Fixes sysfs config attribute to allow access to entire 16MB maintenance
space of RapidIO devices.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Linus Torvalds 638691a7a4 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: Fix - again - partition detection when array becomes active
  Fix over-zealous flush_disk when changing device size.
  md: avoid spinlock problem in blk_throtl_exit
  md: correctly handle probe of an 'mdp' device.
  md: don't set_capacity before array is active.
  md: Fix raid1->raid0 takeover
2011-02-25 11:13:26 -08:00
Rafael J. Wysocki 805bdaec1a PM: Make ACPI wakeup from S5 work again when CONFIG_PM_SLEEP is unset
Commit 074037e (PM / Wakeup: Introduce wakeup source objects and
event statistics (v3)) caused ACPI wakeup to only work if
CONFIG_PM_SLEEP is set, but it also worked for CONFIG_PM_SLEEP unset
before.  This can be fixed by making device_set_wakeup_enable(),
device_init_wakeup() and device_may_wakeup() work in the same way
as before commit 074037e when CONFIG_PM_SLEEP is unset.

Reported-and-tested-by: Justin Maggard <jmaggard10@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-02-24 19:53:06 +01:00
NeilBrown 93b270f76e Fix over-zealous flush_disk when changing device size.
There are two cases when we call flush_disk.
In one, the device has disappeared (check_disk_change) so any
data will hold becomes irrelevant.
In the oter, the device has changed size (check_disk_size_change)
so data we hold may be irrelevant.

In both cases it makes sense to discard any 'clean' buffers,
so they will be read back from the device if needed.

In the former case it makes sense to discard 'dirty' buffers
as there will never be anywhere safe to write the data.  In the
second case it *does*not* make sense to discard dirty buffers
as that will lead to file system corruption when you simply enlarge
the containing devices.

flush_disk calls __invalidate_devices.
__invalidate_device calls both invalidate_inodes and invalidate_bdev.

invalidate_inodes *does* discard I_DIRTY inodes and this does lead
to fs corruption.

invalidate_bev *does*not* discard dirty pages, but I don't really care
about that at present.

So this patch adds a flag to __invalidate_device (calling it
__invalidate_device2) to indicate whether dirty buffers should be
killed, and this is passed to invalidate_inodes which can choose to
skip dirty inodes.

flusk_disk then passes true from check_disk_change and false from
check_disk_size_change.

dm avoids tripping over this problem by calling i_size_write directly
rathher than using check_disk_size_change.

md does use check_disk_size_change and so is affected.

This regression was introduced by commit 608aeef17a which causes
check_disk_size_change to call flush_disk, so it is suitable for any
kernel since 2.6.27.

Cc: stable@kernel.org
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Andrew Patterson <andrew.patterson@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-24 17:25:47 +11:00
Miklos Szeredi 2aa15890f3 mm: prevent concurrent unmap_mapping_range() on the same inode
Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Michael Leun <lkml20101129@newton.leun.net>
Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-23 19:52:52 -08:00
Linus Torvalds ef3242859f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
  Added support for usb ethernet (0x0fe6, 0x9700)
  r8169: fix RTL8168DP power off issue.
  r8169: correct settings of rtl8102e.
  r8169: fix incorrect args to oob notify.
  DM9000B: Fix PHY power for network down/up
  DM9000B: Fix reg_save after spin_lock in dm9000_timeout
  net_sched: long word align struct qdisc_skb_cb data
  sfc: lower stack usage in efx_ethtool_self_test
  bridge: Use IPv6 link-local address for multicast listener queries
  bridge: Fix MLD queries' ethernet source address
  bridge: Allow mcast snooping for transient link local addresses too
  ipv6: Add IPv6 multicast address flag defines
  bridge: Add missing ntohs()s for MLDv2 report parsing
  bridge: Fix IPv6 multicast snooping by correcting offset in MLDv2 report
  bridge: Fix IPv6 multicast snooping by storing correct protocol type
  p54pci: update receive dma buffers before and after processing
  fix cfg80211_wext_siwfreq lock ordering...
  rt2x00: Fix WPA TKIP Michael MIC failures.
  ath5k: Fix fast channel switching
  tcp: undo_retrans counter fixes
  ...
2011-02-23 16:02:00 -08:00
Eric Dumazet 9e924cf407 net_sched: long word align struct qdisc_skb_cb data
netem_skb_cb() does :

return (struct netem_skb_cb *)qdisc_skb_cb(skb)->data;

Unfortunatly struct qdisc_skb_cb data is not long word aligned, so
access to psched_time_t time_to_send uses a non aligned access.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-23 14:17:02 -08:00
Linus Lüssing 5ced133961 ipv6: Add IPv6 multicast address flag defines
This commit adds the missing IPv6 multicast address flag defines to
complement the already existing multicast address scope defines and to
be able to check these flags nicely in the future.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:07:27 -08:00
Linus Torvalds d8204a37ba Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
  pcmcia: re-enable Zoomed Video support
  cm4000_cs: Fix undefined ops warning
  pcmcia vs. MECR on pxa25x/sa1111
  drivers/char/pcmcia/ipwireless/main.c: Convert release_resource to release_region/release_mem_region
2011-02-22 09:26:54 -08:00
Linus Torvalds 609b06f335 Merge branch 'fix/asoc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'fix/asoc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ASoC: Ensure supplies are maintained for force enabled widgets
  ASoC: WM8994: Improve playback robustness
  ASoC: WM8994: Improve robustness in some use cases
  ASoC: WM8903: Fix mic detection enable logic
  ASoC: WM8903: Fix mic detection register definitions
  ASoC: CX20442: fix wrong reg_cache_default content
  ASoC: Sync initial widget state with hardware
2011-02-22 08:20:02 -08:00
Dmitry Torokhov 98562ad8cb module: explicitly align module_version_attribute structure
We force particular alignment when we generate attribute structures
when generation MODULE_VERSION() data and we need to make sure that
this alignment is followed when we iterate over these structures,
otherwise we may crash on platforms whose natural alignment is not
sizeof(void *), such as m68k.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
[ There are more issues here, but the fixes are incredibly ugly - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-21 15:21:53 -08:00
Dominik Brodowski 33619f0d3f pcmcia: re-enable Zoomed Video support
Allow drivers to enable Zoomed Video support. Currently, this is only
used by out-of-tree drivers (L64020 DVB driver in particular).

CC: <stable@kernel.org> [for 2.6.37]
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2011-02-20 12:47:34 +01:00
John Fastabend 226111d1fb net: dcb: match dcb_app protocol field with 802.1Qaz spec
The dcb_app protocol field is a __u32 however the 802.1Qaz
specification defines it as a 16 bit field. This patch brings
the structure inline with the spec making it a __u16.

CC: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-19 19:00:50 -08:00
David S. Miller ece639caa3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2011-02-19 16:42:37 -08:00
Linus Torvalds 0cc9d52578 Merge branch 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  RTC: Re-enable UIE timer/polling emulation
  RTC: Revert UIE emulation removal
  RTC: Release mutex in error path of rtc_alarm_irq_enable
2011-02-18 14:20:46 -08:00
Linus Torvalds bc3adfc670 Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
  workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
  workqueue: wake up a worker when a rescuer is leaving a gcwq
2011-02-18 12:36:06 -08:00
Linus Torvalds 3c18d4de86 Expand CONFIG_DEBUG_LIST to several other list operations
When list debugging is enabled, we aim to readably show list corruption
errors, and the basic list_add/list_del operations end up having extra
debugging code in them to do some basic validation of the list entries.

However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
the debug code due to how they were written. This fixes that.

So the _next_ time we have list_move() problems with stale list entries,
we'll hopefully have an easier time finding them..

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-18 11:32:28 -08:00
John Stultz 456d66ecd0 RTC: Re-enable UIE timer/polling emulation
This patch re-enables UIE timer/polling emulation for rtc devices
that do not support alarm irqs.

CC: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-02-17 14:59:42 -08:00
John Stultz 6e57b1d6a8 RTC: Revert UIE emulation removal
Uwe pointed out that my alarm based UIE emulation is not sufficient
to replace the older timer/polling based UIE emulation on devices
where there is no alarm irq. This causes rtc devices without alarms
to return -EINVAL to UIE ioctls. The fix is to re-instate the old
timer/polling method for devices without alarm irqs.

This patch reverts the following commits:
042620a018 - Remove UIE emulation
1daeddd596 - Cleanup removed UIE emulation declaration
b5cc8ca1c9 - Remove Kconfig symbol for UIE emulation

The emulation mode will still need to be wired-in with a following
patch before it will work.

CC: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-02-17 14:59:41 -08:00
Florian Westphal d503b30bd6 netfilter: tproxy: do not assign timewait sockets to skb->sk
Assigning a socket in timewait state to skb->sk can trigger
kernel oops, e.g. in nfnetlink_log, which does:

if (skb->sk) {
        read_lock_bh(&skb->sk->sk_callback_lock);
        if (skb->sk->sk_socket && skb->sk->sk_socket->file) ...

in the timewait case, accessing sk->sk_callback_lock and sk->sk_socket
is invalid.

Either all of these spots will need to add a test for sk->sk_state != TCP_TIME_WAIT,
or xt_TPROXY must not assign a timewait socket to skb->sk.

This does the latter.

If a TW socket is found, assign the tproxy nfmark, but skip the skb->sk assignment,
thus mimicking behaviour of a '-m socket .. -j MARK/ACCEPT' re-routing rule.

The 'SYN to TW socket' case is left unchanged -- we try to redirect to the
listener socket.

Cc: Balazs Scheidler <bazsi@balabit.hu>
Cc: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-17 11:32:38 +01:00