Commit graph

233877 commits

Author SHA1 Message Date
Ben Hutchings fbd7184485 mm: <asm-generic/pgtable.h> must include <linux/mm_types.h>
Commit e2cda32264 ("thp: add pmd mangling generic functions") replaced
some macros in <asm-generic/pgtable.h> with inline functions.

If the functions are to be defined (not all architectures need them)
then struct vm_area_struct must be defined first.  So include
<linux/mm_types.h>.

Fixes a build failure seen in Debian:

    CC [M]  drivers/media/dvb/mantis/mantis_pci.o
  In file included from arch/arm/include/asm/pgtable.h:460,
                   from drivers/media/dvb/mantis/mantis_pci.c:25:
  include/asm-generic/pgtable.h: In function 'ptep_test_and_clear_young':
  include/asm-generic/pgtable.h:29: error: dereferencing pointer to incomplete type

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-28 17:46:49 -08:00
Hegde, Vinay 0a5f384677 davinci_emac: Add Carrier Link OK check in Davinci RX Handler
This patch adds an additional check in the Davinci EMAC RX Handler,
which tests the __LINK_STATE_NOCARRIER flag along with the
__LINK_STATE_START flag as part EMAC shutting down procedure.

This avoids
WARNING: at drivers/net/davinci_emac.c:1040 emac_rx_handler+0xf8/0x120()
during rtcwake used to suspend the target for a specified duration.

Signed-off-by: Hegde, Vinay <vinay.hegde@ti.com>
Acked-by: Cyril Chemparathy <cyril@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 13:16:14 -08:00
Dmitry Kravkov b746f7e52f bnx2x: update driver version to 1.62.00-6
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 13:14:50 -08:00
Vladislav Zolotarov e4e3c02a1a bnx2x: properly calculate lro_mss
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 13:14:49 -08:00
Vladislav Zolotarov 63135281af bnx2x: perform statistics "action" before state transition.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 13:14:48 -08:00
Dmitry Kravkov ff80ee029b bnx2x: properly configure coefficients for MinBW algorithm (NPAR mode).
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 13:14:48 -08:00
Dmitry Kravkov 633ac3637a bnx2x: Fix ethtool -t link test for MF (non-pmf) devices.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 13:14:47 -08:00
Dmitry Kravkov d4215ef71f bnx2x: Fix nvram test for single port devices.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 13:14:47 -08:00
Dmitry Kravkov faa6fcbbba bnx2x: (NPAR mode) Fix FW initialization
Fix FW initialization according to max BW stored in percents
 for NPAR mode. Protect HW from being configured to speed 0.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 13:14:46 -08:00
Lars Ellenberg e3fa3aff0c net: fix nla_policy_len to actually _iterate_ over the policy
Currently nla_policy_len always returns n * NLA_HDRLEN:
It loops, but does not advance it's iterator.
NLA_UNSPEC == 0 does not contain a .len in any policy.

Trivially fixed by adding p++.

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:38:25 -08:00
Rémi Denis-Courmont f5a4532528 f_phonet: avoid pskb_pull(), fix OOPS with CONFIG_HIGHMEM
This is similar to what we already do in cdc-phonet.c in the same
situation. pskb_pull() refuses to work with HIGHMEM, even if it is
known that the socket buffer is entirely in "low" memory.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:36:39 -08:00
Axel Lin 9eb0e6f26e net/fec: fix unterminated platform_device_id table
The platform_device_id table is supposed to be zero-terminated.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:35:44 -08:00
Randy Dunlap a90e81579d net: update Documentation/networking/00-INDEX
Clean up entries in 00-INDEX: drop files that have been removed.

Reported-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:33:19 -08:00
Ilya Yanok b093dd9684 dnet: fix wrong use of platform_set_drvdata()
platform_set_drvdata() was used with argument of incorrect type and
could cause memory corruption. Moreover, because of not setting drvdata
in the correct place not all resources were freed upon module unload.

Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:22:21 -08:00
Jamie Iles 915239472a macb: don't use platform_set_drvdata() on a net_device
Commit 71d6429 (Driver core: convert platform_{get,set}_drvdata to
static inline functions) now triggers a warning in the macb network
driver:

  CC      drivers/net/macb.o
drivers/net/macb.c: In function ‘macb_mii_init’:
drivers/net/macb.c:263: warning: passing argument 1 of ‘platform_set_drvdata’ from incompatible pointer type
include/linux/platform_device.h:138: note: expected ‘struct platform_device *’ but argument is of type ‘struct net_device *’

Use dev_set_drvdata() on the device embedded in the net_device instead.

Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:22:20 -08:00
Andrey Vagin b44d211e16 netlink: handle errors from netlink_dump()
netlink_dump() may failed, but nobody handle its error.
It generates output data, when a previous portion has been returned to
user space. This mechanism works when all data isn't go in skb. If we
enter in netlink_recvmsg() and skb is absent in the recv queue, the
netlink_dump() will not been executed. So if netlink_dump() is failed
one time, the new data never appear and the reader will sleep forever.

netlink_dump() is called from two places:

1. from netlink_sendmsg->...->netlink_dump_start().
   In this place we can report error directly and it will be returned
   by sendmsg().

2. from netlink_recvmsg
   There we can't report error directly, because we have a portion of
   valid output data and call netlink_dump() for prepare the next portion.
   If netlink_dump() is failed, the socket will be mark as error and the
   next recvmsg will be failed.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:18:12 -08:00
Kurt Van Dijck dad3d44dcb CAN: add controller hardware name for Softing cards
I just found that the controller hardware name is not set for the Softing
driver. After this patch, "$ ip -d link show" looks nicer.

Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:08:48 -08:00
Justin Mattock 19d73f3c6f drivers:isdn:istream.c Fix typo pice to piece
The below patch changes a typo "pice" to "piece"

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:07:32 -08:00
Ken Kawasaki 27aadb615a fmvj18x_cs: add new id
fmvj18x_cs:add new id
           Toshiba lan&modem multifuction card (model name:IPC5010A)

Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:06:20 -08:00
Christian Lamparter 2b799a6b25 p54usb: add Senao NUB-350 usbid
Reported-by: Mark Davis
Cc: <stable@kernel.org>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-28 14:00:38 -05:00
Sujith Manoharan 2c27392dc4 ath9k_htc: Fix an endian issue
The stream length/tag fields have to be in little endian
format. Fixing this makes the driver work on big-endian
platforms.

Cc: stable@kernel.org
Tested-by: raghunathan.kailasanathan@wipro.com
Signed-off-by: Sujith Manoharan <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-28 13:39:06 -05:00
Don Zickus 299c56966a x86: Use u32 instead of long to set reset vector back to 0
A customer of ours, complained that when setting the reset
vector back to 0, it trashed other data and hung their box.
They noticed when only 4 bytes were set to 0 instead of 8,
everything worked correctly.

Mathew pointed out:

 |
 | We're supposed to be resetting trampoline_phys_low and
 | trampoline_phys_high here, which are two 16-bit values.
 | Writing 64 bits is definitely going to overwrite space
 | that we're not supposed to be touching.
 |

So limit the area modified to u32.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Cc: <stable@kernel.org>
LKML-Reference: <1297139100-424-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-28 16:22:18 +01:00
Thomas Renninger 54b08f5f90 perf timechart: Fix max number of cpus
Currently numcpus is determined in pid_put_sample which is only
called on sched_switch/sched_wakeup sample processing.

On a machine with a lot cpus I often saw the last cpu missing.

Check for (max) numcpus on every event happening and in the
beginning. -> fixes the issue for me.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: lenb@kernel.org
LKML-Reference: <1298842606-55712-6-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-28 08:56:15 +01:00
Thomas Renninger e853072055 perf timechart: Fix black idle boxes in the title
This fix is needed for eye of gnome and firefox svg viewers.
Only Inkscape can handle the broken case.

Compare with the other svg_legenda_box declarations, looks
like a typo slipped in at this place.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: lenb@kernel.org
LKML-Reference: <1298842606-55712-5-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-28 08:56:14 +01:00
Dave Airlie 467a29ea5a Merge remote branch 'nouveau/drm-nouveau-fixes' of /ssd/git/drm-nouveau-next into drm-fixes
* 'nouveau/drm-nouveau-fixes' of /ssd/git/drm-nouveau-next:
  drm/nv50-nvc0: make sure vma is definitely unmapped when destroying bo
2011-02-28 15:35:16 +10:00
Dave Airlie 1922756124 drm: fix unsigned vs signed comparison issue in modeset ctl ioctl.
This fixes CVE-2011-1013.

Reported-by: Matthiew Herrb (OpenBSD X.org team)
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-02-28 15:24:35 +10:00
Ben Skeggs 7db2662325 drm/nv50-nvc0: make sure vma is definitely unmapped when destroying bo
Somehow fixes a misrendering + hang at GDM startup on my NVA8...

My first guess would have been stale TLB entries laying around that a new
bo then accidentally inherits.  That doesn't make a great deal of sense
however, as when we mapped the pages for the new bo the TLBs would've
gotten flushed anyway.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2011-02-28 15:00:16 +10:00
Vladislav Zolotarov df213559f0 bnx2x: Add a missing bit for PXP parity register of 57712.
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-27 16:05:48 -08:00
Arnaldo Carvalho de Melo a89d030ee4 dccp: Change maintainer
Today was as good as any other day, but I felt I had to do things I love
to when paying hommage to somebody I love, so please apply this one,
something he would be proud of, even if so geekly.

    Way past it was/is deserved.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-26 21:46:57 -08:00
axel lin a7254d68b6 hwmon: (adt7411) add MODULE_DEVICE_TABLE
The device table is required to load modules based on modaliases.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
2011-02-26 08:59:32 -08:00
axel lin 6407deb594 hwmon: (ad7414) add MODULE_DEVICE_TABLE
The device table is required to load modules based on modaliases.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
2011-02-26 08:59:32 -08:00
Takashi Iwai 11be6a269d Merge branch 'fix/asoc' into for-linus 2011-02-26 11:27:47 +01:00
Thomas Gleixner 3a142a0672 clockevents: Prevent oneshot mode when broadcast device is periodic
When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only
can switch into oneshot mode, when the backup broadcast device
supports oneshot mode as well. Otherwise we would try to switch the
broadcast device into an unsupported mode unconditionally. This went
unnoticed so far as the current available broadcast devices support
oneshot mode. Seth unearthed this problem while debugging and working
around an hpet related BIOS wreckage.

Add the necessary check to tick_is_oneshot_available().

Reported-and-tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1102252231200.2701@localhost6.localdomain6>
Cc: stable@kernel.org # .21 ->
2011-02-26 09:45:28 +01:00
Linus Torvalds 493f3358cb Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Make ACPI wakeup from S5 work again when CONFIG_PM_SLEEP is unset
2011-02-25 15:15:17 -08:00
Alexandre Bounine fe41947e1a rapidio: fix sysfs config attribute to access 16MB of maint space
Fixes sysfs config attribute to allow access to entire 16MB maintenance
space of RapidIO devices.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Alexander Gordeev 99b0d365e5 pps: initialize ts_real properly
Initialize ts_real.flags to fix compiler warning about possible
uninitialized use of this field.

Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Hugh Dickins e5598f8bf5 memcg: more mem_cgroup_uncharge() batching
It seems odd that truncate_inode_pages_range(), called not only when
truncating but also when evicting inodes, has mem_cgroup_uncharge_start
and _end() batching in its second loop to clear up a few leftovers, but
not in its first loop that does almost all the work: add them there too.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Andi Kleen 8eac563c1c thp: fix interleaving for transparent hugepages
The THP code didn't pass the correct interleaving shift to the memory
policy code.  Fix this here by adjusting for the order.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Jan Kara 7137c6bd45 aio: fix race between io_destroy() and io_submit()
A race can occur when io_submit() races with io_destroy():

 CPU1						CPU2
io_submit()
  do_io_submit()
    ...
    ctx = lookup_ioctx(ctx_id);
						io_destroy()
    Now do_io_submit() holds the last reference to ctx.
    ...
    queue new AIO
    put_ioctx(ctx) - frees ctx with active AIOs

We solve this issue by checking whether ctx is being destroyed in AIO
submission path after adding new AIO to ctx.  Then we are guaranteed that
either io_destroy() waits for new AIO or we see that ctx is being
destroyed and bail out.

Cc: Nick Piggin <npiggin@kernel.dk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Nick Piggin 3bd9a5d734 aio: fix rcu ioctx lookup
aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.

lookup_ioctx doesn't implement the rcu lookup pattern properly.
rcu_read_lock does not prevent refcount going to zero, so we might take
a refcount on a zero count ioctx.

Fix the bug by atomically testing for zero refcount before incrementing.

[jack@suse.cz: added comment into the code]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Namhyung Kim 29723fccc8 mm: fix dubious code in __count_immobile_pages()
When pfn_valid_within() failed 'iter' was incremented twice.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Lei Xu a2d6d2fa90 drivers/rtc/rtc-ds3232.c: fix time range difference between linux and RTC chip
In linux rtc_time struct, tm_mon range is 0~11, tm_wday range is 0~6,
while in RTC HW REG, month range is 1~12, day of the week range is 1~7,
this patch adjusts difference of them.

The efect of this bug was that most of month will be operated on as the
next month by the hardware (When in Jan it maybe even worse).  For
example, if in May, software wrote 4 to the hardware, which handled it as
April.  Then the logic would be different between software and hardware,
which would cause weird things to happen.

Signed-off-by: Lei Xu <B33228@freescale.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Jack Lan <jack.lan@freescale.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Timo Warns 294f6cf486 ldm: corrupted partition table can cause kernel oops
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
a bug that causes a kernel oops on certain corrupted LDM partitions.  A
kernel subsystem seems to crash, because, after the oops, the kernel no
longer recognizes newly connected storage devices.

The patch changes ldm_parse_vmdb() to Validate the value of vblk_size.

Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Acked-by: Richard Russon <ldm@flatcap.org>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00
Mel Gorman 2876592f23 mm: vmscan: stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT
should_continue_reclaim() for reclaim/compaction allows scanning to
continue even if pages are not being reclaimed until the full list is
scanned.  In terms of allocation success, this makes sense but potentially
it introduces unwanted latency for high-order allocations such as
transparent hugepages and network jumbo frames that would prefer to fail
the allocation attempt and fallback to order-0 pages.  Worse, there is a
potential that the full LRU scan will clear all the young bits, distort
page aging information and potentially push pages into swap that would
have otherwise remained resident.

This patch will stop reclaim/compaction if no pages were reclaimed in the
last SWAP_CLUSTER_MAX pages that were considered.  For allocations such as
hugetlbfs that use __GFP_REPEAT and have fewer fallback options, the full
LRU list may still be scanned.

Order-0 allocation should not be affected because RECLAIM_MODE_COMPACTION
is not set so the following avoids the gfp_mask being examined:

        if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
                return false;

A tool was developed based on ftrace that tracked the latency of
high-order allocations while transparent hugepage support was enabled and
three benchmarks were run.  The "fix-infinite" figures are 2.6.38-rc4 with
Johannes's patch "vmscan: fix zone shrinking exit when scan work is done"
applied.

  STREAM Highorder Allocation Latency Statistics
                 fix-infinite     break-early
  1 :: Count            10298           10229
  1 :: Min             0.4560          0.4640
  1 :: Mean            1.0589          1.0183
  1 :: Max            14.5990         11.7510
  1 :: Stddev          0.5208          0.4719
  2 :: Count                2               1
  2 :: Min             1.8610          3.7240
  2 :: Mean            3.4325          3.7240
  2 :: Max             5.0040          3.7240
  2 :: Stddev          1.5715          0.0000
  9 :: Count           111696          111694
  9 :: Min             0.5230          0.4110
  9 :: Mean           10.5831         10.5718
  9 :: Max            38.4480         43.2900
  9 :: Stddev          1.1147          1.1325

Mean time for order-1 allocations is reduced.  order-2 looks increased but
with so few allocations, it's not particularly significant.  THP mean
allocation latency is also reduced.  That said, allocation time varies so
significantly that the reductions are within noise.

Max allocation time is reduced by a significant amount for low-order
allocations but reduced for THP allocations which presumably are now
breaking before reclaim has done enough work.

  SysBench Highorder Allocation Latency Statistics
                 fix-infinite     break-early
  1 :: Count            15745           15677
  1 :: Min             0.4250          0.4550
  1 :: Mean            1.1023          1.0810
  1 :: Max            14.4590         10.8220
  1 :: Stddev          0.5117          0.5100
  2 :: Count                1               1
  2 :: Min             3.0040          2.1530
  2 :: Mean            3.0040          2.1530
  2 :: Max             3.0040          2.1530
  2 :: Stddev          0.0000          0.0000
  9 :: Count             2017            1931
  9 :: Min             0.4980          0.7480
  9 :: Mean           10.4717         10.3840
  9 :: Max            24.9460         26.2500
  9 :: Stddev          1.1726          1.1966

Again, mean time for order-1 allocations is reduced while order-2
allocations are too few to draw conclusions from.  The mean time for THP
allocations is also slightly reduced albeit the reductions are within
varianes.

Once again, our maximum allocation time is significantly reduced for
low-order allocations and slightly increased for THP allocations.

  Anon stream mmap reference Highorder Allocation Latency Statistics
  1 :: Count             1376            1790
  1 :: Min             0.4940          0.5010
  1 :: Mean            1.0289          0.9732
  1 :: Max             6.2670          4.2540
  1 :: Stddev          0.4142          0.2785
  2 :: Count                1               -
  2 :: Min             1.9060               -
  2 :: Mean            1.9060               -
  2 :: Max             1.9060               -
  2 :: Stddev          0.0000               -
  9 :: Count            11266           11257
  9 :: Min             0.4990          0.4940
  9 :: Mean        27250.4669      24256.1919
  9 :: Max      11439211.0000    6008885.0000
  9 :: Stddev     226427.4624     186298.1430

This benchmark creates one thread per CPU which references an amount of
anonymous memory 1.5 times the size of physical RAM.  This pounds swap
quite heavily and is intended to exercise THP a bit.

Mean allocation time for order-1 is reduced as before.  It's also reduced
for THP allocations but the variations here are pretty massive due to
swap.  As before, maximum allocation times are significantly reduced.

Overall, the patch reduces the mean and maximum allocation latencies for
the smaller high-order allocations.  This was with Slab configured so it
would be expected to be more significant with Slub which uses these size
allocations more aggressively.

The mean allocation times for THP allocations are also slightly reduced.
The maximum latency was slightly increased as predicted by the comments
due to reclaim/compaction breaking early.  However, workloads care more
about the latency of lower-order allocations than THP so it's an
acceptable trade-off.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00
Matti J. Aaltonen ac3c830419 drivers/nfc/pn544.c: add missing regulator
The regulator framework is used for power management.  The regulators are
only named in the driver code, the actual control stuff is in the board
file for each architecture or use case.

The PN544 chip has three regulators that can be controlled or not -
depending on the architecture where the chip is being used.  So some of
the regulators may not be controllable.  In our current case the third
regulator, which was missing from the code, went unnoticed because we
didn't need to control it.  To be as general as possible - in this respect
- the driver needs to list all regulators.  Then the board file can be
used to actually set the usage.

Signed-off-by: Matti J. Aaltonen <matti.j.aaltonen@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00
Matti J. Aaltonen d73fa4b914 drivers/nfc/Kconfig: use full form of the NFC acronym
Spell out the NFC acronym when it's shown for the first time.

Signed-off-by: Matti J. Aaltonen <matti.j.aaltonen@nokia.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00
FUJITA Tomonori fba99fa38b swiotlb: fix wrong panic
swiotlb's map_page wrongly calls panic() when it can't find a buffer fit
for device's dma mask.  It should return an error instead.

Devices with an odd dma mask (i.e.  under 4G) like b44 network card hit
this bug (the system crashes):

   http://marc.info/?l=linux-kernel&m=129648943830106&w=2

If swiotlb returns an error, b44 driver can use the own bouncing
mechanism.

Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00
Harry Wei f8407f26b4 MAINTAINERS: add Chinese documentation maintainer
I have translated some kernel documentation so I wish to maintain the
Chinese documentation in our kernel directories.

Signed-off-by: Harry Wei <harryxiyou@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00
Greg Thelen a879bf582d mm: grab rcu read lock in move_pages()
The move_pages() usage of find_task_by_vpid() requires rcu_read_lock() to
prevent free_pid() from reclaiming the pid.

Without this patch, RCU warnings are printed in v2.6.38-rc4 move_pages()
with:

  CONFIG_LOCKUP_DETECTOR=y
  CONFIG_PREEMPT=y
  CONFIG_LOCKDEP=y
  CONFIG_PROVE_LOCKING=y
  CONFIG_PROVE_RCU=y

Previously, migrate_pages() went through a similar transformation
replacing usage of tasklist_lock with rcu read lock:

  commit 55cfaa3cbd
  Author: Zeng Zhaoming <zengzm.kernel@gmail.com>
  Date:   Thu Dec 2 14:31:13 2010 -0800

      mm/mempolicy.c: add rcu read lock to protect pid structure

  commit 1e50df39f6
  Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
  Date:   Thu Jan 13 15:46:14 2011 -0800

      mempolicy: remove tasklist_lock from migrate_pages

Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Zeng Zhaoming <zengzm.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00
Davide Libenzi 22bacca48a epoll: prevent creating circular epoll structures
In several places, an epoll fd can call another file's ->f_op->poll()
method with ep->mtx held.  This is in general unsafe, because that other
file could itself be an epoll fd that contains the original epoll fd.

The code defends against this possibility in its own ->poll() method using
ep_call_nested, but there are several other unsafe calls to ->poll
elsewhere that can be made to deadlock.  For example, the following simple
program causes the call in ep_insert recursively call the original fd's
->poll, leading to deadlock:

 #include <unistd.h>
 #include <sys/epoll.h>

 int main(void) {
     int e1, e2, p[2];
     struct epoll_event evt = {
         .events = EPOLLIN
     };

     e1 = epoll_create(1);
     e2 = epoll_create(2);
     pipe(p);

     epoll_ctl(e2, EPOLL_CTL_ADD, e1, &evt);
     epoll_ctl(e1, EPOLL_CTL_ADD, p[0], &evt);
     write(p[1], p, sizeof p);
     epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt);

     return 0;
 }

On insertion, check whether the inserted file is itself a struct epoll,
and if so, do a recursive walk to detect whether inserting this file would
create a loop of epoll structures, which could lead to deadlock.

[nelhage@ksplice.com: Use epmutex to serialize concurrent inserts]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Reported-by: Nelson Elhage <nelhage@ksplice.com>
Tested-by: Nelson Elhage <nelhage@ksplice.com>
Cc: <stable@kernel.org>		[2.6.34+, possibly earlier]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00