Commit graph

321196 commits

Author SHA1 Message Date
Ben Hutchings 1485348d24 tcp: Apply device TSO segment limit earlier
Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs.  This avoids the need to fall back to
software GSO for local TCP senders.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02 00:19:17 -07:00
Ben Hutchings 7e6d06f0de sfc: Fix maximum number of TSO segments and minimum TX queue size
Currently an skb requiring TSO may not fit within a minimum-size TX
queue.  The TX queue selected for the skb may stall and trigger the TX
watchdog repeatedly (since the problem skb will be retried after the
TX reset).  This issue is designated as CVE-2012-3412.

Set the maximum number of TSO segments for our devices to 100.  This
should make no difference to behaviour unless the actual MSS is less
than about 700.  Increase the minimum TX queue size accordingly to
allow for 2 worst-case skbs, so that there will definitely be space
to add an skb after we wake a queue.

To avoid invalidating existing configurations, change
efx_ethtool_set_ringparam() to fix up values that are too small rather
than returning -EINVAL.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02 00:19:17 -07:00
Ben Hutchings 30b678d844 net: Allow driver to limit number of GSO segments per skb
A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps).  Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once.  This
results in a single skb that expands to 861 segments.

In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring.  The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset).  This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.

Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
   limit.
2. In netif_skb_features(), if the number of segments is too high then
   mask out GSO features to force fall back to software GSO.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02 00:19:17 -07:00
Linus Torvalds 1a9b4993b7 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "The lion share of this pull request are fixes for clk-related breakage
  caused by other changes during this merge window.  For some platforms
  the fix was as simple as selecting HAVE_CLK, for others like the
  Loongson 2 significant restructuring was required.

  The remainder are changes required to get the Lantiq code to work
  again."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Loongson 2: Sort out clock managment.
  MIPS: Loongson 1: more clk support and add select HAVE_CLK
  MIPS: txx9: Fix redefinition of clk_* by adding select HAVE_CLK
  MIPS: BCM63xx: Fix redefinition of clk_* by adding select HAVE_CLK
  MIPS: AR7: Fix redefinition of clk_* by adding select HAVE_CLK
  MIPS: Lantiq: Platform specific CLK fixup
  MIPS: Lantiq: Add device_tree_init function
  MIPS: Lantiq: Fix interface clock and PCI control register offset
2012-08-01 16:47:15 -07:00
Linus Torvalds 1871e845e5 Merge branch 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
 "This patch set contains mostly fixes and cleanups.  The UML tty driver
  uses now tty_port and is no longer broken like hell  :-)"

* 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Add arch/x86/um to MAINTAINERS
  um: pass siginfo to guest process
  um: fix ubd_file_size for read-only files
  um: pull interrupt_end() into userspace()
  um: split syscall_trace(), pass pt_regs to it
  um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regs
  um: set BLK_CGROUP=y in defconfig
  um: remove count_lock
  um: fully use tty_port
  um: Remove dead code
  um: remove line_ioctl()
  TTY: um/line, use tty from tty_port
  TTY: um/line, add tty_port
2012-08-01 16:45:02 -07:00
Linus Torvalds a6dc77254b Merge branch 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM DMA engine updates from Russell King:
 "This looks scary at first glance, but what it is is:
   - a rework of the sa11x0 DMA engine driver merged during the previous
     cycle, to extract a common set of helper functions for DMA engine
     implementations.
   - conversion of amba-pl08x.c to use these helper functions.
   - addition of OMAP DMA engine driver (using these helper functions),
     and conversion of some of the OMAP DMA users to use DMA engine.

  Nothing in the helper functions is ARM specific, so I hope that other
  implementations can consolidate some of their code by making use of
  these helpers.

  This has been sitting in linux-next most of the merge cycle, and has
  been tested by several OMAP folk.  I've tested it on sa11x0 platforms,
  and given it my best shot on my broken platforms which have the
  amba-pl08x controller.

  The last point is the addition to feature-removal-schedule.txt, which
  will have a merge conflict.  Between myself and TI, we're planning to
  remove the old TI DMA implementation next year."

Fix up trivial add/add conflicts in Documentation/feature-removal-schedule.txt
and drivers/dma/{Kconfig,Makefile}

* 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm: (53 commits)
  ARM: 7481/1: OMAP2+: omap2plus_defconfig: enable OMAP DMA engine
  ARM: 7464/1: mmc: omap_hsmmc: ensure probe returns error if DMA channel request fails
  Add feature removal of old OMAP private DMA implementation
  mtd: omap2: remove private DMA API implementation
  mtd: omap2: add DMA engine support
  spi: omap2-mcspi: remove private DMA API implementation
  spi: omap2-mcspi: add DMA engine support
  ARM: omap: remove mmc platform data dma_mask and initialization
  mmc: omap: remove private DMA API implementation
  mmc: omap: add DMA engine support
  mmc: omap_hsmmc: remove private DMA API implementation
  mmc: omap_hsmmc: add DMA engine support
  dmaengine: omap: add support for cyclic DMA
  dmaengine: omap: add support for setting fi
  dmaengine: omap: add support for returning residue in tx_state method
  dmaengine: add OMAP DMA engine driver
  dmaengine: sa11x0-dma: add cyclic DMA support
  dmaengine: sa11x0-dma: fix DMA residue support
  dmaengine: PL08x: ensure all descriptors are freed when channel is released
  dmaengine: PL08x: get rid of write only pool_ctr and free_txd locking
  ...
2012-08-01 16:41:07 -07:00
Linus Torvalds 02a6ec6a24 Merge branch 'audit' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM audit/signal updates from Russell King:
 "ARM audit/signal handling updates from Al and Will.  This improves on
  the work Viro did last merge window, and sorts out some of the issues
  found with that work."

* 'audit' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7475/1: sys_trace: allow all syscall arguments to be updated via ptrace
  ARM: 7474/1: get rid of TIF_SYSCALL_RESTARTSYS
  ARM: 7473/1: deal with handlerless restarts without leaving the kernel
  ARM: 7472/1: pull all work_pending logics into C function
  ARM: 7471/1: Revert "7442/1: Revert "remove unused restart trampoline""
  ARM: 7470/1: Revert "7443/1: Revert "new way of handling ERESTART_RESTARTBLOCK""
2012-08-01 16:35:37 -07:00
Linus Torvalds 9a2533c3eb Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
 "This fixes various issues found during July"

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7479/1: mm: avoid NULL dereference when flushing gate_vma with VIVT caches
  ARM: Fix undefined instruction exception handling
  ARM: 7480/1: only call smp_send_stop() on SMP
  ARM: 7478/1: errata: extend workaround for erratum #720789
  ARM: 7477/1: vfp: Always save VFP state in vfp_pm_suspend on UP
  ARM: 7476/1: vfp: only clear vfp state for current cpu in vfp_pm_suspend
  ARM: 7468/1: ftrace: Trace function entry before updating index
  ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+
  ARM: 7466/1: disable interrupt before spinning endlessly
  ARM: 7465/1: Handle >4GB memory sizes in device tree and mem=size@start option
2012-08-01 16:30:45 -07:00
Richard Weinberger b070989aeb um: Add arch/x86/um to MAINTAINERS
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02 01:00:47 +02:00
Martin Pärtel d3c1cfcdb4 um: pass siginfo to guest process
UML guest processes now get correct siginfo_t for SIGTRAP, SIGFPE,
SIGILL and SIGBUS. Specifically, si_addr and si_code are now correct
where previously they were si_addr = NULL and si_code = 128.

Signed-off-by: Martin Pärtel <martin.partel@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02 00:49:17 +02:00
Martin Pärtel d4afcba95f um: fix ubd_file_size for read-only files
Made ubd_file_size not request write access. Fixes use of read-only images.

Signed-off-by: Martin Pärtel <martin.partel@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02 00:44:49 +02:00
Al Viro b8a4209523 um: pull interrupt_end() into userspace()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02 00:25:44 +02:00
Al Viro 1bfa2317b2 um: split syscall_trace(), pass pt_regs to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[richard@nod.at: Fixed some minor build issues]
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02 00:25:38 +02:00
Al Viro a3170d2ec2 um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-01 23:33:16 +02:00
Linus Torvalds d4fdc32517 fbdev updates for 3.6
It includes:
 - large updates for OMAP
   - support for LCD3 overlay manager (omap5)
   - omapdss output cleanup
   - removal of passive matrix LCD support as there are no drivers for
     such panels for DSS or DSS2 and nobody complained (cleanup)
 - large updates for SH Mobile
   - overlay support
   - separating MERAM (cache) from framebuffer driver
 - some updates for Exynos and da8xx-fb
 - various other small patches
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQGSWqAAoJECSVL5KnPj1PnEQP/RQ5NWKlgkPloLkWrLc8stN/
 0AKWgxBJ0BuJBW6CJCKVy76kkCBW2PZ7bHsFLNuX490KnPwt1cz3sLni78UiW1CZ
 bNlN5UKshOfC511BxF5GZjtLvvkj5+ocmoybq27MhBoJn/7EVbli7lSeYjEWeWuk
 sTq6wJTBJ8Nc4PEWdPhIbiWe7NgnCge27AOHrzUV5cRHxdHtl+mqD99Ky7UdMsHz
 qBVTVmtEmLh2g3KMuu5rByuDDlUqhpi0sKorGsNWk92rpUnVsc4E4/v06JJqB3n8
 ef3q352GK32LKpWwX78pm5+DJMhpSMFJg6UrvQu03gQSU5Pw3O4Dl8g+hh6FgcMo
 niYZ+g07K0D8BSqdTwy9gwRTSWLPHplR8xz9VsW3+jdmFFuQgB6suA2Dk2E/K3Cf
 12jurwegypfI5KutRyTz7IuTw/8OhHs4x0PJWcSw3lq1czUM212vqDWKYJFbgznq
 6s8sHLWnQg0U47LYTCNV/mA4QRlE4ewE3B6wIVrzPvcTIinKqO1hky10fZ4+sEw6
 gH6bnSBvlRgYOYhy/MPInz0Gt6YfzK26M6ZPMq2DU7gZ6OFcL9IrnqWDdnZMwwD9
 j15DWeU2XPMc8vAGpdg2giq3VmQ53rviwLgp7Ht6fGrTLk1z4q167zGUSOvo3KQi
 Ai90ycCBgAGLorHhdCI1
 =22b7
 -----END PGP SIGNATURE-----

Merge tag 'fbdev-updates-for-3.6' of git://github.com/schandinat/linux-2.6

Pull fbdev updates from Florian Tobias Schandinat:
 - large updates for OMAP
   - support for LCD3 overlay manager (omap5)
   - omapdss output cleanup
   - removal of passive matrix LCD support as there are no drivers for
     such panels for DSS or DSS2 and nobody complained (cleanup)
 - large updates for SH Mobile
   - overlay support
   - separating MERAM (cache) from framebuffer driver
 - some updates for Exynos and da8xx-fb
 - various other small patches

* tag 'fbdev-updates-for-3.6' of git://github.com/schandinat/linux-2.6: (78 commits)
  da8xx-fb: fix compile issue due to missing include
  fbdev: Make pixel_to_pat() failure mode more friendly
  da8xx-fb: do not turn ON LCD backlight unless LCDC is enabled
  fbdev: sh_mobile_lcdc: Fix vertical panning step
  video: exynos mipi dsi: Fix mipi dsi regulators handling issue
  video: da8xx-fb: do clock reset of revision 2 LCDC before enabling
  arm: da850: configure LCDC fifo threshold
  video: da8xx-fb: configure FIFO threshold to reduce underflow errors
  video: da8xx-fb: fix flicker due to 1 frame delay in updated frame
  video: da8xx-fb rev2: fix disabling of palette completion interrupt
  da8xx-fb: add missing FB_BLANK operations
  video: exynos_dp: use usleep_range instead of delay
  video: exynos_dp: check the only INTERLANE_ALIGN_DONE bit during Link Training
  fb: epson1355fb: Fix section mismatch
  video: exynos_dp: fix wrong DPCD address during Link Training
  video/smscufx: fix line counting in fb_write
  aty128fb: Fix coding style issues
  fbdev: sh_mobile_lcdc: Fix pan offset computation in YUV mode
  fbdev: sh_mobile_lcdc: Fix overlay registers update during pan operation
  fbdev: sh_mobile_lcdc: Support horizontal panning
  ...
2012-08-01 10:45:12 -07:00
Linus Torvalds 9a51cf28a3 Sound fixes for 3.6-rc1
A collection of small fixes that have been found recently.
 Most of the commits are regression fixes in HD-audio and some other
 random drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQGOkdAAoJEGwxgFQ9KSmk5REP/0OH5srTWkSGDJqWK0m0Z0A6
 vkZE9KXm/cKcw59MEBhZrE28G4K8fI28XLj6iEuhzcuv7XsUTo9d24Uvvv1pWaEy
 p2GFMRNc5QrXtprnckL+HPA4+asmiyEpXpYC7D4YH1N6ofYuNJfh0QIgQKG0R2Oz
 8Ekdwuuzu0gfNYcN7aWDFiDwNID8hRiW4RVf9V5mNOGtO9Z+82o7u2pnr74vu6FG
 C07DrpKXauGhGDIgfoNn30HwifSWvPm/rpPWwxUucPLAjiE25/70hTjnZZYWtRbe
 g9o9INh3F72aBv23zTQzjkOr9/hhc4/j9zxZ1cMSjTKdvSdoFa5QuQTfCct7z7Fd
 GcdXtMMNSF+FLNC4TyOlyMLoEFaHhv9uBMVk0rBe+y1/urzf4aH+PfI1B42meSI5
 tHiGVvTdhktA2NGp1kf24b88db5ZoNPk2Kmzzn8xHxZsQTjjaUriMAtM/CgmLoBj
 sOjMEkHZpcmAWCOqZDhb9U7QDZNp3h6TBG2/j/PerN/mt5pAVdoxzECDbswm/8My
 g/ujPJFe/2NpBRsDqTI2Lb1H5Xy1tLAnwz5NA4+aiEQjaCRNGLYUvnlcrgdwOmaE
 bk1OmKWTE2ck6rU+edsyPOSWzFEyU1hL1UDcqIyeBsZbh+pvFh+dxEbQFckhR6o4
 fXqmVya1YWUrl2vF99QW
 =cSAm
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes that have been found recently.  Most of
  the commits are regression fixes in HD-audio and some other random
  drivers."

* tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: snd-usb: fix clock source validity index
  ALSA: hda - Fix mute-LED GPIO initialization for IDT codecs
  ALSA: hda - Add descriptions for missing IDT 92HD83x models
  ALSA: hda - Fix polarity of mute LED on HP Mini 210
  ALSA: es1688 - freeup resources on init failure
  ALSA: hda - Workaround for silent output on VAIO Z with ALC889
  ALSA: hda - Fix WARNING from HDMI/DP parser
  ALSA: hda - Detach from converter at closing in patch_hdmi.c
  ALSA: hda - Fix mute-LED GPIO setup for HP Mini 210
  ALSA: mpu401: Fix missing initialization of irq field
  ALSA: hda - Fix invalid D3 of headphone DAC on VT202x codecs
2012-08-01 10:42:26 -07:00
Linus Torvalds a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00
Ralf Baechle 95cf1468f7 MIPS: Loongson 2: Sort out clock managment.
For unexplainable reasons the Loongson 2 clock API was implemented in a
module so fixing this involved shifting large amounts of code around.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01 18:10:06 +02:00
Linus Torvalds eff0d13f38 Merge branch 'for-3.6/drivers' of git://git.kernel.dk/linux-block
Pull block driver changes from Jens Axboe:

 - Making the plugging support for drivers a bit more sane from Neil.
   This supersedes the plugging change from Shaohua as well.

 - The usual round of drbd updates.

 - Using a tail add instead of a head add in the request completion for
   ndb, making us find the most completed request more quickly.

 - A few floppy changes, getting rid of a duplicated flag and also
   running the floppy init async (since it takes forever in boot terms)
   from Andi.

* 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
  floppy: remove duplicated flag FD_RAW_NEED_DISK
  blk: pass from_schedule to non-request unplug functions.
  block: stack unplug
  blk: centralize non-request unplug handling.
  md: remove plug_cnt feature of plugging.
  block/nbd: micro-optimization in nbd request completion
  drbd: announce FLUSH/FUA capability to upper layers
  drbd: fix max_bio_size to be unsigned
  drbd: flush drbd work queue before invalidate/invalidate remote
  drbd: fix potential access after free
  drbd: call local-io-error handler early
  drbd: do not reset rs_pending_cnt too early
  drbd: reset congestion information before reporting it in /proc/drbd
  drbd: report congestion if we are waiting for some userland callback
  drbd: differentiate between normal and forced detach
  drbd: cleanup, remove two unused global flags
  floppy: Run floppy initialization asynchronous
2012-08-01 09:06:47 -07:00
Linus Torvalds 8cf1a3fce0 Merge branch 'for-3.6/core' of git://git.kernel.dk/linux-block
Pull core block IO bits from Jens Axboe:
 "The most complicated part if this is the request allocation rework by
  Tejun, which has been queued up for a long time and has been in
  for-next ditto as well.

  There are a few commits from yesterday and today, mostly trivial and
  obvious fixes.  So I'm pretty confident that it is sound.  It's also
  smaller than usual."

* 'for-3.6/core' of git://git.kernel.dk/linux-block:
  block: remove dead func declaration
  block: add partition resize function to blkpg ioctl
  block: uninitialized ioc->nr_tasks triggers WARN_ON
  block: do not artificially constrain max_sectors for stacking drivers
  blkcg: implement per-blkg request allocation
  block: prepare for multiple request_lists
  block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv
  blkcg: inline bio_blkcg() and friends
  block: allocate io_context upfront
  block: refactor get_request[_wait]()
  block: drop custom queue draining used by scsi_transport_{iscsi|fc}
  mempool: add @gfp_mask to mempool_create_node()
  blkcg: make root blkcg allocation use %GFP_KERNEL
  blkcg: __blkg_lookup_create() doesn't need radix preload
2012-08-01 09:02:41 -07:00
Linus Torvalds fcff06c438 Merge branch 'for-next' of git://neil.brown.name/md
Pull md updates from NeilBrown.

* 'for-next' of git://neil.brown.name/md:
  DM RAID: Add support for MD RAID10
  md/RAID1: Add missing case for attempting to repair known bad blocks.
  md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE.
  md/raid1: don't abort a resync on the first badblock.
  md: remove duplicated test on ->openers when calling do_md_stop()
  raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer
  md/raid1: prevent merging too large request
  md/raid1: read balance chooses idlest disk for SSD
  md/raid1: make sequential read detection per disk based
  MD RAID10: Export md_raid10_congested
  MD: Move macros from raid1*.h to raid1*.c
  MD RAID1: rename mirror_info structure
  MD RAID10: rename mirror_info structure
  MD RAID10: Fix compiler warning.
  raid5: add a per-stripe lock
  raid5: remove unnecessary bitmap write optimization
  raid5: lockless access raid5 overrided bi_phys_segments
  raid5: reduce chance release_stripe() taking device_lock
2012-08-01 09:02:01 -07:00
J. Bruce Fields 068535f1fe locks: remove unused lm_release_private
In commit 3b6e2723f3 ("locks: prevent side-effects of
locks_release_private before file_lock is initialized") we removed the
last user of lm_release_private without removing the field itself.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01 09:01:46 -07:00
Yoichi Yuasa 4b00951f6f MIPS: Loongson 1: more clk support and add select HAVE_CLK
This fixes a redefinition of clk_*:

arch/mips/loongson1/common/clock.c:23:13: error: redefinition of 'clk_get'
include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here
arch/mips/loongson1/common/clock.c:41:15: error: redefinition of 'clk_get_rate'
include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here
make[3]: *** [arch/mips/loongson1/common/clock.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Reviewed-by: John Crispin <blogic@openwrt.org>
Acked-by: Kelvin Cheung <keguang.zhang@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/4143/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01 17:59:23 +02:00
Yoichi Yuasa 47cd7343f5 MIPS: txx9: Fix redefinition of clk_* by adding select HAVE_CLK
arch/mips/txx9/generic/setup.c:87:13: error: redefinition of 'clk_get'
include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here
arch/mips/txx9/generic/setup.c:97:5: error: redefinition of 'clk_enable'
include/linux/clk.h:295:19: note: previous definition of 'clk_enable' was here
arch/mips/txx9/generic/setup.c:103:6: error: redefinition of 'clk_disable'
include/linux/clk.h:300:20: note: previous definition of 'clk_disable' was here
arch/mips/txx9/generic/setup.c:108:15: error: redefinition of 'clk_get_rate'
include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here
arch/mips/txx9/generic/setup.c:114:6: error: redefinition of 'clk_put'
include/linux/clk.h:291:20: note: previous definition of 'clk_put' was here
make[3]: *** [arch/mips/txx9/generic/setup.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Reviewed-by: John Crispin <blogic@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/4142/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01 17:59:15 +02:00
Yoichi Yuasa 3e82eeebad MIPS: BCM63xx: Fix redefinition of clk_* by adding select HAVE_CLK
arch/mips/bcm63xx/clk.c:249:5: error: redefinition of 'clk_enable'
include/linux/clk.h:295:19: note: previous definition of 'clk_enable' was here
arch/mips/bcm63xx/clk.c:259:6: error: redefinition of 'clk_disable'
include/linux/clk.h:300:20: note: previous definition of 'clk_disable' was here
arch/mips/bcm63xx/clk.c:268:15: error: redefinition of 'clk_get_rate'
include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here
arch/mips/bcm63xx/clk.c:275:13: error: redefinition of 'clk_get'
include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here
arch/mips/bcm63xx/clk.c:302:6: error: redefinition of 'clk_put'
include/linux/clk.h:291:20: note: previous definition of 'clk_put' was here
make[2]: *** [arch/mips/bcm63xx/clk.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Reviewed-by: John Crispin <blogic@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/4141/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01 17:59:06 +02:00
Yoichi Yuasa 8551fb643a MIPS: AR7: Fix redefinition of clk_* by adding select HAVE_CLK
arch/mips/ar7/clock.c:420:5: error: redefinition of 'clk_enable'
include/linux/clk.h:295:19: note: previous definition of 'clk_enable' was here
arch/mips/ar7/clock.c:426:6: error: redefinition of 'clk_disable'
include/linux/clk.h:300:20: note: previous definition of 'clk_disable' was here
arch/mips/ar7/clock.c:431:15: error: redefinition of 'clk_get_rate'
include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here
arch/mips/ar7/clock.c:437:13: error: redefinition of 'clk_get'
include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here
arch/mips/ar7/clock.c:454:6: error: redefinition of 'clk_put'
include/linux/clk.h:291:20: note: previous definition of 'clk_put' was here
make[2]: *** [arch/mips/ar7/clock.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Reviewed-by: John Crispin <blogic@openwrt.org>
Acked-by: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/4140/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01 17:58:57 +02:00
John Crispin b902d9a98e MIPS: Lantiq: Platform specific CLK fixup
As we use CLKDEV_LOOKUP but dont have support for COMMON_CLK yet, we need to
provide our own version of of_clk_get_from_provider().

Signed-off-by: John Crispin <blogic@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4117/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01 17:58:09 +02:00
John Crispin a9188bc162 MIPS: Lantiq: Add device_tree_init function
Add a lantiq specific version of device_tree_init. The generic MIPS version
was removed by.

commit 594e966bc412d64eec9282d28ce511bdd62fea39
Author: David Daney <david.daney@cavium.com>
Date:   Thu Jul 5 18:12:38 2012 +0200

MIPS: Prune some target specific code out of prom.c

Signed-off-by: John Crispin <blogic@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4116/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01 17:58:04 +02:00
John Crispin e29b72f5e1 MIPS: Lantiq: Fix interface clock and PCI control register offset
The XRX200 based SoC have a different register offset for the interface
clock and PCI control registers. This patch detects the SoC and sets the
register offset at runtime. This make PCI work on the VR9 SoC.

Signed-off-by: John Crispin <blogic@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4113/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01 17:57:04 +02:00
Al Viro dbc6e0222d delousing target_core_file a bit
* set_fs(KERNEL_DS) + getname() is probably the weirdest implementation
of strdup() I've seen.  Especially since they don't to copy it at all...
* filp_open() never returns NULL; it's ERR_PTR(-E...) on failure.
* file->f_dentry is never going to be NULL, TYVM.
* match_strdup() + snprintf() + kfree() is a bloody weird way to spell
match_strlcpy().

Pox on cargo-cult programmers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-01 16:53:16 +04:00
Jonathan Brassow 63f33b8dda DM RAID: Add support for MD RAID10
Support the MD RAID10 personality through dm-raid.c

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-08-01 20:41:20 +10:00
NeilBrown bb181e2e48 Merge commit 'c039c332f23e794deb6d6f37b9f07ff3b27fb2cf' into md
Pull in pre-requisites for adding raid10 support to dm-raid.
2012-08-01 20:40:02 +10:00
Yuanhan Liu 80799fbb7d block: remove dead func declaration
__generic_unplug_device() function is removed with commit
7eaceaccab, which forgot to
remove the declaration at meantime. Here remove it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-01 12:25:54 +02:00
Vivek Goyal c83f6bf98d block: add partition resize function to blkpg ioctl
Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
allows altering the size of an existing partition, even if it is currently
in use.

This patch converts hd_struct->nr_sects into sequence counter because
One might extend a partition while IO is happening to it and update of
nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
can lead to issues like reading inconsistent size of a partition. Sequence
counter have been used so that readers don't have to take bdev mutex lock
as we call sector_in_part() very frequently.

Now all the access to hd_struct->nr_sects should happen using sequence
counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
There is one exception though, set_capacity()/get_capacity(). I think
theoritically race should exist there too but this patch does not
modify set_capacity()/get_capacity() due to sheer number of call sites
and I am afraid that change might break something. I have left that as a
TODO item. We can handle it later if need be. This patch does not introduce
any new races as such w.r.t set_capacity()/get_capacity().

v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-01 12:24:18 +02:00
Olof Johansson 4638a83e86 block: uninitialized ioc->nr_tasks triggers WARN_ON
Hi,

I'm using the old-fashioned 'dump' backup tool, and I noticed that it spews the
below warning as of 3.5-rc1 and later (3.4 is fine):

[   10.886893] ------------[ cut here ]------------
[   10.886904] WARNING: at include/linux/iocontext.h:140 copy_process+0x1488/0x1560()
[   10.886905] Hardware name: Bochs
[   10.886906] Modules linked in:
[   10.886908] Pid: 2430, comm: dump Not tainted 3.5.0-rc7+ #27
[   10.886908] Call Trace:
[   10.886911]  [<ffffffff8107ce8a>] warn_slowpath_common+0x7a/0xb0
[   10.886912]  [<ffffffff8107ced5>] warn_slowpath_null+0x15/0x20
[   10.886913]  [<ffffffff8107c088>] copy_process+0x1488/0x1560
[   10.886914]  [<ffffffff8107c244>] do_fork+0xb4/0x340
[   10.886918]  [<ffffffff8108effa>] ? recalc_sigpending+0x1a/0x50
[   10.886919]  [<ffffffff8108f6b2>] ? __set_task_blocked+0x32/0x80
[   10.886920]  [<ffffffff81091afa>] ? __set_current_blocked+0x3a/0x60
[   10.886923]  [<ffffffff81051db3>] sys_clone+0x23/0x30
[   10.886925]  [<ffffffff8179bd73>] stub_clone+0x13/0x20
[   10.886927]  [<ffffffff8179baa2>] ? system_call_fastpath+0x16/0x1b
[   10.886928] ---[ end trace 32a14af7ee6a590b ]---

Reproducing is easy, I can hit it on a KVM system with a very basic
config (x86_64 make defconfig + enable the drivers needed). To hit it,
just install dump (on debian/ubuntu, not sure what the package might be
called on Fedora), and:

dump -o -f /tmp/foo /

You'll see the warning in dmesg once it forks off the I/O process and
starts dumping filesystem contents.

I bisected it down to the following commit:

commit f6e8d01bee
Author: Tejun Heo <tj@kernel.org>
Date:   Mon Mar 5 13:15:26 2012 -0800

    block: add io_context->active_ref

    Currently ioc->nr_tasks is used to decide two things - whether an ioc
    is done issuing IOs and whether it's shared by multiple tasks.  This
    patch separate out the first into ioc->active_ref, which is acquired
    and released using {get|put}_io_context_active() respectively.

    This will be used to associate bio's with a given task.  This patch
    doesn't introduce any visible behavior change.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

It seems like the init of ioc->nr_tasks was removed in that patch,
so it starts out at 0 instead of 1.

Tejun, is the right thing here to add back the init, or should something else
be done?

The below patch removes the warning, but I haven't done any more extensive
testing on it.

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-01 12:17:27 +02:00
Mike Snitzer fe86cdcef7 block: do not artificially constrain max_sectors for stacking drivers
blk_set_stacking_limits is intended to allow stacking drivers to build
up the limits of the stacked device based on the underlying devices'
limits.  But defaulting 'max_sectors' to BLK_DEF_MAX_SECTORS (1024)
doesn't allow the stacking driver to inherit a max_sectors larger than
1024 -- due to blk_stack_limits' use of min_not_zero.

It is now clear that this artificial limit is getting in the way so
change blk_set_stacking_limits's max_sectors to UINT_MAX (which allows
stacking drivers like dm-multipath to inherit 'max_sectors' from the
underlying paths).

Reported-by: Vijay Chauhan <vijay.chauhan@netapp.com>
Tested-by: Vijay Chauhan <vijay.chauhan@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-01 10:44:28 +02:00
Daniel Mack aff252a848 ALSA: snd-usb: fix clock source validity index
uac_clock_source_is_valid() uses the control selector value to access
the bmControls bitmap of the clock source unit. This is wrong, as
control selector values start from 1, while the bitmap uses all
available bits.

In other words, "Clock Validity Control" is stored in D3..2, not D5..4
of the clock selector unit's bmControls.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Andreas Koch <andreas@akdesigninc.com>
Cc: stable@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-08-01 10:24:16 +02:00
Linus Torvalds 2d53492620 irqdomain changes for Linux v3.6
Round of refactoring and enhancements to irq_domain infrastructure. This
 series starts the process of simplifying irqdomain. The ultimate goal is
 to merge LEGACY, LINEAR and TREE mappings into a single system, but had
 to back off from that after some last minute bugs. Instead it mainly
 reorganizes the code and ensures that the reverse map gets populated
 when the irq is mapped instead of the first time it is looked up.
 
 Merging of the irq_domain types is deferred to v3.7
 
 In other news, this series adds helpers for creating static mappings on
 a linear or tree mapping.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQGKKkAAoJEEFnBt12D9kB59gQAJnTjrihej1tr0OEkffIthGK
 RyVI/DMo0jMgLs4K/rIo3Y+PdTSsNYd8x4R7ln8O7rNRQn8W6jE6NQgoMh51EvNc
 FAltmTsBldq6hUNuz2FEnbmojBP4QklTzL8bAiXtX5EufWQsgMsP4guOuHXLCjEV
 CkWYVk/slXEWJ8yYJc6GKVRvL+CNeiXVCTcOsYA0CI3ofN7O0rd+YAL314CRllIc
 e5uARbWM+s9FJ/eXwCZP4+3jCmdI/CHJb284WldMc/mBD8Rbiqpb4kH6AZI+TH2O
 CyiNEPWs6FG5eJPTID7HrOarXGzwYq/pvv8iG7Mh8NiKSae1C1HdkHelCjbLQ+pU
 POya0fWF1Gvzlmw0gHik86dqaKjwb29btjj7SFg8KnQExWn2ifhsY70mM9wCTo3s
 cwcQlssDIsARE83nttTFCoV/iAWh9AvTxafrXu/+9OKTjpsYlC8kgzdVjq5aAxON
 JaAUK1OduTWRsd1TabKlh6naRXr9nRcLKikwKri2oYVKkj97wahBuib4ffzAcNqz
 VklRBxTH6M+dz/t5NpcVyLXJpqzTN++QNdTAmeQG6LOnHJL4tpFTsx5sMa7ghmzX
 LNpmp/AkVfP0MT7Drf0FUUx6iFA7sjANYzcepUVDrPGKHx0E3LyqbG5JKcC5LgM6
 +UIoKAktF3vY7pdZJL9z
 =ZUF/
 -----END PGP SIGNATURE-----

Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull irqdomain changes from Grant Likely:
 "Round of refactoring and enhancements to irq_domain infrastructure.
  This series starts the process of simplifying irqdomain.  The ultimate
  goal is to merge LEGACY, LINEAR and TREE mappings into a single
  system, but had to back off from that after some last minute bugs.
  Instead it mainly reorganizes the code and ensures that the reverse
  map gets populated when the irq is mapped instead of the first time it
  is looked up.

  Merging of the irq_domain types is deferred to v3.7

  In other news, this series adds helpers for creating static mappings
  on a linear or tree mapping."

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  irqdomain: Improve diagnostics when a domain mapping fails
  irqdomain: eliminate slow-path revmap lookups
  irqdomain: Fix irq_create_direct_mapping() to test irq_domain type.
  irqdomain: Eliminate dedicated radix lookup functions
  irqdomain: Support for static IRQ mapping and association.
  irqdomain: Always update revmap when setting up a virq
  irqdomain: Split disassociating code into separate function
  irq_domain: correct a minor wrong comment for linear revmap
  irq_domain: Standardise legacy/linear domain selection
  irqdomain: Make ops->map hook optional
  irqdomain: Remove unnecessary test for IRQ_DOMAIN_MAP_LEGACY
  irqdomain: Simple NUMA awareness.
  devicetree: add helper inline for retrieving a node's full name
2012-07-31 20:44:03 -07:00
Linus Torvalds ac694dbdbc Merge branch 'akpm' (Andrew's patch-bomb)
Merge Andrew's second set of patches:
 - MM
 - a few random fixes
 - a couple of RTC leftovers

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  rtc/rtc-88pm80x: remove unneed devm_kfree
  rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
  mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
  tmpfs: distribute interleave better across nodes
  mm: remove redundant initialization
  mm: warn if pg_data_t isn't initialized with zero
  mips: zero out pg_data_t when it's allocated
  memcg: gix memory accounting scalability in shrink_page_list
  mm/sparse: remove index_init_lock
  mm/sparse: more checks on mem_section number
  mm/sparse: optimize sparse_index_alloc
  memcg: add mem_cgroup_from_css() helper
  memcg: further prevent OOM with too many dirty pages
  memcg: prevent OOM with too many dirty pages
  mm: mmu_notifier: fix freed page still mapped in secondary MMU
  mm: memcg: only check anon swapin page charges for swap cache
  mm: memcg: only check swap cache pages for repeated charging
  mm: memcg: split swapin charge function into private and public part
  mm: memcg: remove needless !mm fixup to init_mm when charging
  mm: memcg: remove unneeded shmem charge type
  ...
2012-07-31 19:25:39 -07:00
Linus Torvalds a40a1d3d0a VFIO for v3.6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQF+koAAoJECObm247sIsiNjQP/Rra8MHYEtUYlIDlx/MU5aPV
 dVOQ1rbHqLH6vnqBESuzwWYd1Eg9pI0yjMoQKuOrxg0Ri/cAhf3WE8xTzb/HTZ6h
 npBvVmYIrqAFvE8Tqd0L9XZXNES5dy47F7a3Sp7vXAflQTQ7NAgG3idhzcmom5Zp
 6Z7ETrtcaLnu0wr67eb96ICpq9QgPd9rWy5Fn9RAo7NzOb3SJp9CXT02vShytQLk
 4sCcez4DTXf3IQjvjgVwV1OfvmcvG1lhI8R1Wvj1TqfqJVC0ZmKQYpmR9rlnPkLB
 SB51h0v6a64YaKBpD7R7wuKUGWn2CEg4zAPuFxeACZ99pyx+eulPqZOTX2mAl+gQ
 aW5Ec7Au76zvqnKgPVt2nzFMCXYZpHlWRQHqQNon5xXVdqJRS4cJAev7o9udyD6q
 SQZIKBoZrwQtA3YbukXd91JKnxliH23MxCaL2u5C5ix6BSKsRs7djHboe210GAcQ
 bEjQnhHujETJcFzm1O9HbypJhfg3Klw5o/SKYi3pylURDRMmKVhR3vP9cxdloxAC
 QGh3klwQncKN9RxSACsfcNAh0V7RqkufcLXPpEcCwAuyUIrYniwWxiroco/bYE24
 DK8g6bi+cY9At2iMALTd2J7RzZcrertQVmRtmXaOrLwSZvWrdizc6qNBpwTIozIT
 +V/e1WcSau1C4zU4T0Z0
 =fx5A
 -----END PGP SIGNATURE-----

Merge tag 'vfio-for-v3.6' of git://github.com/awilliam/linux-vfio

Pull VFIO core from Alex Williamson:
 "This series includes the VFIO userspace driver interface for the 3.6
  kernel merge window.  This driver is intended to provide a secure
  interface for device access using IOMMU protection for applications
  like assignment of physical devices to virtual machines.

  Qemu will be the first user of this interface, enabling assignment of
  PCI devices to Qemu guests.  This interface is intended to eventually
  replace the x86-specific assignment mechanism currently available in
  KVM.

  This interface has the advantage of being more secure, by working with
  IOMMU groups to ensure device isolation and providing it's own
  filtered resource access mechanism, and also more flexible, in not
  being x86 or KVM specific (extensions to enable POWER are already
  working).

  This driver is originally the work of Tom Lyon, but has since been
  handed over to me and gone through a complete overhaul thanks to the
  input from David Gibson, Ben Herrenschmidt, Chris Wright, Joerg
  Roedel, and others.  This driver has been available in linux-next for
  the last month."

Paul Mackerras says:
 "I would be glad to see it go in since we want to use it with KVM on
  PowerPC.  If possible we'd like the PowerPC bits for it to go in as
  well."

* tag 'vfio-for-v3.6' of git://github.com/awilliam/linux-vfio:
  vfio: Add PCI device driver
  vfio: Type1 IOMMU implementation
  vfio: Add documentation
  vfio: VFIO core
2012-07-31 19:17:27 -07:00
Linus Torvalds 3e9a97082f This patch series contains a major revamp of how we collect entropy
from interrupts for /dev/random and /dev/urandom.  The goal is to
 addresses weaknesses discussed in the paper "Mining your Ps and Qs:
 Detection of Widespread Weak Keys in Network Devices", by Nadia
 Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman, which will
 be published in the Proceedings of the 21st Usenix Security Symposium,
 August 2012.  (See https://factorable.net for more information and an
 extended version of the paper.)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQF/0DAAoJENNvdpvBGATwIowQAOep9QKtLrBvb2lwIRVmeiy8
 lRf7V/tYZnz4FePbR0W92JQfKYkCV8yyOO0bmeRzWL3v4m+lRwDTSyA1DDyQMoH+
 LOMzvDKSLJMSXTXdSOIr1WYACphViCR/9CrbMBCKSkYfZLJ1MdaEDxT3rcpTGD0T
 6iknUweiSkHHhkerU5yQL7FKzD5kYUe0hsF47w7QVlHRHJsW2fsZqkFoh+RpnhNw
 03u+djxNGBo9qV81vZ9D1b0vA9uRlEjoWOOEG2XE4M2iq6TUySueA72dQnCwunfi
 3kG/u1Swv2dgq6aRrP3H7zdwhYSourGxziu3jNhEKwKEohrxYY7xjNX3RVeTqP67
 AzlKsOTWpRLIDrzjSLlb8VxRQiZewu8Unex3e1G+eo20sbcIObHGrxNp7K00zZvd
 QZiMHhOwItwFTe4lBO+XbqH2JKbL9/uJmwh5EipMpQTraKO9E6N3CJiUHjzBLo2K
 iGDZxRMKf4gVJRwDxbbP6D70JPVu8ZJ09XVIpsXQ3Z1xNqaMF0QdCmP3ty56q1o0
 NvkSXxPKrijZs8Sk0rVDqnJ3ll8PuDnXMv5eDtL42VT818I5WxESn9djjwEanGv0
 TYxbFub/NRxmPEE5B2Js5FBpqsLf5f282OSMeS/5WLBbnHJR1OoPoAhGVpHvxntC
 bi5FC1OolqhvzVIdsqgt
 =u7KM
 -----END PGP SIGNATURE-----

Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random subsystem patches from Ted Ts'o:
 "This patch series contains a major revamp of how we collect entropy
  from interrupts for /dev/random and /dev/urandom.

  The goal is to addresses weaknesses discussed in the paper "Mining
  your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
  by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J.  Alex Halderman,
  which will be published in the Proceedings of the 21st Usenix Security
  Symposium, August 2012.  (See https://factorable.net for more
  information and an extended version of the paper.)"

Fix up trivial conflicts due to nearby changes in
drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
  random: mix in architectural randomness in extract_buf()
  dmi: Feed DMI table to /dev/random driver
  random: Add comment to random_initialize()
  random: final removal of IRQF_SAMPLE_RANDOM
  um: remove IRQF_SAMPLE_RANDOM which is now a no-op
  sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
  [ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
  board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
  isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
  pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
  omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
  goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
  uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
  drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
  xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
  n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
  pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
  i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
  input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
  mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
  ...
2012-07-31 19:07:42 -07:00
Linus Torvalds 941c8726e4 Final RDMA changes for the 3.6 merge window:
- Fix IPoIB to stop using unsafe linkage between networking neighbour
    layer and private path database.
  - Small fixes for bugs found by Fengguang Wu's automated builds.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQGEYLAAoJEENa44ZhAt0h/o0P/itY+uIhLbNnC2lBDV8sRLYm
 3tgKUX7K3gcD5+QgDKFVEj9z/FFNA0cua9MvRYrZqIL/jg2PNANLSKhAj/0COwKG
 j8MwIbKEZUunkUjP2+9hRF+vXk7VYEyPMaq6LkMRQRRl1vARasSDWVX8cRCukG62
 /tuS3ydTrC0c79zRZWWmNoPl8Lgmw3a6XH5pxcd2Afm72/kMGmVmeVzdduEU0jWv
 VmBuO1mM+fu8Hf5x0AcI/0kuplUngabq3jAl3dJSaV66MkH98RoxRi2izRdJM28l
 F+72PxY1XA+Z18jhaTTbOp18zhovtUpDTnnYdVrfwEcaspMgxRNxbqA4BTg98rnk
 UpOoQJxIJM9RBY274I1hmZsnUBTXYNLLbAPQautpgve6Mo78agsoiaZZmZReZY1X
 refzxdKo9Tl0zVZ9RWolPDukTd+0if4yKAvaIyHYQAYny/kWxhUwzZiSvBbgdnlC
 UGNVYCndvJ94XTz0xnF8NsvjcSrk/piLuJMUSS1PEKa/gPCnds6MllpeHmx1xUxc
 /80WyjZ7Ysot2pJqO6NgK71Ky9l0xF5Q9mYn3m37TtkYSS/ur2L+/r7tebArvoQR
 01LhqRAn4Mx6nu/ebd9yhnxdeuPRvhfGpZu5O2T/DKHhJvNUGCiFCOTcQn62UUMk
 vM2NjIA+haUJTXLRJrOv
 =zFTH
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull final RDMA changes from Roland Dreier:
 - Fix IPoIB to stop using unsafe linkage between networking neighbour
   layer and private path database.
 - Small fixes for bugs found by Fengguang Wu's automated builds.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Use a private hash table for path lookup in xmit path
  IB/qib: Fix size of cc_supported_table_entries
  RDMA/ucma: Convert open-coded equivalent to memdup_user()
  RDMA/ocrdma: Fix check of GSI CQs
  RDMA/cma: Use PTR_RET rather than if (IS_ERR(...)) + PTR_ERR
2012-07-31 18:59:55 -07:00
Linus Torvalds 8762541f06 Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull second set of media updates from Mauro Carvalho Chehab:

 - radio API: add support to work with radio frequency bands

 - new AM/FM radio drivers: radio-shark, radio-shark2

 - new Remote Controller USB driver: iguanair

 - conversion of several drivers to the v4l2 core control framework

 - new board additions at existing drivers

 - the remaining (and vast majority of the patches) are due to
   drivers/DocBook fixes/cleanups.

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (154 commits)
  [media] radio-tea5777: use library for 64bits div
  [media] tlg2300: Declare MODULE_FIRMWARE usage
  [media] lgs8gxx: Declare MODULE_FIRMWARE usage
  [media] xc5000: Add MODULE_FIRMWARE statements
  [media] s2255drv: Add MODULE_FIRMWARE statement
  [media] dib8000: move dereference after check for NULL
  [media] Documentation: Update cardlists
  [media] bttv: add support for Aposonic W-DVR
  [media] cx25821: Remove bad strcpy to read-only char*
  [media] pms.c: remove duplicated include
  [media] smiapp-core.c: remove duplicated include
  [media] via-camera: pass correct format settings to sensor
  [media] rtl2832.c: minor cleanup
  [media] Add support for the IguanaWorks USB IR Transceiver
  [media] Minor cleanups for MCE USB
  [media] drivers/media/dvb/siano/smscoreapi.c: use list_for_each_entry
  [media] Use a named union in struct v4l2_ioctl_info
  [media] mceusb: Add Twisted Melon USB IDs
  [media] staging/media/solo6x10: use module_pci_driver macro
  [media] staging/media/dt3155v4l: use module_pci_driver macro
  ...

Conflicts:
	Documentation/feature-removal-schedule.txt
2012-07-31 18:47:44 -07:00
Linus Torvalds 6dbb35b0a7 NFS client updates for Linux 3.6
Features include:
 - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
   separate modules.
 - Fix Oopses in the NFSv4 idmapper
 - Fix a deadlock whereby rpciod tries to allocate a new socket and
   ends up recursing into the NFS code due to memory reclaim.
 - Increase the number of permitted callback connections.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQGF+vAAoJEGcL54qWCgDy6ncP/jKVl/Yjz6i1b/8QtC0EmYFb
 XNwbjZicNupVM98XPLm+sfdWxvoGiHSMbwG8t/hx5Z6CteM18adeGNO1KqP9xQyg
 scPvFmqj5VJNHsHztcHIFHFYdpsuJqRKcW3TA2l8AkNOm6uBcnXArRosYN6LrXik
 hI/jWcyv8wZuooSQrLf463JK37t/s6LuUQo6jKP4582sbANHvPdLkHFCYg3yt1Sk
 2IIo2CLLs2cJUswGJnsHT76Jxfvh21NdGtvydjVjpr4H0/LY5GykBc23AAL9TWj6
 KlegDO902hLzEE93xazF59hQZrPuwBi3quEMJ3X0mjdNQWb4G96s/t2hmwpTjWyc
 mjpRsuaGlCXDxcrI5dnU52hqKa0Ju1ipbLpghxSVfilRy998xvGp5Yc6ADU+pYnt
 uP09lamCyjFEa+gDSqGt7lv4dFmdPJjWZdkXLPv0ah5sXVMYAT8NRLUfiE1+hLLu
 zKaM84PHSEvykFeJ2WvjDTgF3L55y3x6L3LN4UkKmfO5nqQf7csPcquU1NfHh25S
 jjKGq2SKGoYG1l6FwK3hemup5cnFUEX78E2QeLrmaHg92fJDGrz587i6MPc5jrSe
 sOSX/CpuCJd4VhodIo8T00me0GZSHEU7RP2KEc518ZvrPmz13avXeLeG9nYhjuxM
 cV9Ex8zh2Y4aiUXCy3uA
 =KSix
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull second wave of NFS client updates from Trond Myklebust:

 - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
   separate modules.

 - Fix Oopses in the NFSv4 idmapper

 - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
   up recursing into the NFS code due to memory reclaim.

 - Increase the number of permitted callback connections.

* tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: explicitly reject LOCK_MAND flock() requests
  nfs: increase number of permitted callback connections.
  SUNRPC: return negative value in case rpcbind client creation error
  NFS: Convert v4 into a module
  NFS: Convert v3 into a module
  NFS: Convert v2 into a module
  NFS: Keep module parameters in the generic NFS client
  NFS: Split out remaining NFS v4 inode functions
  NFS: Pass super operations and xattr handlers in the nfs_subversion
  NFS: Only initialize the ACL client in the v3 case
  NFS: Create a try_mount rpc op
  NFS: Remove the NFS v4 xdev mount function
  NFS: Add version registering framework
  NFS: Fix a number of bugs in the idmapper
  nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
  sunrpc: clarify comments on rpc_make_runnable
  pnfsblock: bail out partial page IO
2012-07-31 18:45:44 -07:00
Linus Torvalds fd37ce34bd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking update from David S. Miller:
 "I think Eric Dumazet and I have dealt with all of the known routing
  cache removal fallout.  Some other minor fixes all around.

  1) Fix RCU of cached routes, particular of output routes which require
     liberation via call_rcu() instead of call_rcu_bh().  From Eric
     Dumazet.

  2) Make sure we purge net device references in cached routes properly.

  3) TG3 driver bug fixes from Michael Chan.

  4) Fix reported 'expires' value in ipv6 routes, from Li Wei.

  5) TUN driver ioctl leaks kernel bytes to userspace, from Mathias
     Krause."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
  ipv4: Properly purge netdev references on uncached routes.
  ipv4: Cache routes in nexthop exception entries.
  ipv4: percpu nh_rth_output cache
  ipv4: Restore old dst_free() behavior.
  bridge: make port attributes const
  ipv4: remove rt_cache_rebuild_count
  net: ipv4: fix RCU races on dst refcounts
  net: TCP early demux cleanup
  tun: Fix formatting.
  net/tun: fix ioctl() based info leaks
  tg3: Update version to 3.124
  tg3: Fix race condition in tg3_get_stats64()
  tg3: Add New 5719 Read DMA workaround
  tg3: Fix Read DMA workaround for 5719 A0.
  tg3: Request APE_LOCK_PHY before PHY access
  ipv6: fix incorrect route 'expires' value passed to userspace
  mISDN: Bugfix only few bytes are transfered on a connection
  seeq: use PTR_RET at init_module of driver
  bnx2x: remove cast around the kmalloc in bnx2x_prev_mark_path
  ipv4: clean up put_child
  ...
2012-07-31 18:43:13 -07:00
Devendra Naga 437ea90cc3 rtc/rtc-88pm80x: remove unneed devm_kfree
devm_kzalloc() doesn't need a matching devm_kfree(), the freeing mechanism
will trigger when driver unloads.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ashish Jangam <ashish.jangam@kpitcummins.com>
Cc: David Dajun Chen <dchen@diasemi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:50 -07:00
Devendra Naga 7ead55119b rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
At the probe we are assigning ret to return value of PTR_ERR right after
the rtc_register_drive()r, as we would have done it in the if
(IS_ERR(ptr)) check, since the function fails and goes inside that case

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ashish Jangam <ashish.jangam@kpitcummins.com>
Cc: David Dajun Chen <dchen@diasemi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:50 -07:00
Mel Gorman d833352a43 mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
If a process creates a large hugetlbfs mapping that is eligible for page
table sharing and forks heavily with children some of whom fault and
others which destroy the mapping then it is possible for page tables to
get corrupted.  Some teardowns of the mapping encounter a "bad pmd" and
output a message to the kernel log.  The final teardown will trigger a
BUG_ON in mm/filemap.c.

This was reproduced in 3.4 but is known to have existed for a long time
and goes back at least as far as 2.6.37.  It was probably was introduced
in 2.6.20 by [39dde65c: shared page table for hugetlb page].  The messages
look like this;

[  ..........] Lots of bad pmd messages followed by this
[  127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
[  127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
[  127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
[  127.186778] ------------[ cut here ]------------
[  127.186781] kernel BUG at mm/filemap.c:134!
[  127.186782] invalid opcode: 0000 [#1] SMP
[  127.186783] CPU 7
[  127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
[  127.186801]
[  127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
[  127.186804] RIP: 0010:[<ffffffff810ed6ce>]  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[  127.186809] RSP: 0000:ffff8804144b5c08  EFLAGS: 00010002
[  127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
[  127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
[  127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
[  127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
[  127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
[  127.186815] FS:  00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
[  127.186816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
[  127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
[  127.186821] Stack:
[  127.186822]  ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
[  127.186824]  ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
[  127.186825]  ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
[  127.186827] Call Trace:
[  127.186829]  [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
[  127.186832]  [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
[  127.186834]  [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
[  127.186837]  [<ffffffff811655c7>] evict+0xa7/0x1b0
[  127.186839]  [<ffffffff811657a3>] iput_final+0xd3/0x1f0
[  127.186840]  [<ffffffff811658f9>] iput+0x39/0x50
[  127.186842]  [<ffffffff81162708>] d_kill+0xf8/0x130
[  127.186843]  [<ffffffff81162812>] dput+0xd2/0x1a0
[  127.186845]  [<ffffffff8114e2d0>] __fput+0x170/0x230
[  127.186848]  [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
[  127.186849]  [<ffffffff8114e3ad>] fput+0x1d/0x30
[  127.186851]  [<ffffffff81117db7>] remove_vma+0x37/0x80
[  127.186853]  [<ffffffff81119182>] do_munmap+0x2d2/0x360
[  127.186855]  [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
[  127.186857]  [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
[  127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
[  127.186868] RIP  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[  127.186870]  RSP <ffff8804144b5c08>
[  127.186871] ---[ end trace 7cbac5d1db69f426 ]---

The bug is a race and not always easy to reproduce.  To reproduce it I was
doing the following on a single socket I7-based machine with 16G of RAM.

$ hugeadm --pool-pages-max DEFAULT:13G
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
$ for i in `seq 1 9000`; do ./hugetlbfs-test; done

On my particular machine, it usually triggers within 10 minutes but
enabling debug options can change the timing such that it never hits.
Once the bug is triggered, the machine is in trouble and needs to be
rebooted.  The machine will respond but processes accessing proc like "ps
aux" will hang due to the BUG_ON.  shutdown will also hang and needs a
hard reset or a sysrq-b.

The basic problem is a race between page table sharing and teardown.  For
the most part page table sharing depends on i_mmap_mutex.  In some cases,
it is also taking the mm->page_table_lock for the PTE updates but with
shared page tables, it is the i_mmap_mutex that is more important.

Unfortunately it appears to be also insufficient. Consider the following
situation

Process A					Process B
---------					---------
hugetlb_fault					shmdt
  						LockWrite(mmap_sem)
    						  do_munmap
						    unmap_region
						      unmap_vmas
						        unmap_single_vma
						          unmap_hugepage_range
      						            Lock(i_mmap_mutex)
							    Lock(mm->page_table_lock)
							    huge_pmd_unshare/unmap tables <--- (1)
							    Unlock(mm->page_table_lock)
      						            Unlock(i_mmap_mutex)
  huge_pte_alloc				      ...
    Lock(i_mmap_mutex)				      ...
    vma_prio_walk, find svma, spte		      ...
    Lock(mm->page_table_lock)			      ...
    share spte					      ...
    Unlock(mm->page_table_lock)			      ...
    Unlock(i_mmap_mutex)			      ...
  hugetlb_no_page									  <--- (2)
						      free_pgtables
						        unlink_file_vma
							hugetlb_free_pgd_range
						    remove_vma_list

In this scenario, it is possible for Process A to share page tables with
Process B that is trying to tear them down.  The i_mmap_mutex on its own
does not prevent Process A walking Process B's page tables.  At (1) above,
the page tables are not shared yet so it unmaps the PMDs.  Process A sets
up page table sharing and at (2) faults a new entry.  Process B then trips
up on it in free_pgtables.

This patch fixes the problem by adding a new function
__unmap_hugepage_range_final that is only called when the VMA is about to
be destroyed.  This function clears VM_MAYSHARE during
unmap_hugepage_range() under the i_mmap_mutex.  This makes the VMA
ineligible for sharing and avoids the race.  Superficially this looks like
it would then be vunerable to truncate and madvise issues but hugetlbfs
has its own truncate handlers so does not use unmap_mapping_range() and
does not support madvise(DONTNEED).

This should be treated as a -stable candidate if it is merged.

Test program is as follows. The test case was mostly written by Michal
Hocko with a few minor changes to reproduce this bug.

==== CUT HERE ====

static size_t huge_page_size = (2UL << 20);
static size_t nr_huge_page_A = 512;
static size_t nr_huge_page_B = 5632;

unsigned int get_random(unsigned int max)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	srandom(tv.tv_usec);
	return random() % max;
}

static void play(void *addr, size_t size)
{
	unsigned char *start = addr,
		      *end = start + size,
		      *a;
	start += get_random(size/2);

	/* we could itterate on huge pages but let's give it more time. */
	for (a = start; a < end; a += 4096)
		*a = 0;
}

int main(int argc, char **argv)
{
	key_t key = IPC_PRIVATE;
	size_t sizeA = nr_huge_page_A * huge_page_size;
	size_t sizeB = nr_huge_page_B * huge_page_size;
	int shmidA, shmidB;
	void *addrA = NULL, *addrB = NULL;
	int nr_children = 300, n = 0;

	if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
		perror("shmget:");
		return 1;
	}

	if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
		perror("shmat");
		return 1;
	}
	if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
		perror("shmget:");
		return 1;
	}

	if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
		perror("shmat");
		return 1;
	}

fork_child:
	switch(fork()) {
		case 0:
			switch (n%3) {
			case 0:
				play(addrA, sizeA);
				break;
			case 1:
				play(addrB, sizeB);
				break;
			case 2:
				break;
			}
			break;
		case -1:
			perror("fork:");
			break;
		default:
			if (++n < nr_children)
				goto fork_child;
			play(addrA, sizeA);
			break;
	}
	shmdt(addrA);
	shmdt(addrB);
	do {
		wait(NULL);
	} while (--n > 0);
	shmctl(shmidA, IPC_RMID, NULL);
	shmctl(shmidB, IPC_RMID, NULL);
	return 0;
}

[akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:50 -07:00
Nathan Zimmer 09c231cb8b tmpfs: distribute interleave better across nodes
When tmpfs has the interleave memory policy, it always starts allocating
for each file from node 0 at offset 0.  When there are many small files,
the lower nodes fill up disproportionately.

This patch spreads out node usage by starting files at nodes other than 0,
by using the inode number to bias the starting node for interleave.

Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:50 -07:00
Minchan Kim 6527af5d1b mm: remove redundant initialization
pg_data_t is zeroed before reaching free_area_init_core(), so remove the
now unnecessary initializations.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:50 -07:00