Commit graph

32667 commits

Author SHA1 Message Date
Linus Torvalds d92581fcad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "These are assorted fixes, mostly from Josef nailing down xfstests
  runs.  Zach also has a long standing fix for problems with readdir
  wrapping f_pos (or ctx->pos)

  These patches were spread out over different bases, so I rebased
  things on top of rc4 and retested overnight"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: don't loop on large offsets in readdir
  Btrfs: check to see if root_list is empty before adding it to dead roots
  Btrfs: release both paths before logging dir/changed extents
  Btrfs: allow splitting of hole em's when dropping extent cache
  Btrfs: make sure the backref walker catches all refs to our extent
  Btrfs: fix backref walking when we hit a compressed extent
  Btrfs: do not offset physical if we're compressed
  Btrfs: fix extent buffer leak after backref walking
  Btrfs: fix a bug of snapshot-aware defrag to make it work on partial extents
  btrfs: fix file truncation if FALLOC_FL_KEEP_SIZE is specified
2013-08-10 15:21:47 -07:00
Linus Torvalds b8ea0d06ff NFS client bugfixes for 3.11
- Stable patch for lockd to fix Oopses due to inappropriate calls to
   utsname()->nodename
 - Stable patches for sunrpc to fix Oopses on shutdown when using
   AF_LOCAL sockets with rpcbind
 - Fix memory leak and error checking issues in nfs4_proc_lookup_mountpoint
 - Fix a regression with the sync mount option failing to work for nfs4 mounts
 - Fix a writeback performance issue when doing cache invalidation
 - Remove an incorrect call to nfs_setsecurity in nfs_fhget
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJSBRqqAAoJEGcL54qWCgDyejoP/RNfkebEex2ugAAfymMOUNjA
 qc8X/YO8KDQD5T0X8P9MFXzgSSMPT5YfemxfKDpkjJlYwij8O9D8r1BAfgmhhKFI
 bVp9J8pH5Y3jXgLFVsubEc9N9CXLI0yRzcOFfzapYmGJV/ZO4cBAi8HMowFtbREj
 l2uk9H8XQ1p8KqVWXhFKoXVL2b9q0CGj5z3+RkE/rD5uuZyusbTB6OhNFArPnwXk
 aCx373mlvpIAhU6DkueCiXaHH06Mff2Vlu6eBkGrdfBGC6l+x1nwevxYH750fqSU
 8O94rQkw9++qnIIvBJF/g1NzqyDhychJcXtgGLdxYUdWH3c8tevJZxCEj7U/dIJQ
 ndEaZGFxSFfdnxrwJBWtB+xsEfe9K4no9JwlkyVi8oZ5j2NUv7cJpA5cdQ4IUf/1
 uqTlIxtPHQhHWCUUKpGLlhLiZyvwOPtJvuBl/Pc9UYQbyNtSqjYBk9mamrrIC6FK
 mF6jXgWe9x+miBqWYrEdPNLGdx/hUhhqGweYPJa6jTcxif+2l2xGfscWYI89Io/e
 qy8YNcHUrRci+o+YfY4lLhk88WBZogzFOYc4jDaLCEL1TE5B2k/jHr9v39V5S7Ks
 63bhmfCTxB82uMzFeUWbiPzWQzt030pXPYy/PUPaucy/+QaZC/0lZ3HJKRKI37JJ
 ygSD7ndCZJCG+PxLkk49
 =W12x
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.11-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Stable patch for lockd to fix Oopses due to inappropriate calls to
   utsname()->nodename

 - Stable patches for sunrpc to fix Oopses on shutdown when using
   AF_LOCAL sockets with rpcbind

 - Fix memory leak and error checking issues in nfs4_proc_lookup_mountpoint

 - Fix a regression with the sync mount option failing to work for nfs4
   mounts

 - Fix a writeback performance issue when doing cache invalidation

 - Remove an incorrect call to nfs_setsecurity in nfs_fhget

* tag 'nfs-for-3.11-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix up nfs4_proc_lookup_mountpoint
  NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget()
  NFSv4: Fix the sync mount option for nfs4 mounts
  NFS: Fix writeback performance issue on cache invalidation
  SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister
  SUNRPC: Don't auto-disconnect from the local rpcbind socket
  LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs
2013-08-10 15:20:37 -07:00
Linus Torvalds 022e5d098b Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "Some fixes for a 4.1 feature that in retrospect probably should have
  waited for 3.12....  But it appears to be working now"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
  nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID
  nfsd4: Fix MACH_CRED NULL dereference
2013-08-10 15:19:58 -07:00
Zach Brown db62efbbf8 btrfs: don't loop on large offsets in readdir
When btrfs readdir() hits the last entry it sets the readdir offset to a
huge value to stop buggy apps from breaking when the same name is
returned by readdir() with concurrent rename()s.

But unconditionally setting the offset to INT_MAX causes readdir() to
loop returning any entries with offsets past INT_MAX.  It only takes a
few hours of constant file creation and removal to create entries past
INT_MAX.

So let's set the huge offset to LLONG_MAX if the last entry has already
overflowed 32bit loff_t.   Without large offsets behaviour is identical.
With large offsets 64bit apps will work and 32bit apps will be no more
broken than they currently are if they see large offsets.

Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:34:56 -04:00
Josef Bacik cfad392b22 Btrfs: check to see if root_list is empty before adding it to dead roots
A user reported a panic when running with autodefrag and deleting snapshots.
This is because we could end up trying to add the root to the dead roots list
twice.  To fix this check to see if we are empty before adding ourselves to the
dead roots list.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:30:23 -04:00
Josef Bacik f3b15ccdbb Btrfs: release both paths before logging dir/changed extents
The ceph guys tripped over this bug where we were still holding onto the
original path that we used to copy the inode with when logging.  This is based
on Chris's fix which was reported to fix the problem.  We need to drop the paths
in two cases anyway so just move the drop up so that we don't have duplicate
code.  Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:30:16 -04:00
Josef Bacik ee20a98314 Btrfs: allow splitting of hole em's when dropping extent cache
I noticed while running multi-threaded fsync tests that sometimes fsck would
complain about an improper gap.  This happens because we fail to add a hole
extent to the file, which was happening when we'd split a hole EM because
btrfs_drop_extent_cache was just discarding the whole em instead of splitting
it.  So this patch fixes this by allowing us to split a hole em properly, which
means that added holes actually get logged properly and we no longer see this
fsck error.  Thankfully we're tolerant of these sort of problems so a user would
not see any adverse effects of this bug, other than fsck complaining.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:30:09 -04:00
Josef Bacik ed8c4913da Btrfs: make sure the backref walker catches all refs to our extent
Because we don't mess with the offset into the extent for compressed we will
properly find both extents for this case

[extent a][extent b][rest of extent a]

but because we already added a ref for the front half we won't add the inode
information for the second half.  This causes us to leak that memory and not
print out the other offset when we do logical-resolve.  So fix this by calling
ulist_add_merge and then add our eie to the existing entry if there is one.
With this patch we get both offsets out of logical-resolve.  With this and the
other 2 patches I've sent we now pass btrfs/276 on my vm with compress-force=lzo
set.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:30:03 -04:00
Josef Bacik 8ca15e05e6 Btrfs: fix backref walking when we hit a compressed extent
If you do btrfs inspect-internal logical-resolve on a compressed extent that has
been partly overwritten it won't find anything.  This is because we try and
match the extent offset we've searched for based on the extent offset in the
data extent entry.  However this doesn't work for compressed extents because the
offsets are for the uncompressed size, not the compressed size.  So instead only
do this check if we are not compressed, that way we can get an actual entry for
the physical offset rather than nothing for compressed.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:29:56 -04:00
Josef Bacik b76bb70136 Btrfs: do not offset physical if we're compressed
xfstest btrfs/276 was freaking out on slower boxes partly because fiemap was
offsetting the physical based on the extent offset.  This is perfectly fine with
uncompressed extents, however the extent offset is into the uncompressed area,
not the compressed.  So we can return a physical value that isn't at all within
the area we have allocated on disk.  Fix this by returning the start of the
extent if it is compressed no matter what the offset.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:29:50 -04:00
Liu Bo b5b9b5b318 Btrfs: fix extent buffer leak after backref walking
commit 47fb091fb787420cd195e66f162737401cce023f(Btrfs: fix unlock after free on rewinded tree blocks)
takes an extra increment on the reference of allocated dummy extent buffer, so now we
cannot free this dummy one, and end up with extent buffer leak.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:29:42 -04:00
Liu Bo e68afa49ae Btrfs: fix a bug of snapshot-aware defrag to make it work on partial extents
For partial extents, snapshot-aware defrag does not work as expected,
since
a) we use the wrong logical offset to search for parents, which should be
   disk_bytenr + extent_offset, not just disk_bytenr,
b) 'offset' returned by the backref walking just refers to key.offset, not
   the 'offset' stored in btrfs_extent_data_ref which is
   (key.offset - extent_offset).

The reproducer:
$ mkfs.btrfs sda
$ mount sda /mnt
$ btrfs sub create /mnt/sub
$ for i in `seq 5 -1 1`; do dd if=/dev/zero of=/mnt/sub/foo bs=5k count=1 seek=$i conv=notrunc oflag=sync; done
$ btrfs sub snap /mnt/sub /mnt/snap1
$ btrfs sub snap /mnt/sub /mnt/snap2
$ sync; btrfs filesystem defrag /mnt/sub/foo;
$ umount /mnt
$ btrfs-debug-tree sda (Here we can check whether the defrag operation is snapshot-awared.

This addresses the above two problems.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:29:17 -04:00
Jie Liu 7cddc19392 btrfs: fix file truncation if FALLOC_FL_KEEP_SIZE is specified
Create a small file and fallocate it to a big size with
FALLOC_FL_KEEP_SIZE option, then truncate it back to the
small size again, the disk free space is not changed back
in this case. i.e,

total 4
-rw-r--r-- 1 root root 512 Jun 28 11:35 test

Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G   56K  7.2G   1% /mnt

-rw-r--r-- 1 root root 512 Jun 28 11:35 /mnt/test

Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G  5.1G  2.2G  70% /mnt

Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G  5.1G  2.2G  70% /mnt

With this fix, the truncated up space is back as:
Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G   56K  7.2G   1% /mnt

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-08-09 19:28:56 -04:00
Oleg Nesterov 201d3dfa4d dlm: kill the unnecessary and wrong device_close()->recalc_sigpending()
device_close()->recalc_sigpending() is not needed, sigprocmask() takes
care of TIF_SIGPENDING correctly.

And without ->siglock it is racy and wrong, it can wrongly clear
TIF_SIGPENDING and miss a signal.

But even with this patch device_close() is still buggy:

  1. sigprocmask() should not be used, we have set_task_blocked(),
     but this is minor.

  2. We should never block SIGKILL or SIGSTOP, and this is what
     the code tries to do.

  3. This can't protect against SIGKILL or SIGSTOP anyway. Another
     thread can do signal_wake_up(), say, do_signal_stop() or
     complete_signal() or debugger.

  4. sigprocmask(SIG_BLOCK, allsigs) doesn't necessarily clears
     TIF_SIGPENDING, say, freezing() or ->jobctl.

  5. device_write() looks equally wrong by the same reason.

Looks like, this tries to protect some wait_event_interruptible() logic
from signals, it should be turned into uninterruptible wait.  Or we need
to implement something like signals_stop/start for such a use-case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-09 10:48:20 -07:00
Linus Torvalds 55f5bfd4c9 ext4 bugfixes for v3.11-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJR+YUoAAoJENNvdpvBGATwxsgQAJzxe8x1SWW8jrvasMELW3bS
 I7gl3lu789GROmZo5MI6+XI1DSCjHcMTisoPnPGx8q9LNhiYQGjp9Loxgl+L+Cem
 wbDfkSHrhxtDoEw2rdpPb/hFHOpsv8t6TT53GVbdc5l/lxwyvdrblXxhWbLuTRCQ
 G0X2VoxvTph7SyVPpRqoU4U99vW3g6LRI0dsxyU8kLc/kp7/t89gwh6Gj0wlb6tb
 4pswvEuxK4HauMPjajCIaEeGpQoHbvCVpB4P9HvEtL/wklc5hNIAWy5+KYjMFWil
 8Vc6RsJv6DZc66z+X4PT1oYrx6bkhNKNRGBETV2/dzeLU/7o2ttUIQ0vlOlN9gKV
 xSne19OKwqhhRMuWrc3LDzunlFiLex2zHbZJVDOeKNXmEJVpO9ZPVkvpoi/g0vOE
 DPwcc89J+763CVmXu8uyPqgQRfJTp2dXNggP3lINCDi1mFayrBqR0S6rQ1B2m/6R
 8HYDjJ6gBpXBcs0FTGh1cfbpI8GpPx9Xu3Q/FtjvczEXUO3831KCm+k7O3cUjB72
 I7I3wmyYQvUanOc39fg+EZ0cXpiSIZG4Mjv+8vWgADFlCMUKMaC+vp2PuKmBxTuk
 7dAkxKr2kXfMirGWF1R/vo/CaKnCWmJbAJDFJ8Mt/Nk3zn3NOl/kCRtxndlj7k+m
 /s8DHlVLyD8dgvzRO9gp
 =xqB3
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o.

Misc ext4 fixes, delayed by Ted moving mail servers and email getting
marked as spam due to bad spf records.

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: add WARN_ON to check the length of allocated blocks
  ext4: fix retry handling in ext4_ext_truncate()
  ext4: destroy ext4_es_cachep on module unload
  ext4: make sure group number is bumped after a inode allocation race
2013-08-08 09:38:19 -07:00
Trond Myklebust b72888cb0b NFSv4: Fix up nfs4_proc_lookup_mountpoint
Currently, we do not check the return value of client = rpc_clone_client(),
nor do we shut down the resulting cloned rpc_clnt in the case where a
NFS4ERR_WRONGSEC has caused nfs4_proc_lookup_common() to replace the
original value of 'client' (causing a memory leak).

Fix both issues and simplify the code by moving the call to
rpc_clone_client() until after nfs4_proc_lookup_common() has
done its business.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 20:47:26 -04:00
Trond Myklebust eddffa4084 NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget()
We only need to call it on the creation of the inode.

Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Steve Dickson <SteveD@redhat.com>
Cc: Dave Quigley <dpquigl@davequigley.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 17:07:41 -04:00
Scott Mayhew e890db0104 NFSv4: Fix the sync mount option for nfs4 mounts
The sync mount option stopped working for NFSv4 mounts after commit
c02d7adf8c (NFSv4: Replace nfs4_path_walk() with
FS path lookup in a private namespace).  If MS_SYNCHRONOUS is set in the
super_block that we're cloning from, then it should be set in the new
super_block as well.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 17:07:41 -04:00
Trond Myklebust f8806c843f NFS: Fix writeback performance issue on cache invalidation
If a cache invalidation is triggered, and we happen to have a lot of
writebacks cached at the time, then the call to invalidate_inode_pages2()
will end up calling ->launder_page() on each and every dirty page in order
to sync its contents to disk, thus defeating write coalescing.
The following patch ensures that we try to sync the inode to disk before
calling invalidate_inode_pages2() so that we do the writeback as efficiently
as possible.

Reported-by: William Dauchy <william@gandi.net>
Reported-by: Pascal Bouchareine <pascal@gandi.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: William Dauchy <william@gandi.net>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2013-08-07 17:07:40 -04:00
Linus Torvalds b7bc9e7d80 Oleg Nesterov has been working hard in closing all the holes that can
lead to race conditions between deleting an event and accessing an event
 debugfs file. This included a fix to the debugfs system (acked by
 Greg Kroah-Hartman). We think that all the holes have been patched and
 hopefully we don't find more. I haven't marked all of them for stable
 because I need to examine them more to figure out how far back some of
 the changes need to go.
 
 Along the way, some other fixes have been made. Alexander Z Lam fixed
 some logic where the wrong buffer was being modifed.
 
 Andrew Vagin found a possible corruption for machines that actually
 allocate cpumask, as a reference to one was being zeroed out by mistake.
 
 Dhaval Giani found a bad prototype when tracing is not configured.
 
 And I not only had some changes to help Oleg, but also finally fixed
 a long standing bug that Dave Jones and others have been hitting, where
 a module unload and reload can cause the function tracing accounting
 to get screwed up.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJR/7FvAAoJEOdOSU1xswtMCAgH/RpxS5mQcQnhrGfu1qnSk+3v
 CyL1X9IkvgP5f+N59tbU3ZuE8Q3YQvTII35na38fgbHbwWrhYjLSvlf3Pwtre8zr
 Rjm9b8zA6iSobHu3DyuuupzMsLE+SpBakVSmG6mi8izZyuCi0YHMZMnHTGNnW9Vv
 YFqkfGgQQWMqiUHLcueetOqWAjR2JiJaLfr47rsRLNflF6dB2MbG/7YbYJ0TBk7E
 hemVmR7JH7606eJZ6nR7bh/hYv0sL2SSqVVaOHO5nWFLFbI8CybpNVGcoD+nihZY
 M7LrZT1C1mn3nKrfDhyXy4dwwx6wBON0fY23eJTHMVNnc0tvY1xqH52vTdPILWg=
 =UfsE
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-3.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Oleg Nesterov has been working hard in closing all the holes that can
  lead to race conditions between deleting an event and accessing an
  event debugfs file.  This included a fix to the debugfs system (acked
  by Greg Kroah-Hartman).  We think that all the holes have been patched
  and hopefully we don't find more.  I haven't marked all of them for
  stable because I need to examine them more to figure out how far back
  some of the changes need to go.

  Along the way, some other fixes have been made.  Alexander Z Lam fixed
  some logic where the wrong buffer was being modifed.

  Andrew Vagin found a possible corruption for machines that actually
  allocate cpumask, as a reference to one was being zeroed out by
  mistake.

  Dhaval Giani found a bad prototype when tracing is not configured.

  And I not only had some changes to help Oleg, but also finally fixed a
  long standing bug that Dave Jones and others have been hitting, where
  a module unload and reload can cause the function tracing accounting
  to get screwed up"

* tag 'trace-fixes-3.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix reset of time stamps during trace_clock changes
  tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct buffer
  tracing: Fix trace_dump_stack() proto when CONFIG_TRACING is not set
  tracing: Fix fields of struct trace_iterator that are zeroed by mistake
  tracing/uprobes: Fail to unregister if probe event files are in use
  tracing/kprobes: Fail to unregister if probe event files are in use
  tracing: Add comment to describe special break case in probe_remove_event_call()
  tracing: trace_remove_event_call() should fail if call/file is in use
  debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)
  ftrace: Check module functions being traced on reload
  ftrace: Consolidate some duplicate code for updating ftrace ops
  tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private
  tracing: Introduce remove_event_file_dir()
  tracing: Change f_start() to take event_mutex and verify i_private != NULL
  tracing: Change event_filter_read/write to verify i_private != NULL
  tracing: Change event_enable/disable_read() to verify i_private != NULL
  tracing: Turn event/id->i_private into call->event.type
2013-08-07 13:01:30 -07:00
Weston Andros Adamson 58cd57bfd9 nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID
- don't BUG_ON() when not SP4_NONE
 - calculate recv and send reserve sizes correctly

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-07 12:06:07 -04:00
J. Bruce Fields c47205914c nfsd4: Fix MACH_CRED NULL dereference
Fixes a NULL-dereference on attempts to use MACH_CRED protection over
auth_sys.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-07 12:05:51 -04:00
Trond Myklebust 9a1b6bf818 LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs
Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
which case we're in entirely the wrong namespace.

Secondly, commit 8aac62706a (move
exit_task_namespaces() outside of exit_notify()) now means that
exit_task_work() is called after exit_task_namespaces(), which
triggers an Oops when we're freeing up the locks.

Fix this by ensuring that we initialise the nlm_host's rpc_client at mount
time, so that the cl_nodename field is initialised to the value of
utsname()->nodename that the net namespace uses. Then replace the
lockd callers of utsname()->nodename.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Toralf Förster <toralf.foerster@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.10.x
2013-08-05 15:03:46 -04:00
Zheng Liu 3d62c45b38 vfs: add missing check for __O_TMPFILE in fcntl_init()
As comment in include/uapi/asm-generic/fcntl.h described, when
introducing new O_* bits, we need to check its uniqueness in
fcntl_init().  But __O_TMPFILE bit is missing.  So fix it.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-05 18:25:32 +04:00
Andy Lutomirski bb2314b479 fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink
Every now and then someone proposes a new flink syscall, and this spawns
a long discussion of whether it would be a security problem.  I think
that this is missing the point: flink is *already* allowed without
privilege as long as /proc is mounted -- it's called AT_SYMLINK_FOLLOW.

Now that O_TMPFILE is here, the ability to create a file with O_TMPFILE,
write it, and link it in is very convenient.  The only problem is that
it requires that /proc be mounted so that you can do:

linkat(AT_FDCWD, "/proc/self/fd/<tmpfd>", dfd, path, AT_SYMLINK_NOFOLLOW)

This sucks -- it's much nicer to do:

linkat(tmpfd, "", dfd, path, AT_EMPTY_PATH)

Let's allow it.

If this turns out to be excessively scary, it we could instead require
that the inode in question be I_LINKABLE, but this seems pointless given
the /proc situation

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-05 18:24:11 +04:00
Andy Lutomirski e305f48bc4 fs: Fix file mode for O_TMPFILE
O_TMPFILE, like O_CREAT, should respect the requested mode and should
create regular files.

This fixes two bugs: O_TMPFILE required privilege (because the mode
ended up as 000) and it produced bogus inodes with no type.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-05 18:24:10 +04:00
Al Viro 672fe15d09 reiserfs: fix deadlock in umount
Since remove_proc_entry() started to wait for IO in progress (i.e.
since 2007 or so), the locking in fs/reiserfs/proc.c became wrong;
if procfs read happens between the moment when umount() locks the
victim superblock and removal of /proc/fs/reiserfs/<device>/*,
we'll get a deadlock - read will wait for s_umount (in sget(),
called by r_start()), while umount will wait in remove_proc_entry()
for that read to finish, holding s_umount all along.

Fortunately, the same change allows a much simpler race avoidance -
all we need to do is remove the procfs entries in the very beginning
of reiserfs ->kill_sb(); that'll guarantee that pointer to superblock
will remain valid for the duration for procfs IO, so we don't need
sget() to keep the sucker alive.  As the matter of fact, we can
get rid of the home-grown iterator completely, and use single_open()
instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-05 17:37:37 +04:00
Gu Zheng 62c610460d ocfs2/refcounttree: add the missing NULL check of the return value of find_or_create_page()
Add the missing NULL check of the return value of find_or_create_page() in
function ocfs2_duplicate_clusters_by_page().

[akpm@linux-foundation.org: fix layout, per Joel]
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31 14:41:02 -07:00
Oleg Nesterov 776164c1fa debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)
debugfs_remove_recursive() is wrong,

1. it wrongly assumes that !list_empty(d_subdirs) means that this
   dir should be removed.

   This is not that bad by itself, but:

2. if d_subdirs does not becomes empty after __debugfs_remove()
   it gives up and silently fails, it doesn't even try to remove
   other entries.

   However ->d_subdirs can be non-empty because it still has the
   already deleted !debugfs_positive() entries.

3. simple_release_fs() is called even if __debugfs_remove() fails.

Suppose we have

	dir1/
		dir2/
			file2
		file1

and someone opens dir1/dir2/file2.

Now, debugfs_remove_recursive(dir1/dir2) succeeds, and dir1/dir2 goes
away.

But debugfs_remove_recursive(dir1) silently fails and doesn't remove
this directory. Because it tries to delete (the already deleted)
dir1/dir2/file2 again and then fails due to "Avoid infinite loop"
logic.

Test-case:

	#!/bin/sh

	cd /sys/kernel/debug/tracing
	echo 'p:probe/sigprocmask sigprocmask' >> kprobe_events
	sleep 1000 < events/probe/sigprocmask/id &
	echo -n >| kprobe_events

	[ -d events/probe ] && echo "ERR!! failed to rm probe"

And after that it is not possible to create another probe entry.

With this patch debugfs_remove_recursive() skips !debugfs_positive()
files although this is not strictly needed. The most important change
is that it does not try to make ->d_subdirs empty, it simply scans
the whole list(s) recursively and removes as much as possible.

Link: http://lkml.kernel.org/r/20130726151256.GC19472@redhat.com

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-31 12:16:31 -04:00
Zheng Liu 44fb851dfb ext4: add WARN_ON to check the length of allocated blocks
In commit 921f266b: ext4: add self-testing infrastructure to do a
sanity check, some sanity checks were added in map_blocks to make sure
'retval == map->m_len'.

Enable these checks by default and report any assertion failures using
ext4_warning() and WARN_ON() since they can help us to figure out some
bugs that are otherwise hard to hit.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-29 12:51:42 -04:00
Theodore Ts'o 94eec0fc35 ext4: fix retry handling in ext4_ext_truncate()
We tested for ENOMEM instead of -ENOMEM.   Oops.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-07-29 12:12:56 -04:00
Eric Sandeen dd12ed144e ext4: destroy ext4_es_cachep on module unload
Without this, module can't be reloaded.

[  500.521980] kmem_cache_sanity_check (ext4_extent_status): Cache name already exists.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org  # v3.8+
2013-07-26 15:21:11 -04:00
Theodore Ts'o a34eb50374 ext4: make sure group number is bumped after a inode allocation race
When we try to allocate an inode, and there is a race between two
CPU's trying to grab the same inode, _and_ this inode is the last free
inode in the block group, make sure the group number is bumped before
we continue searching the rest of the block groups.  Otherwise, we end
up searching the current block group twice, and we end up skipping
searching the last block group.  So in the unlikely situation where
almost all of the inodes are allocated, it's possible that we will
return ENOSPC even though there might be free inodes in that last
block group.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-07-26 15:15:46 -04:00
Linus Torvalds 6c4155a9cd xfs: fix for 3.11-rc3
- fix for regression in commit cca9f93a52, recovery causing filesystem
   corruption after a crash
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJR8ZWnAAoJENaLyazVq6ZOb7cP/iLwa59qX5sHAoYRGterE4Li
 34xPkihJEjcbvNCtK+rznXT9ohSvwTahnrdlLy/bQ6d0K1gBX3j1DD4cpTvGRWJR
 hEBbQU0PXhXjRL6ixgxfeNPfbEfNMYhiTFfjPhBjKVgzYN3NnJZ1lv8zTHaeQ8JP
 m7dEKrrg/J8LsW18fq2E0p/SjKi7cT1mEf8jkcYu0UGYd7yDtQSukMbEjfsIJq9L
 DpB3QXHQkUf1UlVdUvLncGmcDUPAEt+8/ae9uUpY2nxHv+7jmzAoCyRUCTDYsIh2
 gznQjsns56B2FWfnkyzXC3nMaoyIZpT8Fy3FRBsQRKGboOeOPS+/Yyzf/FcLQ8Jl
 yMXA0oR+3Ft7wJ62+aSuP3/dug8TbBk09bI+RqV4D+GwM7n7kLE/Fo3kQLva5Aqf
 rZIhwzfBDl51vxRzm4I29wOkfvQRXndy4c0hYtfeVy0lBA2yCFLSlzGha5EX+CxM
 s1kbpOkuOOE5k5Mgjve/iIKbwG3OKEuPCrESJPG+sTREAkkXkycnVQft2ihJYgg8
 yIgPG4fxpIIpwTdC016YAa/raOm/unIG6ko+ec3m2rB2lmo8j3vQOjoIFuV0KYV1
 enzhK5F+sQJl9evQOgfJc+uOMgjs1DrE38hnlQ8rc3LXa5Dtb7ReMRAT7z2FxicF
 keAPwJNrMlwgIyYi+3B+
 =hrYY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.11-rc3' of git://oss.sgi.com/xfs/xfs

Pull xfs fix from Ben Myers:
 "Fix for regression in commit cca9f93a52 ("xfs: don't do IO when
  creating an new inode"), recovery causing filesystem corruption after
  a crash"

* tag 'for-linus-v3.11-rc3' of git://oss.sgi.com/xfs/xfs:
  xfs: di_flushiter considered harmful
2013-07-26 11:22:54 -07:00
Linus Torvalds f315cf5e02 Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd fix from Bruce Fields:
 "One more nfsd bugfix for 3.11"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
  nfsd: nfsd_open: when dentry_open returns an error do not propagate as struct file
2013-07-26 11:21:43 -07:00
Dave Chinner e1b4271ac2 xfs: di_flushiter considered harmful
When we made all inode updates transactional, we no longer needed
the log recovery detection for inodes being newer on disk than the
transaction being replayed - it was redundant as replay of the log
would always result in the latest version of the inode would be on
disk. It was redundant, but left in place because it wasn't
considered to be a problem.

However, with the new "don't read inodes on create" optimisation,
flushiter has come back to bite us. Essentially, the optimisation
made always initialises flushiter to zero in the create transaction,
and so if we then crash and run recovery and the inode already on
disk has a non-zero flushiter it will skip recovery of that inode.
As a result, log recovery does the wrong thing and we end up with a
corrupt filesystem.

Because we have to support old kernel to new kernel upgrades, we
can't just get rid of the flushiter support in log recovery as we
might be upgrading from a kernel that doesn't have fully transactional
inode updates.  Unfortunately, for v4 superblocks there is no way to
guarantee that log recovery knows about this fact.

We cannot add a new inode format flag to say it's a "special inode
create" because it won't be understood by older kernels and so
recovery could do the wrong thing on downgrade. We cannot specially
detect the combination of zero mode/non-zero flushiter on disk to
non-zero mode, zero flushiter in the log item during recovery
because wrapping of the flushiter can result in false detection.

Hence that makes this "don't use flushiter" optimisation limited to
a disk format that guarantees that we don't need it. And that means
the only fix here is to limit the "no read IO on create"
optimisation to version 5 superblocks....

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit e60896d8f2)
2013-07-25 10:41:42 -05:00
Linus Torvalds 55c62960b0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse bugfixes from Miklos Szeredi:
 "These are bugfixes and a cleanup to the "readdirplus" feature"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: readdirplus: cleanup
  fuse: readdirplus: change attributes once
  fuse: readdirplus: fix instantiate
  fuse: readdirplus: sanity checks
  fuse: readdirplus: fix dentry leak
2013-07-23 14:37:04 -07:00
Trond Myklebust 4f3cc4809a NFSv4: Fix brainfart in attribute length calculation
The calculation of the attribute length was 4 bytes off.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reported-and-tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-23 14:24:59 -07:00
Harshula Jayasuriya e4daf1ffbe nfsd: nfsd_open: when dentry_open returns an error do not propagate as struct file
The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called it finds
fi_access[O_WRONLY] being non-zero, it decrements it and calls
nfs4_file_put_fd() which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------

Cc: stable@vger.kernel.org
Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-23 12:15:32 -04:00
Zheng Liu dda5690def ext3: fix a BUG when opening a file with O_TMPFILE flag
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always().  We can use the following program to trigger it:

int main(int argc, char *argv[])
{
	int fd;

	fd = open(argv[1], O_TMPFILE, 0666);
	if (fd < 0) {
		perror("open ");
		return -1;
	}
	close(fd);
	return 0;
}

The oops message looks like this:

kernel: kernel BUG at fs/ext3/namei.c:1992!
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd
kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4
kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010
kernel: task: ffff880112d30050 ti: ffff8801124d4000 task.ti: ffff8801124d4000
kernel: RIP: 0010:[<ffffffffa00db5ae>] [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
kernel: RSP: 0018:ffff8801124d5cc8  EFLAGS: 00010202
kernel: RAX: 0000000000000000 RBX: ffff880111510128 RCX: ffff8801114683a0
kernel: RDX: 0000000000000000 RSI: ffff880111510128 RDI: ffff88010fcf65a8
kernel: RBP: ffff8801124d5d18 R08: 0080000000000000 R09: ffffffffa00d3b7f
kernel: R10: ffff8801114683a0 R11: ffff8801032a2558 R12: 0000000000000000
kernel: R13: ffff88010fcf6800 R14: ffff8801032a2558 R15: ffff8801115100d8
kernel: FS:  00007f5d172b5700(0000) GS:ffff880117c00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
kernel: CR2: 00007f5d16df15d0 CR3: 0000000110b1d000 CR4: 00000000000407f0
kernel: Stack:
kernel: 000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7
kernel: ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000
kernel: ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1
kernel: Call Trace:
kernel: [<ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd]
kernel: [<ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3]
kernel: [<ffffffff821076f8>] path_openat+0x11f/0x5e7
kernel: [<ffffffff821c86b4>] ? list_del+0x11/0x30
kernel: [<ffffffff82065fa2>] ?  __dequeue_entity+0x33/0x38
kernel: [<ffffffff82107cd5>] do_filp_open+0x3f/0x8d
kernel: [<ffffffff82112532>] ? __alloc_fd+0x50/0x102
kernel: [<ffffffff820f9296>] do_sys_open+0x13b/0x1cd
kernel: [<ffffffff820f935c>] SyS_open+0x1e/0x20
kernel: [<ffffffff82398c02>] system_call_fastpath+0x16/0x1b
kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0
kernel: RIP  [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
kernel: RSP <ffff8801124d5cc8>

Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink.  So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2013-07-20 22:03:20 -04:00
Zheng Liu e94bd3490f ext4: fix a BUG when opening a file with O_TMPFILE flag
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always().  We can use the following program to trigger it:

int main(int argc, char *argv[])
{
	int fd;

	fd = open(argv[1], O_TMPFILE, 0666);
	if (fd < 0) {
		perror("open ");
		return -1;
	}
	close(fd);
	return 0;
}

The oops message looks like this:

kernel BUG at fs/ext4/namei.c:2572!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dlci bridge stp hidp cmtp kernelcapi l2tp_ppp l2tp_netlink l2tp_core sctp libcrc32c rfcomm tun fuse nfnetli
nk can_raw ipt_ULOG can_bcm x25 scsi_transport_iscsi ipx p8023 p8022 appletalk phonet psnap vmw_vsock_vmci_transport af_key vmw_vmci rose vsock atm can netrom ax25 af_rxrpc ir
da pppoe pppox ppp_generic slhc bluetooth nfc rfkill rds caif_socket caif crc_ccitt af_802154 llc2 llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec serio_raw snd_pcm pcsp
kr edac_core snd_page_alloc snd_timer snd soundcore r8169 mii sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm
CPU: 1 PID: 1812571 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #12
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
task: ffff88007dfe69a0 ti: ffff88010f7b6000 task.ti: ffff88010f7b6000
RIP: 0010:[<ffffffff8125ce69>]  [<ffffffff8125ce69>] ext4_orphan_add+0x299/0x2b0
RSP: 0018:ffff88010f7b7cf8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800966d3020 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88007dfe70b8 RDI: 0000000000000001
RBP: ffff88010f7b7d40 R08: ffff880126a3c4e0 R09: ffff88010f7b7ca0
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801271fd668
R13: ffff8800966d2f78 R14: ffff88011d7089f0 R15: ffff88007dfe69a0
FS:  00007f70441a3740(0000) GS:ffff88012a800000(0000) knlGS:00000000f77c96c0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002834000 CR3: 0000000107964000 CR4: 00000000000007e0
DR0: 0000000000780000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
 0000000000002000 00000020810b6dde 0000000000000000 ffff88011d46db00
 ffff8800966d3020 ffff88011d7089f0 ffff88009c7f4c10 ffff88010f7b7f2c
 ffff88007dfe69a0 ffff88010f7b7da8 ffffffff8125cfac ffff880100000004
Call Trace:
 [<ffffffff8125cfac>] ext4_tmpfile+0x12c/0x180
 [<ffffffff811cba78>] path_openat+0x238/0x700
 [<ffffffff8100afc4>] ? native_sched_clock+0x24/0x80
 [<ffffffff811cc647>] do_filp_open+0x47/0xa0
 [<ffffffff811db73f>] ? __alloc_fd+0xaf/0x200
 [<ffffffff811ba2e4>] do_sys_open+0x124/0x210
 [<ffffffff81010725>] ? syscall_trace_enter+0x25/0x290
 [<ffffffff811ba3ee>] SyS_open+0x1e/0x20
 [<ffffffff816ca8d4>] tracesys+0xdd/0xe2
 [<ffffffff81001001>] ? start_thread_common.constprop.6+0x1/0xa0
Code: 04 00 00 00 89 04 24 31 c0 e8 c4 77 04 00 e9 43 fe ff ff 66 25 00 d0 66 3d 00 80 0f 84 0e fe ff ff 83 7b 48 00 0f 84 04 fe ff ff <0f> 0b 49 8b 8c 24 50 07 00 00 e9 88 fe ff ff 0f 1f 84 00 00 00

Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink.  So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-20 21:58:38 -04:00
Linus Torvalds 36231d255b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "The sget() one is a long-standing bug and will need to go into -stable
  (in fact, it had been originally caught in RHEL6), the other two are
  3.11-only"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: constify dentry parameter in d_count()
  livelock avoidance in sget()
  allow O_TMPFILE to work with O_WRONLY
2013-07-20 10:50:01 -07:00
Linus Torvalds 19bf1c2c7b Fixes for 3.11-rc2, sent at 5pm, in the professoinal style. :-)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJR6cmlAAoJENNvdpvBGATwZF0P/0a7ET511UJwQbgAIq5ftFlj
 86Bzvy28xo2T85t64L+Ib2XDehWHk0sZlQpB/gK8MLYn4rCRWCxkQAshKwoequsC
 AhuvQ7NtX9vJNCSR30+RrLhkvj6UKsMuM724adARLBUgMBoScABzZImR1e14ELah
 bN27a4Bk2aNUpNX68QYdQX3TGiHGZy//lNmh81JTxFS3Moqm6bIZAJbYpOslATsI
 Q5nti/TjQJKso2gF7Jx7NffXv0g5rGxaVQEZJPpfIv1Vs0b6vabK/sYp608ayM0K
 qKyjJABaHR1Pzb16V82ZqvSlsHm/ARhCF1nMM6gQ8nwl/plxcQ6Jvd/qJsNej3b/
 7Jfm86xLe+G0G5oeNEJXsoEFAsvxug6ZRMfyoRHaPlGIksmz+Jc9kzTtM3qzdzOB
 5OPJwlONlM4dRVA6rgb7KiuE3h/sRt4CctFejD0f6mUqKa+B+zyHq/a/8a+60IqQ
 /sDiTQrqrI6LWxECFasDNoGxtnvVtKC21jbg+MTzumZDvjgnJIFFe5NrinI6SB9x
 VQYVq/vVkE576VTwGAttTg3s4sRwQKd/iuQjuoP76iFFHvq/sNX6fBq0NW5gpsj2
 WAfH+fLQsMcVJ2MAcc3DwdBT1wQbLu+Y19hv4TDOZRmnKGhq9K08hzWR4tIUKdFJ
 UcjWk35Wuoz1IGpVlHJ5
 =ngfz
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "Fixes for 3.11-rc2, sent at 5pm, in the professoinal style.  :-)"

I'm not sure I like this new level of "professionalism".
9-5, people, 9-5.

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: call ext4_es_lru_add() after handling cache miss
  ext4: yield during large unlinks
  ext4: make the extent_status code more robust against ENOMEM failures
  ext4: simplify calculation of blocks to free on error
  ext4: fix error handling in ext4_ext_truncate()
2013-07-20 10:48:59 -07:00
Linus Torvalds 3be542d464 NFS client bugfixes for 3.11
- Fix a regression against NFSv4 FreeBSD servers when creating a new file
 - Fix another regression in rpc_client_register()
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJR6ZKMAAoJEGcL54qWCgDyQX8P/19LKLNKcL+y2zVGjLbXMTq0
 TpyWdBO0ux7QcqnPEDg+Jpvu62IowYiKTtaSOXtHb5BNjQMBo2RKw3B0eMBoCp/z
 6gHmQRD2hMgqwBxBwHceV+dNwueCUiZW7GqaaNh6/3bpGQefegdONnLEifuPogEu
 oZmEuiVrGDfITEF7D4k5+shXCQN4eNH0LFuIQo4XXdCqmK6PwvOsidZ7YwHVC3Mg
 /Jzda2YsCxHj8kPi1xb9skPPAn6g4kdfYfyr/xSY7IviPixrkg/nEEK1b8xHU81e
 a0dd0Yx5kq6fR8LsBvQCHdj2m7doHM15jf5Np5G7VnnaWEjB2y+QftkxWc9lCNU3
 t2fr9YVD7ZG/GGNSFePHAHmBY0OqDB1Htp4vcwEQfzX6CAR3Hel82WVvut62Z6m4
 G5qHjwdqUFhmRN//SWlDpEqSn+pbeCvPhQS60ayN0TLivRsscm/I4yA75odAnn9b
 4su1IcUpqeJGeV6yDyMUqbx4kYZFyCZg/DNkThXiTKOs47A7ogSS9ev2fTB/V+jd
 rroNHNd/U508ze9D6D4ai9vR78uUp4wKNSSBZMCkBtNh0uSApOTgyGVhertB1EKS
 vgAr4T1tc+9t+0qg1Sb+hbKyBM/KaS5zUrPn+APHPoBXPh5PSVBzeNJkpxHRw/V0
 ZxkEgSQKLZSXYb5ab770
 =XE+7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix a regression against NFSv4 FreeBSD servers when creating a new
   file
 - Fix another regression in rpc_client_register()

* tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix a regression against the FreeBSD server
  SUNRPC: Fix another issue with rpc_client_register()
2013-07-20 10:48:24 -07:00
Linus Torvalds 90290c4ebe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next
Pull btrfs fixes from Josef Bacik:
 "I'm playing the role of Chris Mason this week while he's on vacation.
  There are a few critical fixes for btrfs here, all regressions and
  have been tested well"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next:
  Btrfs: fix wrong write offset when replacing a device
  Btrfs: re-add root to dead root list if we stop dropping it
  Btrfs: fix lock leak when resuming snapshot deletion
  Btrfs: update drop progress before stopping snapshot dropping
2013-07-20 10:47:38 -07:00
Al Viro acfec9a5a8 livelock avoidance in sget()
Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
to fail.  The superblock is on ->fs_supers, ->s_umount is held exclusive,
->s_active is 1.  Along comes two more processes, trying to mount the same
thing; sget() in each is picking that superblock, bumping ->s_count and
trying to grab ->s_umount.  ->s_active is 3 now.  Original mount(2)
finally gets to deactivate_locked_super() on failure; ->s_active is 2,
superblock is still ->fs_supers because shutdown will *not* happen until
->s_active hits 0.  ->s_umount is dropped and now we have two processes
chasing each other:
s_active = 2, A acquired ->s_umount, B blocked
A sees that the damn thing is stillborn, does deactivate_locked_super()
s_active = 1, A drops ->s_umount, B gets it
A restarts the search and finds the same superblock.  And bumps it ->s_active.
s_active = 2, B holds ->s_umount, A blocked on trying to get it
... and we are in the earlier situation with A and B switched places.

The root cause, of course, is that ->s_active should not grow until we'd
got MS_BORN.  Then failing ->mount() will have deactivate_locked_super()
shut the damn thing down.  Fortunately, it's easy to do - the key point
is that grab_super() is called only for superblocks currently on ->fs_supers,
so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
bump ->s_active; we must never increment ->s_count for superblocks past
->kill_sb(), but grab_super() is never called for those.

The bug is pretty old; we would've caught it by now, if not for accidental
exclusion between sget() for block filesystems; the things like cgroup or
e.g. mtd-based filesystems don't have anything of that sort, so they get
bitten.  The right way to deal with that is obviously to fix sget()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-20 04:58:58 +04:00
Al Viro ba57ea64cb allow O_TMPFILE to work with O_WRONLY
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-20 03:11:32 +04:00
Linus Torvalds 89a8c5940d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "An update for the BFP jit to the latest and greatest, two patches to
  get kdump working again, the random-abort ptrace extention for
  transactional execution, the z90crypt module alias for ap and a tiny
  cleanup"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: Alias for new zcrypt device driver base module
  s390/kdump: Allow copy_oldmem_page() copy to virtual memory
  s390/kdump: Disable mmap for s390
  s390/bpf,jit: add pkt_type support
  s390/bpf,jit: address randomize and write protect jit code
  s390/bpf,jit: use generic jit dumper
  s390/bpf,jit: call module_free() from any context
  s390/qdio: remove unused variable
  s390/ptrace: PTRACE_TE_ABORT_RAND
2013-07-19 15:08:12 -07:00
Stefan Behrens 115930cb2d Btrfs: fix wrong write offset when replacing a device
Miao Xie reported the following issue:

The filesystem was corrupted after we did a device replace.

Steps to reproduce:
 # mkfs.btrfs -f -m single -d raid10 <device0>..<device3>
 # mount <device0> <mnt>
 # btrfs replace start -rfB 1 <device4> <mnt>
 # umount <mnt>
 # btrfsck <device4>

The reason for the issue is that we changed the write offset by mistake,
introduced by commit 625f1c8dc.

We read the data from the source device at first, and then write the
data into the corresponding place of the new device. In order to
implement the "-r" option, the source location is remapped using
btrfs_map_block(). The read takes place on the mapped location, and
the write needs to take place on the unmapped location. Currently
the write is using the mapped location, and this commit changes it
back by undoing the change to the write address that the aforementioned
commit added by mistake.

Reported-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-07-19 15:07:26 -04:00
Josef Bacik d29a9f629e Btrfs: re-add root to dead root list if we stop dropping it
If we stop dropping a root for whatever reason we need to add it back to the
dead root list so that we will re-start the dropping next transaction commit.
The other case this happens is if we recover a drop because we will add a root
without adding it to the fs radix tree, so we can leak it's root and commit root
extent buffer, adding this to the dead root list makes this cleanup happen.
Thanks,

Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-07-19 15:07:19 -04:00