1
0
Fork 0
Commit Graph

33569 Commits (f10d9ba405ed250d647d6d810f8fc5ff2f62f370)

Author SHA1 Message Date
Steve French f10d9ba405 Fix unused variable warning when CIFS POSIX disabled
Fix unused variable warning when CONFIG_CIFS_POSIX disabled.

   fs/cifs/ioctl.c: In function 'cifs_ioctl':
>> fs/cifs/ioctl.c:40:8: warning: unused variable 'ExtAttrMask' [-Wunused-variable]
     __u64 ExtAttrMask = 0;
           ^
Pointed out by 0-DAY kernel build testing backend

Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:48 -05:00
Steve French c7f508a99b Allow setting per-file compression via CIFS protocol
An earlier patch allowed setting the per-file compression flag

"chattr +c filename"

on an smb2 or smb3 mount, and also allowed lsattr to return
whether a file on a cifs, or smb2/smb3 mount was compressed.

This patch extends the ability to set the per-file
compression flag to the cifs protocol, which uses a somewhat
different IOCTL mechanism than SMB2, although the payload
(the flags stored in the compression_state) are the same.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:44 -05:00
Steven French af6a12ea8d Query File System Alignment
In SMB3 it is now possible to query the file system
alignment info, and the preferred (for performance)
sector size and whether the underlying disk
has no seek penalty (like SSD).

Query this information at mount time for SMB3,
and make it visible in /proc/fs/cifs/DebugData
for debugging purposes.

This alignment information and preferred sector
size info will be helpful for the copy offload
patches to setup the right chunks in the CopyChunk
requests.   Presumably the knowledge that the
underlying disk is SSD could also help us
make better readahead and writebehind
decisions (something to look at in the future).

Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:41 -05:00
Steven French 2167114c6e Query device characteristics at mount time from server on SMB2/3 not just on cifs mounts
Currently SMB2 and SMB3 mounts do not query the device information at mount time
from the server as is done for cifs.  These can be useful for debugging.
This is a minor patch, that extends the previous one (which added ability to
query file system attributes at mount time - this returns the device
characteristics - also via in /proc/fs/cifs/DebugData)

Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:38 -05:00
Shirish Pargaonkar 7f48558e64 cifs: Send a logoff request before removing a smb session
Send a smb session logoff request before removing smb session off of the list.
On a signed smb session, remvoing a session off of the list before sending
a logoff request results in server returning an error for lack of
smb signature.

Never seen an error during smb logoff, so as per MS-SMB2 3.2.5.1,
not sure how an error during logoff should be retried. So for now,
if a server returns an error to a logoff request, log the error and
remove the session off of the list.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:35 -05:00
Tim Gardner 3d378d3fd8 cifs: Make big endian multiplex ID sequences monotonic on the wire
The multiplex identifier (MID) in the SMB header is only
ever used by the client, in conjunction with PID, to match responses
from the server. As such, the endianess of the MID is not important.
However, When tracing packet sequences on the wire, protocol analyzers
such as wireshark display MID as little endian. It is much more informative
for the on-the-wire MID sequences to match debug information emitted by the
CIFS driver.  Therefore, one should write and read MID in the SMB header
assuming it is always little endian.

Observed from wireshark during the protocol negotiation
and session setup:

        Multiplex ID: 256
        Multiplex ID: 256
        Multiplex ID: 512
        Multiplex ID: 512
        Multiplex ID: 768
        Multiplex ID: 768

After this patch on-the-wire MID values begin at 1 and increase monotonically.

Introduce get_next_mid64() for the internal consumers that use the full 64 bit
multiplex identifier.

Introduce the helpers get_mid() and compare_mid() to make the endian
translation clear.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Tim Gardner <timg@tpi.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:51:53 -05:00
Tim Gardner 944d6f1a5b cifs: Remove redundant multiplex identifier check from check_smb_hdr()
The only call site for check_smb_header() assigns 'mid' from the SMB
packet, which is then checked again in check_smb_header(). This seems
like redundant redundancy.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Tim Gardner <timg@tpi.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-28 09:31:36 -05:00
Steve French 34f626406c Query file system attributes from server on SMB2, not just cifs, mounts
Currently SMB2 and SMB3 mounts do not query the file system attributes
from the server at mount time as is done for cifs.  These can be useful for debugging.

Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-28 09:22:55 -05:00
Steve French 64a5cfa6db Allow setting per-file compression via SMB2/3
Allow cifs/smb2/smb3 to return whether or not a file is compressed
via lsattr, and allow SMB2/SMB3 to set the per-file compression
flag ("chattr +c filename" on an smb3 mount).

Windows users often set the compressed flag (it can be
done from the desktop and file manager).  David Disseldorp
has patches to Samba server to support this (at least on btrfs)
which are complementary to this

Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-28 09:22:31 -05:00
Steve French 7ff8d45c9d Fix corrupt SMB2 ioctl requests
We were off by one calculating the length of ioctls in some cases
because the protocol specification for SMB2 ioctl includes a mininum
one byte payload but not all SMB2 ioctl requests actually have
a data buffer to send. We were also not zeroing out the
return buffer (in case of error this is helpful).

Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-28 09:21:36 -05:00
Linus Torvalds f55ac56d5e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes (try two) from Al Viro:
 "nfsd performance regression fix + seq_file lseek(2) fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  seq_file: always update file->f_pos in seq_lseek()
  nfsd regression since delayed fput()
2013-10-25 18:16:47 +01:00
Gu Zheng 05e16745c0 seq_file: always update file->f_pos in seq_lseek()
This issue was first pointed out by Jiaxing Wang several months ago, but no
further comments:
https://lkml.org/lkml/2013/6/29/41

As we know pread() does not change f_pos, so after pread(), file->f_pos
and m->read_pos become different. And seq_lseek() does not update file->f_pos
if offset equals to m->read_pos, so after pread() and seq_lseek()(lseek to
m->read_pos), then a subsequent read may read from a wrong position, the
following program produces the problem:

    char str1[32] = { 0 };
    char str2[32] = { 0 };
    int poffset = 10;
    int count = 20;

    /*open any seq file*/
    int fd = open("/proc/modules", O_RDONLY);

    pread(fd, str1, count, poffset);
    printf("pread:%s\n", str1);

    /*seek to where m->read_pos is*/
    lseek(fd, poffset+count, SEEK_SET);

    /*supposed to read from poffset+count, but this read from position 0*/
    read(fd, str2, count);
    printf("read:%s\n", str2);

out put:
pread:
 ck_netbios_ns 12665
read:
 nf_conntrack_netbios

/proc/modules:
nf_conntrack_netbios_ns 12665 0 - Live 0xffffffffa038b000
nf_conntrack_broadcast 12589 1 nf_conntrack_netbios_ns, Live 0xffffffffa0386000

So we always update file->f_pos to offset in seq_lseek() to fix this issue.

Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-25 10:46:40 -04:00
Linus Torvalds 88829dfe4b Two important fixes
- Fix long standing memory leak in the (rarely used) public key support
 - Fix large file corruption on 32 bit architectures
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCgAGBQJSaX/HAAoJENaSAD2qAscKdpQQAI6Rvsv5y/Gj+8/9rCUnNYhw
 8YWYkOko2+cyGl6ro+nIm2nmKOuaGrjijvubOjOAe4WkMzS0EyJjku/9NT3S6KzC
 SqHC0ZeZf0jaFC9zUkUN69RY9m96Ak94HAagXO3Qm39DCSj8xijxODOVnVzkEs2x
 ylOU8OgRbD/AIDzmLxgHaOtuAmQ0GNvbVoYK6ZErVmOMENU2/67iH3OsyGD4OFpr
 Oaq1i8m7rxPmwv3QNSGhXSK6EScqs2jgM4aPWx3aG+OhYv6sGWkL8jJgPS/uSUBc
 ttD1Ou/d9yyvZPDFd9wmiHhenbCVbEdl6JAIS8zKv4NkSQ3V7AVWwAoe6JMfbREo
 U+Om7FwGLgKlZ/19+IxBMGTITuOjUkKq97vJMiYbXuWzdrZSflv5GiGGKbxchmnA
 CnfYaN1HYVcpLsbXoDTBomML7VTtbifgmY0diUJ2aJ1eTg86Gs1DXjhnuLF70Jjd
 dfuYfOKkJguuRfZ50yrpWfEQ0iOudXI1v+PrramLof33lNKWI8XeKjgDxyUrAjOZ
 UjFT639EXIRzYDIOCPZicQKdNO3BRziKi1cSnXQQp9cNTMs6/FIxK2zrQmjgqvww
 Hwj+M6czLs45lbfjQIxi3FlEAYYdXBQwrEiAu4cmt9j1bxIZnwIa7Fu0bXSxphfD
 dUo0GN7CkF45BkNvotFX
 =74EV
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:
 "Two important fixes
   - Fix long standing memory leak in the (rarely used) public key
     support
   - Fix large file corruption on 32 bit architectures"

* tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: fix 32 bit corruption issue
  ecryptfs: Fix memory leakage in keystore.c
2013-10-25 07:32:01 +01:00
Colin Ian King 43b7c6c6a4 eCryptfs: fix 32 bit corruption issue
Shifting page->index on 32 bit systems was overflowing, causing
data corruption of > 4GB files. Fix this by casting it first.

https://launchpad.net/bugs/1243636

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reported-by: Lars Duesing <lars.duesing@camelotsweb.de>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2013-10-24 12:36:30 -07:00
Randy Dunlap 69c88dc7d9 vfs: fix new kernel-doc warnings
Move kernel-doc notation to immediately before its function to eliminate
kernel-doc warnings introduced by commit db14fc3abc ("vfs: add
d_walk()")

  Warning(fs/dcache.c:1343): No description found for parameter 'data'
  Warning(fs/dcache.c:1343): No description found for parameter 'dentry'
  Warning(fs/dcache.c:1343): Excess function parameter 'parent' description in 'check_mount'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-22 12:02:40 +01:00
Randy Dunlap 606d6fe3ff fs/namei.c: fix new kernel-doc warning
Add @path parameter to fix kernel-doc warning.
Also fix a spello/typo.

  Warning(fs/namei.c:2304): No description found for parameter 'path'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-22 12:02:40 +01:00
Linus Torvalds d24fec3991 Just a patch to fix an oops in an error path.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJSZVtaAAoJEDaohF61QIxkQwoP/2uqO2kg0b0ndR2pyCeUIu6a
 uMZ5/dC1DZ8CEVPLudu5Cb6mdS646rUEv4MjfZx6z7tJBWv0QpesiSnZN0vDlP3i
 Mj8iA/JckzbZv734Y7RQzpVfN+k/BOG/8YMrEQY3c9loD9yOzqGazOF6OK38O1E8
 CLQ2HeX0sigCdlYQOe9Lx8D0QiRlx91Yx8GH41wzAy5HGIWlJ2TxFLPf0upS1OPl
 PzH0G5mnS6apUndIxobk/z8w5q40+x2MWXG8aXNZflro7h4gp9L5DyfzaO/1dZV0
 WgS9zbjAOJKx8N0eAA1Z0PyNJ2i2/BLlpsw/6asm5CwEqMp134TCvv53oaihaIK/
 0P9Z4auXXuqKAc3Ok31HhGnWUwEhcY9TYRNqnH6dYGcg0YfQAWRpGdHPK7yFf85g
 MoTcgCqrcI9V4bxdECCdGTA798FOocuo2ShMeABJ73Zl97W3c0e91cAA2dPJ0N8+
 LaqmdP0cb0T5pJjbdQ2uDgQOK2JkoKQgkeilHHndRYT6cM+R4BFKTlft3ga/0ZLn
 GVubFNrL/T6rHVmK7014GvvX5NgsRzWd2yK01NYZGQFe/aOs0Eb86ed2R08X/+lh
 q9lmrvHZ6ATU9XvQsFMynnOLBWEMcPCC5rBEilUS70GIz8GENoG58XcBf4d2adiB
 5cDZlF5/v2BBDUt8vjK5
 =bh3d
 -----END PGP SIGNATURE-----

Merge tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggy

Pull jfs bugfix from David Kleikamp:
 "Just a patch to fix an oops in an error path"

* tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggy:
  jfs: fix error path in ialloc
2013-10-22 09:01:11 +01:00
Al Viro c7314d74fc nfsd regression since delayed fput()
Background: nfsd v[23] had throughput regression since delayed fput
went in; every read or write ends up doing fput() and we get a pair
of extra context switches out of that (plus quite a bit of work
in queue_work itselfi, apparently).  Use of schedule_delayed_work()
gives it a chance to accumulate a bit before we do __fput() on all
of them.  I'm not too happy about that solution, but... on at least
one real-world setup it reverts about 10% throughput loss we got from
switch to delayed fput.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-20 08:44:39 -04:00
Linus Torvalds bdeeab62a6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
 "Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a
  regression in our initial rc1 pull.  When doing nocow writes we were
  sometimes starting a transaction with locks held"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: release path before starting transaction in can_nocow_extent
2013-10-18 16:46:21 -07:00
Josef Bacik 1bda19eb73 Btrfs: release path before starting transaction in can_nocow_extent
We can't be holding tree locks while we try to start a transaction, we will
deadlock.  Thanks,

Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-18 12:43:40 -04:00
Linus Torvalds 04919afb85 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Five small cifs fixes (includes fixes for: unmount hang, 2 security
  related, symlink, large file writes)"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: ntstatus_to_dos_map[] is not terminated
  cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods
  cifs: Fix inability to write files >2GB to SMB2/3 shares
  cifs: Avoid umount hangs with smb2 when server is unresponsive
  do not treat non-symlink reparse points as valid symlinks
2013-10-17 18:49:21 -07:00
Linus Torvalds 056cdce0d3 Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
  mm: revert mremap pud_free anti-fix
  mm: fix BUG in __split_huge_page_pmd
  swap: fix set_blocksize race during swapon/swapoff
  procfs: call default get_unmapped_area on MMU-present architectures
  procfs: fix unintended truncation of returned mapped address
  writeback: fix negative bdi max pause
  percpu_refcount: export symbols
  fs: buffer: move allocation failure loop into the allocator
  mm: memcg: handle non-error OOM situations more gracefully
  tools/testing/selftests: fix uninitialized variable
  block/partitions/efi.c: treat size mismatch as a warning, not an error
  mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
  mm/zswap: bugfix: memory leak when re-swapon
  mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
  mm: migration: do not lose soft dirty bit if page is in migration state
  gcov: MAINTAINERS: Add an entry for gcov
  mm/hugetlb.c: correct missing private flag clearing
  mm/vmscan.c: don't forget to free shrinker->nr_deferred
  ipc/sem.c: synchronize semop and semctl with IPC_RMID
  ipc: update locking scheme comments
  ...
2013-10-16 21:36:03 -07:00
HATAYAMA Daisuke fad1a86e25 procfs: call default get_unmapped_area on MMU-present architectures
Commit c4fe244857 ("sparc: fix PCI device proc file mmap(2)") added
proc_reg_get_unmapped_area in proc_reg_file_ops and
proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
get_unmapped_area method is not defined for the target procfs file,
which causes regression of mmap on /proc/vmcore.

To address this issue, like get_unmapped_area(), call default
current->mm->get_unmapped_area on MMU-present architectures if
pde->proc_fops->get_unmapped_area, i.e.  the one in actual file
operation in the procfs file, is not defined.

Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:53 -07:00
HATAYAMA Daisuke 2cbe3b0af8 procfs: fix unintended truncation of returned mapped address
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
mapped virtual address returned from get_unmapped_area method in
pde->proc_fops due to the variable rv of signed integer on x86_64.  This
is too small to have vitual address of unsigned long on x86_64 since on
x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
To fix this issue, use unsigned long instead.

Fixes a regression added in commit c4fe244857 ("sparc: fix PCI device
proc file mmap(2)").

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:53 -07:00
Johannes Weiner 84235de394 fs: buffer: move allocation failure loop into the allocator
Buffer allocation has a very crude indefinite loop around waking the
flusher threads and performing global NOFS direct reclaim because it can
not handle allocation failures.

The most immediate problem with this is that the allocation may fail due
to a memory cgroup limit, where flushers + direct reclaim might not make
any progress towards resolving the situation at all.  Because unlike the
global case, a memory cgroup may not have any cache at all, only
anonymous pages but no swap.  This situation will lead to a reclaim
livelock with insane IO from waking the flushers and thrashing unrelated
filesystem cache in a tight loop.

Use __GFP_NOFAIL allocations for buffers for now.  This makes sure that
any looping happens in the page allocator, which knows how to
orchestrate kswapd, direct reclaim, and the flushers sensibly.  It also
allows memory cgroups to detect allocations that can't handle failure
and will allow them to ultimately bypass the limit if reclaim can not
make progress.

Reported-by: azurIt <azurit@pobox.sk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:53 -07:00
Cyrill Gorcunov e9cdd6e771 mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
If a page we are inspecting is in swap we may occasionally report it as
having soft dirty bit (even if it is clean).  The pte_soft_dirty helper
should be called on present pte only.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:52 -07:00
Linus Torvalds 0056019da4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull tmpfile fix from Al Viro:
 "A fix for double iput() in ->tmpfile() on ext3 and ext4; I'd fucked it
  up, Miklos has caught it"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ext[34]: fix double put in tmpfile
2013-10-16 17:18:18 -07:00
Geyslan G. Bem 3edc8376c0 ecryptfs: Fix memory leakage in keystore.c
In 'decrypt_pki_encrypted_session_key' function:

Initializes 'payload' pointer and releases it on exit.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: stable@vger.kernel.org # v2.6.28+
2013-10-16 15:18:01 -07:00
Miklos Szeredi 43ae9e3fc7 ext[34]: fix double put in tmpfile
d_tmpfile() already swallowed the inode ref.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-15 12:14:06 -04:00
Tim Gardner 0c26606cbe cifs: ntstatus_to_dos_map[] is not terminated
Functions that walk the ntstatus_to_dos_map[] array could
run off the end. For example, ntstatus_to_dos() loops
while ntstatus_to_dos_map[].ntstatus is not 0. Granted,
this is mostly theoretical, but could be used as a DOS attack
if the error code in the SMB header is bogus.

[Might consider adding to stable, as this patch is low risk - Steve]

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-14 12:14:01 -05:00
Linus Torvalds 9d05746e7b vfs: allow O_PATH file descriptors for fstatfs()
Olga reported that file descriptors opened with O_PATH do not work with
fstatfs(), found during further development of ksh93's thread support.

There is no reason to not allow O_PATH file descriptors here (fstatfs is
very much a path operation), so use "fdget_raw()".  See commit
55815f7014 ("vfs: make O_PATH file descriptors usable for 'fstat()'")
for a very similar issue reported for fstat() by the same team.

Reported-and-tested-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org	# O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-12 13:12:31 -07:00
Linus Torvalds be5090da4a A bug fix and performance regression fix for ext4.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSWZeNAAoJENNvdpvBGATwx6kP/2mVlKlNBXVfGUmLVP3Xb68v
 4JhBzlC3ra3TRqVkw6C6kx4fbdq0cW/mDkecmYg2s+aDnswG/94/+yRdU4kQkyne
 iqN22ZYA7CumZgJvR0Z2ptWksDRpv8H5twgdVbPtad6/2cKmjseUraPo7YZhjDCe
 O9eRCXyVII305soAddZzZUgWOWCSWpdTW5zBitKaGq5x/K//rY9UlPVSuAo+9KPZ
 vyBiKJ1R6fDbtyH7JhCdXydMPKzlAPmyqYBQGLyq2GsRsXDp/VljGci6QN0iuZ5k
 lZsxFg8q0P6/R4Pjr3DDtE0tUbPXEyMxuquh/m4b3pAXRoMMCynyLP2zy7Gc7ec0
 ek2ty+sVG06JjseqigHSmS/a+PdZgDY5xEMKhaK4X38lxRPb7apNktolXxxEt6eU
 OPZsuvma1g+lbkkCdRO5FVwMllb7cuPhuZPGyxZvmP+ON59oT5QOVsDC+55WnHNs
 Ib11PCTN93Mwhrm1YPNWVV+gWG50eLZQYJam6H4mE4knaXnba6htEhYrdNczoFH4
 lcHaJzCDJLnYVRRbKXKdLSSnyz1X9cYJBP9g5ks1iNy7/JreF7WoIAOWvZWCp432
 7NC0IOmV4Q4itiCTcSh85rGlsXU8ZA7wK5HILhp9qZmNkw30OMvihNoWoTFiWTJR
 mVCkm+isBbqMP0nhV5km
 =ZwlW
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "A bug fix and performance regression fix for ext4"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix memory leak in xattr
  ext4: fix performance regression in writeback of random writes
2013-10-12 12:55:15 -07:00
Linus Torvalds d64dab903f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We've got more bug fixes in my for-linus branch:

  One of these fixes another corner of the compression oops from last
  time.  Miao nailed down some problems with concurrent snapshot
  deletion and drive balancing.

  I kept out one of his patches for more testing, but these are all
  stable"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix oops caused by the space balance and dead roots
  Btrfs: insert orphan roots into fs radix tree
  Btrfs: limit delalloc pages outside of find_delalloc_range
  Btrfs: use right root when checking for hash collision
2013-10-12 12:54:24 -07:00
Dave Jones 6e4ea8e33b ext4: fix memory leak in xattr
If we take the 2nd retry path in ext4_expand_extra_isize_ea, we
potentionally return from the function without having freed these
allocations.  If we don't do the return, we over-write the previous
allocation pointers, so we leak either way.

Spotted with Coverity.

[ Fixed by tytso to set is and bs to NULL after freeing these
  pointers, in case in the retry loop we later end up triggering an
  error causing a jump to cleanup, at which point we could have a double
  free bug. -- Ted ]

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable@vger.kernel.org
2013-10-12 14:39:49 -04:00
Miao Xie c00869f1ae Btrfs: fix oops caused by the space balance and dead roots
When doing space balance and subvolume destroy at the same time, we met
the following oops:

kernel BUG at fs/btrfs/relocation.c:2247!
RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs]
Call Trace:
 [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs]
 [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs]
 [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs]
 [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs]
 [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs]
 [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs]
 [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs]
 [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs]
 [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs]
 [<ffffffff81122f53>] ? path_openat+0x234/0x4db
 [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391
 [<ffffffff810f8ab6>] ? vma_link+0x74/0x94
 [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39
 [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2
 [<ffffffff811259d4>] SyS_ioctl+0x57/0x83
 [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10
 [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b

It is because we returned the error number if the reference of the root was 0
when doing space relocation. It was not right here, because though the root
was dead(refs == 0), but the space it held still need be relocated, or we
could not remove the block group. So in this case, we should return the root
no matter it is dead or not.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10 21:31:02 -04:00
Miao Xie 14927d9546 Btrfs: insert orphan roots into fs radix tree
Now we don't drop all the deleted snapshots/subvolumes before the space
balance. It means we have to relocate the space which is held by the dead
snapshots/subvolumes. So we must into them into fs radix tree, or we would
forget to commit the change of them when doing transaction commit, and it
would corrupt the metadata.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10 21:30:53 -04:00
Josef Bacik 7bf811a595 Btrfs: limit delalloc pages outside of find_delalloc_range
Liu fixed part of this problem and unfortunately I steered him in slightly the
wrong direction and so didn't completely fix the problem.  The problem is we
limit the size of the delalloc range we are looking for to max bytes and then we
try to lock that range.  If we fail to lock the pages in that range we will
shrink the max bytes to a single page and re loop.  However if our first page is
inside of the delalloc range then we will end up limiting the end of the range
to a period before our first page.  This is illustrated below

[0 -------- delalloc range --------- 256mb]
                                  [page]

So find_delalloc_range will return with delalloc_start as 0 and end as 128mb,
and then we will notice that delalloc_start < *start and adjust it up, but not
adjust delalloc_end up, so things go sideways.  To fix this we need to not limit
the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that
way we don't end up with this confusion.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10 21:27:56 -04:00
Josef Bacik 4871c1588f Btrfs: use right root when checking for hash collision
btrfs_rename was using the root of the old dir instead of the root of the new
dir when checking for a hash collision, so if you tried to move a file into a
subvol it would freak out because it would see the file you are trying to move
in its current root.  This fixes the bug where this would fail

btrfs subvol create test1
btrfs subvol create test2
mv test1 test2.

Thanks to Chris Murphy for catching this,

Cc: stable@vger.kernel.org
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10 21:27:45 -04:00
Sachin Prabhu dde2356c84 cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods
This allows users to use LANMAN authentication on servers which support
unencapsulated authentication.

The patch fixes a regression where users using plaintext authentication
were no longer able to do so because of changed bought in by patch
3f618223dc

https://bugzilla.redhat.com/show_bug.cgi?id=1011621

Reported-by: Panos Kavalagios <Panagiotis.Kavalagios@eurodyn.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-07 09:57:11 -05:00
Jan Klos 2f6c947963 cifs: Fix inability to write files >2GB to SMB2/3 shares
When connecting to SMB2/3 shares, maximum file size is set to non-LFS maximum in superblock. This is due to cap_large_files bit being different for SMB1 and SMB2/3 (where it is just an internal flag that is not negotiated and the SMB1 one corresponds to multichannel capability, so maybe LFS works correctly if server sends 0x08 flag) while capabilities are checked always for the SMB1 bit in cifs_read_super().

The patch fixes this by checking for the correct bit according to the protocol version.

CC: Stable <stable@kernel.org>
Signed-off-by: Jan Klos <honza.klos@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-07 09:54:45 -05:00
Shirish Pargaonkar eb4c7df6c2 cifs: Avoid umount hangs with smb2 when server is unresponsive
Do not send SMB2 Logoff command when reconnecting, the way smb1
code base works.

Also, no need to wait for a credit for an echo command when one is already
in flight.

Without these changes, umount command hangs if the server is unresponsive
e.g. hibernating.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@us.ibm.com>
2013-10-06 20:18:42 -05:00
Steve French c31f330719 do not treat non-symlink reparse points as valid symlinks
Windows 8 and later can create NFS symlinks (within reparse points)
which we were assuming were normal NTFS symlinks and thus reporting
corrupt paths for.  Add check for reparse points to make sure that
they really are normal symlinks before we try to parse the pathname.

We also should not be parsing other types of reparse points (DFS
junctions etc) as if they were a  symlink so return EOPNOTSUPP
on those.  Also fix endian errors (we were not parsing symlink
lengths as little endian).

This fixes commit d244bf2dfb
which implemented follow link for non-Unix CIFS mounts

CC: Stable <stable@kernel.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-05 21:54:18 -05:00
Linus Torvalds e62063d699 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This is a small collection of fixes, including a regression fix from
  Liu Bo that solves rare crashes with compression on.

  I've merged my for-linus up to 3.12-rc3 because the top commit is only
  meant for 3.12.  The rest of the fixes are also available in my master
  branch on top of my last 3.11 based pull"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: Fix crash due to not allocating integrity data for a bioset
  Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
  Btrfs: eliminate races in worker stopping code
  Btrfs: fix crash of compressed writes
  Btrfs: fix transid verify errors when recovering log tree
2013-10-05 12:17:24 -07:00
Darrick J. Wong b208c2f7ce btrfs: Fix crash due to not allocating integrity data for a bioset
When btrfs creates a bioset, we must also allocate the integrity data pool.
Otherwise btrfs will crash when it tries to submit a bio to a checksumming
disk:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
 IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
 PGD 2305e4067 PUD 23063d067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP
 Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache
sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet
raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug]
 CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000
 RIP: 0010:[<ffffffff8111e28a>]  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
 RSP: 0018:ffff880230699688  EFLAGS: 00010286
 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445
 RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000
 RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008
 R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210
 R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720
 FS:  00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0
 Stack:
  ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250
  ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000
  0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900
 Call Trace:
  [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0
  [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360
  [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150
  [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs]
  [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460
  [<ffffffff8123e58a>] generic_make_request+0xca/0x100
  [<ffffffff8123e639>] submit_bio+0x79/0x160
  [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs]
  [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs]
  [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs]
  [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs]
  [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0
  [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260
  [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120
[btrfs]
  [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs]
  [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs]
  [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs]
  [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0
  [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0
  [<ffffffff81176ba0>] mount_fs+0x20/0xd0
  [<ffffffff81191096>] vfs_kern_mount+0x76/0x120
  [<ffffffff81193320>] do_mount+0x200/0xa40
  [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80
  [<ffffffff81193bf0>] SyS_mount+0x90/0xe0
  [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f
 Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48
89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7
44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74
 RIP  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
  RSP <ffff880230699688>
 CR2: 0000000000000018
 ---[ end trace 7a96042017ed21e2 ]---

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-05 10:52:10 -04:00
Chris Mason 1329dfc8bb Merge branch 'for-linus' into for-linus-3.12 2013-10-05 10:51:32 -04:00
Linus Torvalds a5c984cc29 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Small set of cifs fixes.  Most important is Jeff's fix that works
  around disconnection problems which can be caused by simultaneous use
  of user space tools (starting a long running smbclient backup then
  doing a cifs kernel mount) or multiple cifs mounts through a NAT, and
  Jim's fix to deal with reexport of cifs share.

  I expect to send two more cifs fixes next week (being tested now) -
  fixes to address an SMB2 unmount hang when server dies and a fix for
  cifs symlink handling of Windows "NFS" symlinks"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] update cifs.ko version
  [CIFS] Remove ext2 flags that have been moved to fs.h
  [CIFS] Provide sane values for nlink
  cifs: stop trying to use virtual circuits
  CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing them
2013-10-04 20:50:16 -07:00
Linus Torvalds 3dbecf0aa9 xfs: bugfixes for 3.12-rc4
- lockdep fix for project quotas
 - fix for dirent dtype support on v4 filesystems
 - fix for a memory leak in recovery
 - fix for build failure due to the recovery fix
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJSTw/vAAoJENaLyazVq6ZOo2EP/RjhwaDqDZHB5bm/axZrtxP6
 g31TGvJ+nCUT6JjYX2wnoFuJDT2SDcs5+2gtjk1DRLb3JRQI2uJ+MtHLjDIZJSvE
 sMAADOgWvTuzx3TsnR4U0MM1/XVnv99k1vinedD6mGq16QtT0OWYsA9AKkMKWd1o
 OiTGyX4AMCNtfAZkiH9+OR8+BqH1xEEzv28H/Bf7yLSsQHM+v9uKPC5+f7I8bWvB
 YK8fAxeGmiAfDGR4tQ+tQVoIj3qrJmPyj45ElwAvGCKbOh0LG4/N+dwaCQme0teW
 xFfXMF+C/94qDom3z0gYAWzSOixgTFmy6gxt+3Mqw7uZ/dNzO+KeKE5Fm8cG11yD
 y3vxqwav/fLHv1fRUvl5abrAzl5VU8nRAbeQqZBM0xjzgfilMp5Jk2Jvix8OHcO5
 edmb7+CkkGdiYD15cSUl2242qKaukB3K1vrHoOlFte42vxELmcHWBRBxuZe8rgV1
 czf2xCHkWWjdwUrFeZoxVSEFydfoGIW0clAz8tHPQpVyvnSjRTuugJ8wuN92NyNF
 xGS5er0lyCqlBCBVCOZX/xTcwSQZ4UNG8qgdzDT26VN1VpTFeaaJlMRwD2GhYMYk
 8eYX3Ie/XdECLn5ZaG4xWEJHLarXLcqUI6eMobjkVs+qt/FQl/PzH76qOcZWKKbf
 kEOhPA1Gh97SZ66+vqaw
 =eNZa
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
 "There are lockdep annotations for project quotas, a fix for dirent
  dtype support on v4 filesystems, a fix for a memory leak in recovery,
  and a fix for the build error that resulted from it.  D'oh"

* tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: Use kmem_free() instead of free()
  xfs: fix memory leak in xlog_recover_add_to_trans
  xfs: dirent dtype presence is dependent on directory magic numbers
  xfs: lockdep needs to know about 3 dquot-deep nesting
2013-10-04 14:47:22 -07:00
Ilya Dryomov 1357272fc7 Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
can be processed before btrfs_scratch_superblock is called, which would
result in a use-after-free on btrfs_device contents.  Fix this by
zeroing the superblock before the rcu callback is registered.

Cc: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04 16:02:14 -04:00
Ilya Dryomov 964fb15acf Btrfs: eliminate races in worker stopping code
The current implementation of worker threads in Btrfs has races in
worker stopping code, which cause all kinds of panics and lockups when
running btrfs/011 xfstest in a loop.  The problem is that
btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
check_busy_worker and __btrfs_start_workers.

E.g., check_idle_worker race flow:

       btrfs_stop_workers():            check_idle_worker(aworker):
- grabs the lock
- splices the idle list into the
  working list
- removes the first worker from the
  working list
- releases the lock to wait for
  its kthread's completion
                                  - grabs the lock
                                  - if aworker is on the working list,
                                    moves aworker from the working list
                                    to the idle list
                                  - releases the lock
- grabs the lock
- puts the worker
- removes the second worker from the
  working list
                              ......
        btrfs_stop_workers returns, aworker is on the idle list
                 FS is umounted, memory is freed
                              ......
              aworker is waken up, fireworks ensue

With this applied, I wasn't able to trigger the problem in 48 hours,
whereas previously I could reliably reproduce at least one of these
races within an hour.

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04 16:02:13 -04:00
Liu Bo 385fe0bede Btrfs: fix crash of compressed writes
The crash[1] is found by xfstests/generic/208 with "-o compress",
it's not reproduced everytime, but it does panic.

The bug is quite interesting, it's actually introduced by a recent commit
(573aecafca,
Btrfs: actually limit the size of delalloc range).

Btrfs implements delay allocation, so during writeback, we
(1) get a page A and lock it
(2) search the state tree for delalloc bytes and lock all pages within the range
(3) process the delalloc range, including find disk space and create
    ordered extent and so on.
(4) submit the page A.

It runs well in normal cases, but if we're in a racy case, eg.
buffered compressed writes and aio-dio writes,
sometimes we may fail to lock all pages in the 'delalloc' range,
in which case, we need to fall back to search the state tree again with
a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).

The mentioned commit has a side effect, that is, in the fallback case,
we can find delalloc bytes before the index of the page we already have locked,
so we're in the case of (delalloc_end <= *start) and return with (found > 0).

This ends with not locking delalloc pages but making ->writepage still
process them, and the crash happens.

This fixes it by just thinking that we find nothing and returning to caller
as the caller knows how to deal with it properly.

[1]:
------------[ cut here ]------------
kernel BUG at mm/page-writeback.c:2170!
[...]
CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
[...]
RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[...]
[ 4934.248731] Stack:
[ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
[ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
[ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
[ 4934.248731] Call Trace:
[ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
[ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
[ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
[ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
[ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
[ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
[ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
[ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[ 4934.248731]  RSP <ffff8801869f9c48>
[ 4934.280307] ---[ end trace 36f06d3f8750236a ]---

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04 16:02:11 -04:00