1
0
Fork 0
Commit Graph

33166 Commits (65aafb1e7484b7434a0c1d4c593191ebe5776a2f)

Author SHA1 Message Date
Al Viro dfc59e2c90 exportfs: don't assume that ->iterate() won't feed us too long entries
On some filesystems it's impossible even with fs corruption, but we'd
better not rely on that, what with memcpy() into on-stack array we
are doing there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-07 19:54:55 -04:00
Al Viro 5d8943b04b afs: get rid of redundant ->d_name.len checks
No dentry can get to directory modification methods without
having passed either ->lookup() or ->atomic_open(); if name is
rejected by those two (or by ->d_hash()) with an error, it won't
be seen by anything else.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-07 19:54:55 -04:00
Weston Andros Adamson b1b3e13694 NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
Commit 97431204ea introduced a regression
that causes SECINFO_NO_NAME to fail without sending an RPC if:

 1) the nfs_client's rpc_client is using krb5i/p (now tried by default)
 2) the current user doesn't have valid kerberos credentials

This situation is quite common - as of now a sec=sys mount would use
krb5i for the nfs_client's rpc_client and a user would hardly be faulted
for not having run kinit.

The solution is to use the machine cred when trying to use an integrity
protected auth flavor for SECINFO_NO_NAME.

Older servers may not support using the machine cred or an integrity
protected auth flavor for SECINFO_NO_NAME in every circumstance, so we fall
back to using the user's cred and the filesystem's auth flavor in this case.

We run into another problem when running against linux nfs servers -
they return NFS4ERR_WRONGSEC when using integrity auth flavor (unless the
mount is also that flavor) even though that is not a valid error for
SECINFO*.  Even though it's against spec, handle WRONGSEC errors on
SECINFO_NO_NAME by falling back to using the user cred and the
filesystem's auth flavor.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 18:39:25 -04:00
Trond Myklebust 0aea92bf67 NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
Also don't worry about obsolete mount flags...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 18:38:04 -04:00
Trond Myklebust 47040da3c7 NFSv4: Allow security autonegotiation for submounts
In cases where the parent super block was not mounted with a 'sec=' line,
allow autonegotiation of security for the submounts.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 17:52:42 -04:00
Trond Myklebust 41d058c3ba NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
Ensure that nfs4_proc_lookup_common respects the NFS_MOUNT_SECFLAVOUR
flag.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 17:52:42 -04:00
Linus Torvalds dc0755cdb1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 2 (of many) from Al Viro:
 "Mostly Miklos' series this time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  constify dcache.c inlined helpers where possible
  fuse: drop dentry on failed revalidate
  fuse: clean up return in fuse_dentry_revalidate()
  fuse: use d_materialise_unique()
  sysfs: use check_submounts_and_drop()
  nfs: use check_submounts_and_drop()
  gfs2: use check_submounts_and_drop()
  afs: use check_submounts_and_drop()
  vfs: check unlinked ancestors before mount
  vfs: check submounts and drop atomically
  vfs: add d_walk()
  vfs: restructure d_genocide()
2013-09-07 14:36:57 -07:00
Linus Torvalds c7c4591db6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace changes from Eric Biederman:
 "This is an assorted mishmash of small cleanups, enhancements and bug
  fixes.

  The major theme is user namespace mount restrictions.  nsown_capable
  is killed as it encourages not thinking about details that need to be
  considered.  A very hard to hit pid namespace exiting bug was finally
  tracked and fixed.  A couple of cleanups to the basic namespace
  infrastructure.

  Finally there is an enhancement that makes per user namespace
  capabilities usable as capabilities, and an enhancement that allows
  the per userns root to nice other processes in the user namespace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns:  Kill nsown_capable it makes the wrong thing easy
  capabilities: allow nice if we are privileged
  pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  userns: Allow PR_CAPBSET_DROP in a user namespace.
  namespaces: Simplify copy_namespaces so it is clear what is going on.
  pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
  sysfs: Restrict mounting sysfs
  userns: Better restrictions on when proc and sysfs can be mounted
  vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
  kernel/nsproxy.c: Improving a snippet of code.
  proc: Restrict mounting the proc filesystem
  vfs: Lock in place mounts from more privileged users
2013-09-07 14:35:32 -07:00
Linus Torvalds 11c7b03d42 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Nothing major for this kernel, just maintenance updates"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits)
  apparmor: add the ability to report a sha1 hash of loaded policy
  apparmor: export set of capabilities supported by the apparmor module
  apparmor: add the profile introspection file to interface
  apparmor: add an optional profile attachment string for profiles
  apparmor: add interface files for profiles and namespaces
  apparmor: allow setting any profile into the unconfined state
  apparmor: make free_profile available outside of policy.c
  apparmor: rework namespace free path
  apparmor: update how unconfined is handled
  apparmor: change how profile replacement update is done
  apparmor: convert profile lists to RCU based locking
  apparmor: provide base for multiple profiles to be replaced at once
  apparmor: add a features/policy dir to interface
  apparmor: enable users to query whether apparmor is enabled
  apparmor: remove minimum size check for vmalloc()
  Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes
  Smack: network label match fix
  security: smack: add a hash table to quicken smk_find_entry()
  security: smack: fix memleak in smk_write_rules_list()
  xattr: Constify ->name member of "struct xattr".
  ...
2013-09-07 14:34:07 -07:00
Trond Myklebust 5e6b19901b NFSv4: Fix security auto-negotiation
NFSv4 security auto-negotiation has been broken since
commit 4580a92d44 (NFS:
Use server-recommended security flavor by default (NFSv3))
because nfs4_try_mount() will automatically select AUTH_SYS
if it sees no auth flavours.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
2013-09-07 16:18:30 -04:00
Trond Myklebust 19e7b8d240 NFS: Clean up nfs_parse_security_flavors()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 16:12:45 -04:00
Trond Myklebust 74c9881162 NFS: Clean up the auth flavour array mess
What is the point of having a 'auth_flavor_len' field, if it is
always set to 1, and can't be used to determine if the user has
selected an auth flavour?
This cleanup goes back to using auth_flavor_len for its original
intended purpose, and gets rid of the ad-hoc replacements.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 16:11:24 -04:00
Richard Weinberger 65984ff9d2 um: hostfs: Fix writeback
We have to implement ->release() and trigger writeback from it.
Otherwise we might lose dirty pages at munmap().

Signed-off-by: Richard Weinberger <richard@nod.at>
2013-09-07 10:38:29 +02:00
Yan, Zheng a8d436f015 ceph: use d_invalidate() to invalidate aliases
d_invalidate() is the standard VFS method to invalidate dentry.
compare to d_delete(), it also try shrinking children dentries.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-09-06 12:55:29 -07:00
Yan, Zheng ed284c49f6 ceph: remove ceph_lookup_inode()
commit 6f60f889 (ceph: fix freeing inode vs removing session caps race)
introduced ceph_lookup_inode(). But there is already a ceph_find_inode()
which provides similar function. So remove ceph_lookup_inode(), use
ceph_find_inode() instead.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <alex.elder@linary.org>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-09-06 12:55:09 -07:00
Andy Adamson 0e20162ed1 NFSv4.1 Use MDS auth flavor for data server connection
Commit 4edaa308 "NFS: Use "krb5i" to establish NFSv4 state whenever possible"
uses the nfs_client cl_rpcclient for all state management operations, and
will use krb5i or auth_sys with no regard to the mount command authflavor
choice.

The MDS, as any NFSv4.1 mount point, uses the nfs_server rpc client for all
non-state management operations with a different nfs_server for each fsid
encountered traversing the mount point, each with a potentially different
auth flavor.

pNFS data servers are not mounted in the normal sense as there is no associated
nfs_server structure. Data servers can also export multiple fsids, each with
a potentially different auth flavor.

Data servers need to use the same authflavor as the MDS server rpc client for
non-state management operations. Populate a list of rpc clients with the MDS
server rpc client auth flavor for the DS to use.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-06 14:49:16 -04:00
Milosz Tanski 971f0bdeaa ceph: trivial buildbot warnings fix
The linux-next build bot found a three of warnings, this addresses all of them.

 * non-ANSI function declaration of function 'ceph_fscache_register' and
   'ceph_fscache_unregister'
 * symbol 'ceph_cache_netfs' was not declared, now it's extern in the header.
 * warning: "pr_fmt" redefined

Signed-off-by: Milosz Tanski <milosz@adfin.com>
2013-09-06 16:50:12 +00:00
Milosz Tanski e81568eb18 ceph: Do not do invalidate if the filesystem is mounted nofsc
Previously we would always try to enqueue work even if the filesystem is not
mounted with fscache enabled (or the file has no cookie). In the case of the
filesystem mouned nofsc (but with fscache compiled in) this would lead to a
crash.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
2013-09-06 16:50:12 +00:00
Milosz Tanski d4d3aa38d6 ceph: page still marked private_2
Previous patch that allowed us to cleanup most of the issues with pages marked
as private_2 when calling ceph_readpages. However, there seams to be a case in
the error case clean up in start read that still trigers this from time to
time. I've only seen this one a couple times.

BUG: Bad page state in process petabucket  pfn:335b82
page:ffffea000cd6e080 count:0 mapcount:0 mapping:          (null) index:0x0
page flags: 0x200000000001000(private_2)
Call Trace:
 [<ffffffff81563442>] dump_stack+0x46/0x58
 [<ffffffff8112c7f7>] bad_page+0xc7/0x120
 [<ffffffff8112cd9e>] free_pages_prepare+0x10e/0x120
 [<ffffffff8112e580>] free_hot_cold_page+0x40/0x160
 [<ffffffff81132427>] __put_single_page+0x27/0x30
 [<ffffffff81132d95>] put_page+0x25/0x40
 [<ffffffffa02cb409>] ceph_readpages+0x2e9/0x6f0 [ceph]
 [<ffffffff811313cf>] __do_page_cache_readahead+0x1af/0x260

Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-06 16:50:12 +00:00
Milosz Tanski 9b8dd1e8a5 ceph: ceph_readpage_to_fscache didn't check if marked
Previously ceph_readpage_to_fscache did not call if page was marked as cached
before calling fscache_write_page resulting in a BUG inside of fscache.

FS-Cache: Assertion failed
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:874!
invalid opcode: 0000 [#1] SMP
Call Trace:
 [<ffffffffa02e6566>] __ceph_readpage_to_fscache+0x66/0x80 [ceph]
 [<ffffffffa02caf84>] readpage_nounlock+0x124/0x210 [ceph]
 [<ffffffffa02cb08d>] ceph_readpage+0x1d/0x40 [ceph]
 [<ffffffff81126db6>] generic_file_aio_read+0x1f6/0x700
 [<ffffffffa02c6fcc>] ceph_aio_read+0x5fc/0xab0 [ceph]

Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-06 16:50:12 +00:00
Milosz Tanski 76be778b3a ceph: clean PgPrivate2 on returning from readpages
In some cases the ceph readapages code code bails without filling all the pages
already marked by fscache. When we return back to readahead code this causes
a BUG.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
2013-09-06 16:50:11 +00:00
Milosz Tanski 99ccbd229c ceph: use fscache as a local presisent cache
Adding support for fscache to the Ceph filesystem. This would bring it to on
par with some of the other network filesystems in Linux (like NFS, AFS, etc...)

In order to mount the filesystem with fscache the 'fsc' mount option must be
passed.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-06 16:50:11 +00:00
Milosz Tanski cd0a2df681 Patches for Ceph FS-Cache support
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIVAwUAUimQLxOxKuMESys7AQJc+Q/+N3BN+ZWRhfqSKANFyEuXIsUmzueCmmYc
 ZOgdRGTorlYKefqFHTFOHLvPbbxXaT2/HKUhq38yuA8UuIkoPL3rRQpjNcvWR+TX
 s/H17MRqPeZOfaDC/p9y2uL5MbuUvzhlCZ/GTi5w2ZwiNuWBo10gxeyCrXQSOFtH
 YFq0dVuG0AXzWdZuWcM3MtgY0llcMmfnZpIDjF4JDXJidgXY+wtjNUF3ByYZ+33+
 +CmzXrnCaY+3N44Ji2Dn+ci8tym8uht4dnbTZFkQ0I6B+k93V2RkZeHWnDQWqW+c
 THjyG9c+LSf0m8FIh43DNNJSkywbh5dxsBgnqxhQTJMij0dV1ne8wjKptJMgz+0b
 HFUi4rE6oRQtbLdTtJhdjdFFORBGdFj71gW8foBdFAZTP4Amf/fbiAfHeK+33oDt
 s5PMJyfA3BDM90eBFoxDWjCEe+o6YcccHC0SVWM1ZJPQ/U0hXL99O6NSHBrr9iNP
 spBgM+fNTgtUMf6P6MwjJfTQHov5xevBNaLB3boUPAI6/yK8KQt9xoevJ7t902uQ
 /19bXoNgMmwAti1Gd6T5UnWlAHsOWdnIASUu8LVqEoh2PY42T2I4NTWhgs4NFntu
 91MYysF93sTx0sMvGvWdCUSl7zMMeKXCUSUnPvMld+BMqyuK3XzsO3xkMn97t2/U
 p6kQZXZDlwU=
 =s35L
 -----END PGP SIGNATURE-----

Merge tag 'fscache-fixes-for-ceph' into wip-fscache

Patches for Ceph FS-Cache support
2013-09-06 16:41:20 +00:00
Linus Torvalds 2e515bf096 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "The usual trivial updates all over the tree -- mostly typo fixes and
  documentation updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
  doc: Documentation/cputopology.txt fix typo
  treewide: Convert retrun typos to return
  Fix comment typo for init_cma_reserved_pageblock
  Documentation/trace: Correcting and extending tracepoint documentation
  mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
  power: Documentation: Update s2ram link
  doc: fix a typo in Documentation/00-INDEX
  Documentation/printk-formats.txt: No casts needed for u64/s64
  doc: Fix typo "is is" in Documentations
  treewide: Fix printks with 0x%#
  zram: doc fixes
  Documentation/kmemcheck: update kmemcheck documentation
  doc: documentation/hwspinlock.txt fix typo
  PM / Hibernate: add section for resume options
  doc: filesystems : Fix typo in Documentations/filesystems
  scsi/megaraid fixed several typos in comments
  ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
  treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
  page_isolation: Fix a comment typo in test_pages_isolated()
  doc: fix a typo about irq affinity
  ...
2013-09-06 09:36:28 -07:00
Linus Torvalds ec0ad73080 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3, reiserfs, udf & isofs fixes from Jan Kara:
 "The contains a bunch of ext3 cleanups and minor improvements, major
  reiserfs locking changes which should hopefully fix deadlocks
  introduced by BKL removal, and udf/isofs changes to refuse mounting fs
  rw instead of mounting it ro automatically which makes eject button
  work as expected for all media (see the changelog for why userspace
  should be ok with this change)"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: use a single printk for jbd_debug()
  reiserfs: locking, release lock around quota operations
  reiserfs: locking, handle nested locks properly
  reiserfs: locking, push write lock out of xattr code
  jbd: relocate assert after state lock in journal_commit_transaction()
  udf: Refuse RW mount of the filesystem instead of making it RO
  udf: Standardize return values in mount sequence
  isofs: Refuse RW mount of the filesystem instead of making it RO
  ext3: allow specifying external journal by pathname mount option
  jbd: remove unneeded semicolon
2013-09-06 09:06:02 -07:00
Linus Torvalds eb97a784f0 f2fs updates for v3.12
This patch-set includes the following major enhancement patches.
 o support inline xattrs
 o add sysfs support to control GCs explicitly
 o add proc entry to show the current segment usage information
 o improve the GC/SSR performance
 
 The other bug fixes are as follows.
 o avoid the overflow on status calculation
 o fix some error handling routines
 o fix inconsistent xattr states after power-off-recovery
 o fix incorrect xattr node offset definition
 o fix deadlock condition in fsync
 o fix the fdatasync routine for power-off-recovery
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSKDoaAAoJEEAUqH6CSFDSovoQAJSWnvRfeu4olkKe7LblVXA5
 NFYsjtdtnWsmSY1kq2j541SLo8Kw2UibozbrN6BaJ9MOKnTz1+x0R9U0vpewmCO4
 FkxlGX/3i3k/4tR0AvD4U56xgqh+IhYi18nBN8kOTwhLqjFtx5JFKAHBnGwjbB4T
 YpEaitNY6dL8l+DUxs11KnPmNazbck6iNGOYXpvfhTS4DNSJTT0L/fLqugDhFJNI
 7e3f6vVORRwC5UdtJk6B6HXxv1pHv4uGeLki0W4jgGp7AdxpawbfeDrDcrECjoc+
 0s/QQTsjoeIKeCfojSEgLGSSl8PZpx2VVCxri+nMPjLzY81QUXbpsAlhB2RW9Uz/
 E9ESAPpzL9ykh35THALic7N0ATXGlepnu0EGU6+fjWGUIyHeV+2yoswz599VliRO
 GunHgwrfNMyXWHw9zw6SPIJvN3caPn3wlDhffei9wOl92YkleBuHA7ojIzfRc2vz
 YQ7jKmZNZ/CM2qiw350XIfSaa+3iszlxwoWK7DLWQZm3um0MpYme9RmadnPvxsRM
 gnUYiovPwR+om3zAnURMvq/LNKi6NjflRgu2OAU/0CpJMEX9vaVe/xOKdjCs19je
 dxinQGuOS5P+J141SkM3jJ1eyZLC4zCyp42xQSZ1+Zg+6BU4PVY3+i4hCQFopHDt
 fyeav8SM/fk9HrKU3npq
 =YMk3
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches:
   - support inline xattrs
   - add sysfs support to control GCs explicitly
   - add proc entry to show the current segment usage information
   - improve the GC/SSR performance

  The other bug fixes are as follows:
   - avoid the overflow on status calculation
   - fix some error handling routines
   - fix inconsistent xattr states after power-off-recovery
   - fix incorrect xattr node offset definition
   - fix deadlock condition in fsync
   - fix the fdatasync routine for power-off-recovery"

* tag 'for-f2fs-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: optimize gc for better performance
  f2fs: merge more bios of node block writes
  f2fs: avoid an overflow during utilization calculation
  f2fs: trigger GC when there are prefree segments
  f2fs: use strncasecmp() simplify the string comparison
  f2fs: fix omitting to update inode page
  f2fs: support the inline xattrs
  f2fs: add the truncate_xattr_node function
  f2fs: introduce __find_xattr for readability
  f2fs: reserve the xattr space dynamically
  f2fs: add flags for inline xattrs
  f2fs: fix error return code in init_f2fs_fs()
  f2fs: fix wrong BUG_ON condition
  f2fs: fix memory leak when init f2fs filesystem fail
  f2fs: fix a compound statement label error
  f2fs: avoid writing inode redundantly when creating a file
  f2fs: alloc_page() doesn't return an ERR_PTR
  f2fs: should cover i_xattr_nid with its xattr node page lock
  f2fs: check the free space first in new_node_page
  f2fs: clean up the needless end 'return' of void function
  ...
2013-09-06 09:04:34 -07:00
Trond Myklebust 4109bb7496 NFS: Don't check lock owner compatability unless file is locked (part 2)
When coalescing requests into a single READ or WRITE RPC call, and there
is no file locking involved, we don't have to refuse coalescing for
requests where the lock owner information doesn't match.

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-06 11:27:41 -04:00
Milosz Tanski 5a6f282a20 fscache: Netfs function for cleanup post readpages
Currently the fscache code expect the netfs to call fscache_readpages_or_alloc
inside the aops readpages callback.  It marks all the pages in the list
provided by readahead with PG_private_2.  In the cases that the netfs fails to
read all the pages (which is legal) it ends up returning to the readahead and
triggering a BUG.  This happens because the page list still contains marked
pages.

This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages.  It will revoke the pages from the
underlying cache backend and unmark them.

The problem was originally worked out in the Ceph devel tree, but it also
occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
will clean up the unprocessed pages in the case of an error.

This can be used to address the following oops:

[12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
	(null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)

...

[12410647.597334] Call Trace:
[12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
[12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
[12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
[12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
[12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
[12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
[12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
[12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
[12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
[12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
[12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30

Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-06 09:17:30 +01:00
David Howells 5002d7bef8 CacheFiles: Implement interface to check cache consistency
Implement the FS-Cache interface to check the consistency of a cache object in
CacheFiles.

Original-author: Hongyi Jia <jiayisuse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hongyi Jia <jiayisuse@gmail.com>
cc: Milosz Tanski <milosz@adfin.com>
2013-09-06 09:17:30 +01:00
David Howells da9803bc88 FS-Cache: Add interface to check consistency of a cached object
Extend the fscache netfs API so that the netfs can ask as to whether a cache
object is up to date with respect to its corresponding netfs object:

	int fscache_check_consistency(struct fscache_cookie *cookie)

This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
may also return -ENOMEM and -ERESTARTSYS.

The backends now have to implement a mandatory operation pointer:

	int (*check_consistency)(struct fscache_object *object)

that corresponds to the above API call.  FS-Cache takes care of pinning the
object and the cookie in memory and managing this call with respect to the
object state.

Original-author: Hongyi Jia <jiayisuse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hongyi Jia <jiayisuse@gmail.com>
cc: Milosz Tanski <milosz@adfin.com>
2013-09-06 09:17:30 +01:00
Linus Torvalds b14662cae0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc changes from David Miller:
 "Several bug fixes (from Kirill Tkhai, Geery Uytterhoeven, and Alexey
  Dobriyan) and some support for Fujitsu sparc64x chips (from Allen
  Pais)"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Export flush_ptrace_access() (needed by lustre)
  sparc: fix PCI device proc file mmap(2)
  sparc64: Remove RWSEM export leftovers
  sparc64: Fix off by one in trampoline TLB mapping installation loop.
  sparc64: Fix ITLB handler of null page
  esp_scsi: Fix tag state corruption when autosensing.
  sparc64: Fix not SRA'ed %o5 in 32-bit traced syscall
  sparc64: cleanup: Rename ret_from_syscall to ret_from_fork
  sparc32: Fix exit flag passed from traced sys_sigreturn
  sparc64: Fix wrong syscall return value passed to trace_sys_exit()
  support sparc64x chip type in cpumap.c
  cpu hw caps support for sparc64x
2013-09-05 15:28:17 -07:00
Trond Myklebust 0f1d260550 NFS: Don't check lock owner compatibility in writes unless file is locked
If we're doing buffered writes, and there is no file locking involved,
then we don't have to worry about whether or not the lock owner information
is identical.
By relaxing this check, we ensure that fork()ed child processes can write
to a page without having to first sync dirty data that was written
by the parent to disk.

Reported-by: Quentin Barnes <qbarnes@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Quentin Barnes <qbarnes@gmail.com>
2013-09-05 18:11:42 -04:00
Anand Avati 46ea1562da fuse: drop dentry on failed revalidate
Drop a subtree when we find that it has moved or been delated.  This can be
done as long as there are no submounts under this location.

If the directory was moved and we come across the same directory in a
future lookup it will be reconnected by d_materialise_unique().

Signed-off-by: Anand Avati <avati@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:54 -04:00
Miklos Szeredi e2a6b95236 fuse: clean up return in fuse_dentry_revalidate()
On errors unrelated to the filesystem's state (ENOMEM, ENOTCONN) return the
error itself from ->d_revalidate() insted of returning zero (invalid).

Also make a common label for invalidating the dentry.  This will be used by
the next patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:54 -04:00
Miklos Szeredi 5835f3390e fuse: use d_materialise_unique()
Use d_materialise_unique() instead of d_splice_alias().  This allows dentry
subtrees to be moved to a new place if there moved, even if something is
referencing a dentry in the subtree (open fd, cwd, etc..).

This will also allow us to drop a subtree if it is found to be replaced by
something else.  In this case the disconnected subtree can later be
reconnected to its new location.

d_materialise_unique() ensures that a directory entry only ever has one
alias.  We keep fc->inst_mutex around the calls for d_materialise_unique()
on directories to prevent a race with mkdir "stealing" the inode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:53 -04:00
Miklos Szeredi 6497d160f6 sysfs: use check_submounts_and_drop()
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries and
non-directories as well.

Non-directories can also be mounted on.  And just like directories we don't
want these to disappear with invalidation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:53 -04:00
Miklos Szeredi 13caa9fb5b nfs: use check_submounts_and_drop()
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries and
non-directories as well.

Non-directories can also be mounted on.  And just like directories we don't
want these to disappear with invalidation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:52 -04:00
Miklos Szeredi 1191a2bdf0 gfs2: use check_submounts_and_drop()
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries and
non-directories as well.

Non-directories can also be mounted on.  And just like directories we don't
want these to disappear with invalidation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:51 -04:00
Miklos Szeredi ba81238076 afs: use check_submounts_and_drop()
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries as well.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:51 -04:00
Miklos Szeredi eed8100766 vfs: check unlinked ancestors before mount
We check submounts before doing d_drop() on a non-empty directory dentry in
NFS (have_submounts()), but we do not exclude a racing mount.  Nor do we
prevent mounts to be added to the disconnected subtree using relative paths
after the d_drop().

This patch fixes these issues by checking for unlinked (unhashed, non-root)
ancestors before proceeding with the mount.  This is done with rename
seqlock taken for write and with ->d_lock grabbed on each ancestor in turn,
including our dentry itself.  This ensures that the only one of
check_submounts_and_drop() or has_unlinked_ancestor() can succeed.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:50 -04:00
Miklos Szeredi 848ac114e8 vfs: check submounts and drop atomically
We check submounts before doing d_drop() on a non-empty directory dentry in
NFS (have_submounts()), but we do not exclude a racing mount.

 Process A: have_submounts() -> returns false
 Process B: mount() -> success
 Process A: d_drop()

This patch prepares the ground for the fix by doing the following
operations all under the same rename lock:

  have_submounts()
  shrink_dcache_parent()
  d_drop()

This is actually an optimization since have_submounts() and
shrink_dcache_parent() both traverse the same dentry tree separately.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: David Howells <dhowells@redhat.com>
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:41 -04:00
Miklos Szeredi db14fc3abc vfs: add d_walk()
This one replaces three instances open coded tree walking (have_submounts,
select_parent, d_genocide) with a common helper.

In addition to slightly reducing the kernel size, this simplifies the
callers and makes them less bug prone.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:22:44 -04:00
Miklos Szeredi 01ddc4ede5 vfs: restructure d_genocide()
It shouldn't matter when we decrement the refcount during the walk as long
as we do it exactly once.

Restructure d_genocide() to do the killing on entering the dentry instead
of when leaving it.  This helps creating a common helper for tree walking.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:22:43 -04:00
Alexey Dobriyan c4fe244857 sparc: fix PCI device proc file mmap(2)
Commit 786d7e1612 "Fix rmmod/read/write races in /proc entries"
must have broken mmapping of PCI device proc files on Sparc.

Notice how it adds wrapper around ->mmap but doesn't do it around ->get_unmapped_area.
Add wrapper around ->get_unmapped_area.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 12:12:51 -07:00
Linus Torvalds 45d9a2220f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "Unfortunately, this merge window it'll have a be a lot of small piles -
  my fault, actually, for not keeping #for-next in anything that would
  resemble a sane shape ;-/

  This pile: assorted fixes (the first 3 are -stable fodder, IMO) and
  cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last
  components) + several long-standing patches from various folks.

  There definitely will be a lot more (starting with Miklos'
  check_submount_and_drop() series)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  direct-io: Handle O_(D)SYNC AIO
  direct-io: Implement generic deferred AIO completions
  add formats for dentry/file pathnames
  kvm eventfd: switch to fdget
  powerpc kvm: use fdget
  switch fchmod() to fdget
  switch epoll_ctl() to fdget
  switch copy_module_from_fd() to fdget
  git simplify nilfs check for busy subtree
  ibmasmfs: don't bother passing superblock when not needed
  don't pass superblock to hypfs_{mkdir,create*}
  don't pass superblock to hypfs_diag_create_files
  don't pass superblock to hypfs_vm_create_files()
  oprofile: get rid of pointless forward declarations of struct super_block
  oprofilefs_create_...() do not need superblock argument
  oprofilefs_mkdir() doesn't need superblock argument
  don't bother with passing superblock to oprofile_create_stats_files()
  oprofile: don't bother with passing superblock to ->create_files()
  don't bother passing sb to oprofile_create_files()
  coh901318: don't open-code simple_read_from_buffer()
  ...
2013-09-05 08:50:26 -07:00
Weston Andros Adamson 8897538e97 nfs4: Map NFS4ERR_WRONG_CRED to EPERM
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:51:22 -04:00
Weston Andros Adamson 8c21c62c44 nfs4.1: Add SP4_MACH_CRED write and commit support
WRITE and COMMIT can use the machine credential.

If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:50:45 -04:00
Weston Andros Adamson 3787d5063c nfs4.1: Add SP4_MACH_CRED stateid support
TEST_STATEID and FREE_STATEID can use the machine credential.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:49:35 -04:00
Weston Andros Adamson 8b5bee2e1b nfs4.1: Add SP4_MACH_CRED secinfo support
SECINFO and SECINFO_NONAME can use the machine credential.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:48:30 -04:00
Weston Andros Adamson fa940720ce nfs4.1: Add SP4_MACH_CRED cleanup support
CLOSE and LOCKU can use the machine credential.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:44:17 -04:00
Weston Andros Adamson ab4c236135 nfs4.1: Add state protection handler
Add nfs4_state_protect - the function responsible for switching to the machine
credential and the correct rpc client when SP4_MACH_CRED is in use.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:43:33 -04:00
Weston Andros Adamson 2031cd1af1 nfs4.1: Minimal SP4_MACH_CRED implementation
This is a minimal client side implementation of SP4_MACH_CRED.  It will
attempt to negotiate SP4_MACH_CRED iff the EXCHANGE_ID is using
krb5i or krb5p auth.  SP4_MACH_CRED will be used if the server supports the
minimal operations:

 BIND_CONN_TO_SESSION
 EXCHANGE_ID
 CREATE_SESSION
 DESTROY_SESSION
 DESTROY_CLIENTID

This patch only includes the EXCHANGE_ID negotiation code because
the client will already use the machine cred for these operations.

If the server doesn't support SP4_MACH_CRED or doesn't support the minimal
operations, the exchange id will be resent with SP4_NONE.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:40:45 -04:00
Benjamin Marzinski 0c9018097f GFS2: dirty inode correctly in gfs2_write_end
GFS2 was only setting I_DIRTY_DATASYNC on files that it wrote to, when
it actually increased the file size.  If gfs2_fsync was called without
I_DIRTY_DATASYNC set, it didn't flush the incore data to the log before
returning, so any metadata or journaled data changes were not getting
fsynced. This meant that writes to the middle of files were not always
getting fsynced properly.

This patch makes gfs2 set I_DIRTY_DATASYNC whenever metadata has been
updated during a write. It also make gfs2_sync flush the incore log
if I_DIRTY_PAGES is set, and the file is using data journalling. This
will make sure that all incore logged data gets written to disk before
returning from a fsync.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-05 09:04:24 +01:00
Bob Peterson 1d12d175ea GFS2: Don't flag consistency error if first mounter is a spectator
This patch checks for the first mounter being a specator. If so, it
makes sure all the journals are clean. If there's a dirty journal,
the mount fails.

Testing results:

# insmod gfs2.ko
# mount -tgfs2 -o spectator /dev/sasdrives/scratch /mnt/gfs2
mount: permission denied
# dmesg | tail -2
[ 3390.655996] GFS2: fsid=MUSKETEER:home: Now mounting FS...
[ 3390.841336] GFS2: fsid=MUSKETEER:home.s: jid=0: Journal is dirty, so the first mounter must not be a spectator.
# mount -tgfs2 /dev/sasdrives/scratch /mnt/gfs2
# umount /mnt/gfs2
# mount -tgfs2 -o spectator /dev/sasdrives/scratch /mnt/gfs2
# ls /mnt/gfs2|wc -l
352
# umount /mnt/gfs2

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-05 09:03:57 +01:00
Jin Xu a26b7c8a01 f2fs: optimize gc for better performance
This patch improves the gc efficiency by optimizing the victim
selection policy. With this optimization, the random re-write
performance could increase up to 20%.

For f2fs, when disk is in shortage of free spaces, gc will selects
dirty segments and moves valid blocks around for making more space
available. The gc cost of a segment is determined by the valid blocks
in the segment. The less the valid blocks, the higher the efficiency.
The ideal victim segment is the one that has the most garbage blocks.

Currently, it searches up to 20 dirty segments for a victim segment.
The selected victim is not likely the best victim for gc when there
are much more dirty segments. Why not searching more dirty segments
for a better victim? The cost of searching dirty segments is
negligible in comparison to moving blocks.

In this patch, it enlarges the MAX_VICTIM_SEARCH to 4096 to make
the search more aggressively for a possible better victim. Since
it also applies to victim selection for SSR, it will likely improve
the SSR efficiency as well.

The test case is simple. It creates as many files until the disk full.
The size for each file is 32KB. Then it writes as many as 100000
records of 4KB size to random offsets of random files in sync mode.
The testing was done on a 2GB partition of a SDHC card. Let's see the
test result of f2fs without and with the patch.

---------------------------------------
2GB partition, SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---------------------------------------
| file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
[no patch]  341         4227             1174          174840
[patched]   324         2958             645           106682

It's obvious that, with the patch, f2fs finishes the test in 20+% less
time than without the patch. And internally it does much less gc with
higher efficiency than before.

Since the performance improvement is related to gc, it might not be so
obvious for other tests that do not trigger gc as often as this one (
This is because f2fs selects dirty segments for SSR use most of the
time when free space is in shortage). The well-known iozone test tool
was not used for benchmarking the patch becuase it seems do not have
a test case that performs random re-write on a full disk.

This patch is the revised version based on the suggestion from
Jaegeuk Kim.

Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
[Jaegeuk Kim: suggested simpler solution]
Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-05 13:50:32 +09:00
Jaegeuk Kim 423e95ccbe f2fs: merge more bios of node block writes
Previously, we experience bio traces as follows when running simple sequential
write test.

 f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500104928, size = 4K
 f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499922208, size = 368K
 f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499914752, size = 140K

 -> total 512K

The first one is to write an indirect node block, and the others are to write
direct node blocks.

The reason why there are two separate bios for direct node blocks is:
0. initial state
------------------    ------------------
|                |    |xxxxxxxx        |
------------------    ------------------

1. write 368K
------------------    ------------------
|                |    |xxxxxxxxWWWWWWWW|
------------------    ------------------

2. write 140K
------------------    ------------------
|WWWWWWW         |    |xxxxxxxxWWWWWWWW|
------------------    ------------------

This is because f2fs_write_node_pages tries to write just 512K totally, so that
we can lose the chance to merge more bios nicely.

After this patch is applied, we can get the following bio traces.

  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500103168, size = 8K
  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500111368, size = 4K
  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500107272, size = 512K
  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500108296, size = 512K
  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500109320, size = 500K

And finally, we can improve the sequential write performance,
    from 458.775 MB/s to 479.945 MB/s on SSD.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-05 10:17:19 +09:00
Linus Torvalds 27703bb4a6 PTR_RET() is a weird name, and led to some confusing usage. We ended
up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages.
 
 This has been sitting in linux-next for a whole cycle.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJSJo+1AAoJENkgDmzRrbjxIC4QALJK95o8AUXuwUkl+2fmFkUt
 hh2/PJ1vDYgk4Xt0J6hyoK7XMa0H1RkbBrROuDdsBnorMFpEsGcgdkUZte9ufoAS
 97Bg+7N0KPbTB/S8vOwtW1vbERTJIVPN2uf6h1Wqm9Xc2puCh3HbMMr1AWMGu0WQ
 NqY5+Zz8zecy1UOrMhEP6H1CjeQcL1w1DO6YM5ydeqlKNzAz+JMfDXriLPDwiE7+
 XFPDF/O3Vtd2ckA7L70Lio7hfHwxV5U4WwFVfiwls98XB4jcZqDKIoh1r8z4SRgR
 +0Rae2DN3BaOabGMr//5XdrzQVpwJTh5m2w8BAOHJvCJ9HR7Sq29UIN4u+TowZBy
 L2xYo4dvFxkympwu5zEd3c7vHYWKIaqmSq5PIjr4gF/uIo2OeOTrpPIK782ZEYb7
 e+qUgOEM05V9AmQZCrSZeP9u474Sj8ow3sCtWxfdRtwNfoEIcUXsNNJd/zDHlVtW
 cEtXqc2xXIpcuUJQWlSaGp8fmRQjVZPzrLKYLM2m39ZcOOJbf5rzQAYS7hHPosIa
 SK+YVux/+Zzi+Xo/vXq1OlM/SruCr5S7JOgCxLowoQ88vupgXME6uPyC8EO+QQ50
 GsrHes5ZNLbk0uVsfcexIyojkUnyvDmmnDpv+1zdC6RgZLJQn8OXp5yNhHhnhrFT
 BiHX6YFWtDDqRlVv8Q0F
 =LeaW
 -----END PGP SIGNATURE-----

Merge tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull PTR_RET() removal patches from Rusty Russell:
 "PTR_RET() is a weird name, and led to some confusing usage.  We ended
  up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages.

  This has been sitting in linux-next for a whole cycle"

[ There are still some PTR_RET users scattered about, with some of them
  possibly being new, but most of them existing in Rusty's tree too.  We
  have that

      #define PTR_RET(p) PTR_ERR_OR_ZERO(p)

  thing in <linux/err.h>, so they continue to work for now  - Linus ]

* tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO
  Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO
  drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO
  sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO
  dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO
  drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO
  mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR().
  staging/zcache: don't use PTR_RET().
  remoteproc: don't use PTR_RET().
  pinctrl: don't use PTR_RET().
  acpi: Replace weird use of PTR_RET.
  s390: Replace weird use of PTR_RET.
  PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.
  PTR_RET is now PTR_ERR_OR_ZERO
2013-09-04 17:31:11 -07:00
Linus Torvalds 6f3bc58d84 dlm for 3.12
This set includes a workqueue cleanup and the removal
 of incorrect and unneeded signal blocking.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJSJ4DKAAoJEDgbc8f8gGmqJ3kP/2UwnKt9degadDAPGy+xORUK
 WDkQ3YHacNkdVaIdSh6o4pAHG0Z2JU8zykJkHOAJd+SZhEp0hxEu05rfUSZKxJH/
 b3pG6dpjX8pC1TtSibFXRrpH2BHIzpPRdxiDkmMs71myExwxeMbIQuGxmft8nigB
 6yPBbb8biVTCNTzdpR2b0KbVKzrkIBotwWTUEZd3HJedxRO/BlQxOgHDgnicupfH
 0FsU3aa+p986wwajZBPXPFrVcBGHMXOXc3Lpm56Uh7OwqCVYe2gE7BMW3Y7rcZ1G
 sGiB542m5i/LAYnf8sFXm2OLVjaUjWPyrWIyrLe5pPzZtHxctdhfyOHzXBqvKkj3
 VRdcQN1rob+/8FGyu3tmYjhlhkEVQqIEX6sweuuGG5sE/zzvUzcUP0qIAT11QQJ/
 Bt7nv5VpfT7kxIf7ToL/NtxHulU66TxC387d/QnkXftlDbzpzEY0YcCmcIdZMVEr
 e6rgvp1iWTvasLh/UFjTBkoRAX4DGpFQasYcS1S+Uj3IZx/wIbN8sIEijWcVvvuz
 fLeF2idgNqFCseWceBJGoHPsgFPRVDdFk9+7isu4FQYOEH0+VfNqJz3YU/bTehGK
 +GcfORSKohw2M3EUarcRHN0dV6vy6C/ACgTiUb7CubgbiUUzNWKTVVVuzRGqzBNE
 +k7w/Zbbz+Eugm4Zkty8
 =MaZw
 -----END PGP SIGNATURE-----

Merge tag 'dlm-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:
 "This set includes a workqueue cleanup and the removal of incorrect and
  unneeded signal blocking"

* tag 'dlm-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: remove signal blocking
  dlm: WQ_NON_REENTRANT is meaningless and going away
2013-09-04 17:23:59 -07:00
Linus Torvalds ae67d9a888 New features for 3.12:
* Added aggressive extent caching using the extent status tree.  This
   can actually decrease memory usage in read-mostly workloads since
   the information is much more compactly stored in the extent status
   tree than if we had to keep the extent tree metadata blocks in the
   buffer cache.  This also improves Asynchronous I/O since it is it
   makes much less likely that we need to do metadata I/O to lookup the
   extent tree information.
 * Improve the recovery after corrupted allocation bitmaps are found
   when running in errors=ignore mode.
 
 Also fixed some writeback vs. truncate races when using a blocksize
 less than the page size.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSJ1SxAAoJENNvdpvBGATw6xAP/250u0YggRHup5cxmkJ7x+EH
 sv/Kbe8r1ftUY7aBQP1awHlVYnOZehh+kYUj+eIVPPXKananhu99qcJy99KFm8W9
 gWVP5G0+zvKD++S8yHKhyKjqUtzZwhlYJU7oyqptPr903CVlfjsKx1OtGvUlbnde
 Hh/e+XpbltICPIa/O6gsE3SyRakbPtI0gvC4GbsD6EvAl+Rj3l6l+Ty9IkDqGFs9
 YCVA2MUly6ZFYNRS8wkOPRP8T8lLwqIa7CNc75bEJPrGQL1R0iiIez0yaoZ83SOu
 HMC6wo3XjfgcsuMwJo/mtYsw06rjQy5oNPD5bISRaDtocI5v5Rv8t5EmANnoJFbu
 gy+psJ0XcKimL1BfsQ4vFCNiAkskkCQaFr2yJbo6VTDtHS8XV39MeMZ6IvcSqO+6
 DQafMcKNiltDbdsywncsee+8ecncv/ZEZDiA6pIUm0lbljPopuzf6sBvxWOFGiHM
 xMBD0eyhns/TzfYHzzI+fTcR+GdBDqAkNOrA9i4medffS6iJDAJ6qC6ZhgQh32oR
 MCfYosVQwxmCInqtCh51+od29rk7ZIuBrPjp1+uMHjHqG5jDKcANgB7g3VAeQOf0
 zuEYTFvGk6cLKfuJtlnaItKXN+eRTtVtfHlLRRq1+wR9UK+dFONV0Jufzs7Y1URI
 LbsmGkgxTL9xZEskZXgQ
 =tosu
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "New features for 3.12:

   - Added aggressive extent caching using the extent status tree.  This
     can actually decrease memory usage in read-mostly workloads since
     the information is much more compactly stored in the extent status
     tree than if we had to keep the extent tree metadata blocks in the
     buffer cache.  This also improves Asynchronous I/O since it is it
     makes much less likely that we need to do metadata I/O to lookup
     the extent tree information.

   - Improve the recovery after corrupted allocation bitmaps are found
     when running in errors=ignore mode.

  Also fixed some writeback vs truncate races when using a blocksize
  less than the page size"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
  ext4: allow specifying external journal by pathname mount option
  ext4: mark group corrupt on group descriptor checksum
  ext4: mark block group as corrupt on inode bitmap error
  ext4: mark block group as corrupt on block bitmap error
  ext4: fix type declaration of ext4_validate_block_bitmap
  ext4: error out if verifying the block bitmap fails
  jbd2: Fix endian mixing problems in the checksumming code
  ext4: isolate ext4_extents.h file
  ext4: Fix misspellings using 'codespell' tool
  ext4: convert write_begin methods to stable_page_writes semantics
  ext4: fix use of potentially uninitialized variables in debugging code
  ext4: fix lost truncate due to race with writeback
  ext4: simplify truncation code in ext4_setattr()
  ext4: fix ext4_writepages() in presence of truncate
  ext4: move test whether extent to map can be extended to one place
  ext4: fix warning in ext4_da_update_reserve_space()
  quota: provide interface for readding allocated space into reserved space
  ext4: avoid reusing recently deleted inodes in no journal mode
  ext4: allocate delayed allocation blocks before rename
  ext4: start handle at least possible moment when renaming files
  ...
2013-09-04 17:19:27 -07:00
Trond Myklebust f6de7a39c1 NFSv4: Document the recover_lost_locks kernel parameter
Rename the new 'recover_locks' kernel parameter to 'recover_lost_locks'
and change the default to 'false'. Document why in
Documentation/kernel-parameters.txt

Move the 'recover_lost_locks' kernel parameter to fs/nfs/super.c to
make it easy to backport to kernels prior to 3.6.x, which don't have
a separate NFSv4 module.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:32 -04:00
NeilBrown ef1820f9be NFSv4: Don't try to recover NFSv4 locks when they are lost.
When an NFSv4 client loses contact with the server it can lose any
locks that it holds.

Currently when it reconnects to the server it simply tries to reclaim
those locks.  This might succeed even though some other client has
held and released a lock in the mean time.  So the first client might
think the file is unchanged, but it isn't.  This isn't good.

If, when recovery happens, the locks cannot be claimed because some
other client still holds the lock, then we get a message in the kernel
logs, but the client can still write.  So two clients can both think
they have a lock and can both write at the same time.  This is equally
not good.

There was a patch a while ago
  http://comments.gmane.org/gmane.linux.nfs/41917

which tried to address some of this, but it didn't seem to go
anywhere.  That patch would also send a signal to the process.  That
might be useful but for now this patch just causes writes to fail.

For NFSv4 (unlike v2/v3) there is a strong link between the lock and
the write request so we can fairly easily fail any IO of the lock is
gone.  While some applications might not expect this, it is still
safer than allowing the write to succeed.

Because this is a fairly big change in behaviour a module parameter,
"recover_locks", is introduced which defaults to true (the current
behaviour) but can be set to "false" to tell the client not to try to
recover things that were lost.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:32 -04:00
Chuck Lever b6a85258d8 NFS: Fix warning introduced by NFSv4.0 transport blocking patches
When CONFIG_NFS_V4_1 is not enabled, gcc emits this warning:

linux/fs/nfs/nfs4state.c:255:12: warning:
 ‘nfs4_begin_drain_session’ defined but not used [-Wunused-function]
 static int nfs4_begin_drain_session(struct nfs_client *clp)
            ^

Eventually NFSv4.0 migration recovery will invoke this function, but
that has not yet been merged.  Hide nfs4_begin_drain_session()
behind CONFIG_NFS_V4_1 for now.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:30 -04:00
Chuck Lever 1cec16abf2 When CONFIG_NFS_V4_1 is not enabled, "make C=2" emits this warning:
linux/fs/nfs/nfs4session.c:337:6: warning:
 symbol 'nfs41_set_target_slotid' was not declared. Should it be static?

Move nfs41_set_target_slotid() and nfs41_update_target_slotid() back
behind CONFIG_NFS_V4_1, since, in the final revision of this work,
they are used only in NFSv4.1 and later.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:30 -04:00
Dong Fang 05726acabe fuse: use list_for_each_entry() for list traversing
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-09-04 17:42:42 +02:00
Bob Peterson 068213f7d3 GFS2: Remove unnecessary memory barrier
Function test_and_clear_bit implies a memory barrier, so subsequent
memory barriers are unnecessary.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-04 15:58:21 +01:00
Christoph Hellwig 02afc27fae direct-io: Handle O_(D)SYNC AIO
Call generic_write_sync() from the deferred I/O completion handler if
O_DSYNC is set for a write request.  Also make sure various callers
don't call generic_write_sync if the direct I/O code returns
-EIOCBQUEUED.

Based on an earlier patch from Jan Kara <jack@suse.cz> with updates from
Jeff Moyer <jmoyer@redhat.com> and Darrick J. Wong <darrick.wong@oracle.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04 09:23:46 -04:00
Christoph Hellwig 7b7a8665ed direct-io: Implement generic deferred AIO completions
Add support to the core direct-io code to defer AIO completions to user
context using a workqueue.  This replaces opencoded and less efficient
code in XFS and ext4 (we save a memory allocation for each direct IO)
and will be needed to properly support O_(D)SYNC for AIO.

The communication between the filesystem and the direct I/O code requires
a new buffer head flag, which is a bit ugly but not avoidable until the
direct I/O code stops abusing the buffer_head structure for communicating
with the filesystems.

Currently this creates a per-superblock unbound workqueue for these
completions, which is taken from an earlier patch by Jan Kara.  I'm
not really convinced about this use and would prefer a "normal" global
workqueue with a high concurrency limit, but this needs further discussion.

JK: Fixed ext4 part, dynamic allocation of the workqueue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04 09:23:46 -04:00
Linus Torvalds f83b0a4e4c Big part of this is the addition of compression to the
generic pstore layer so that all backends can use the
 pitiful amounts of storage they control more effectively.
 Three other small fixes/cleanups too.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJSJhMJAAoJEKurIx+X31iBcpsP/1DfQDD9IHEt7E+tTObIdKOx
 1yygmQu3IL2pvdUIkKRfblDP0PJRuwJmRPdp48Rb5CV+8i4WQuTCueOvNomIMZ4G
 1DkAzJPD2W4C+dSBD3rvWk4totXCsqLwCkaWp/FzWM7OW+cJN3Kj3Vdh3k01CVxn
 PqfCoZB5y8CVVd6EQiUmAU8z/LoGE0psxxO/CL7lBC1IsAlVqSO93hqHwE9cm/f4
 7EN069sHbKV5zOT1QsV7agn9TfHAHuj0fr3t8aDs/cC0rgz19VAsxg0sRkMHzKJD
 MGLqGCTHyQvYmo2xfHZLb1NjhL6EmHsQqhM9KQnkdDyQ3QgYOCSsglcYygSdZcN1
 ZjynLbD/OSkLABefyw/ZjW1GGzIqU6jWzj+0fmbkUa0Pe1B1aDuf/OiAcYMRJdmg
 clWE7RbnisSlrLYl3MI36T4XvyzzHVfumOVET6foWcnVb8/t/gHhC6Yf96FSUWVj
 jISE8ECfjkk7SQTYHE0ThvBmqv8M/IwUEg6wA9tnsQByLMAO+VqGoXcBMKOxz+hD
 7TPoQpa04Sqnplwx7HefbhErKH9TCQWUOgT5kjB7+bzbPpEG1cB3pb7izg3VFd7k
 l7d6vfXtAukeuYmU8PgY/ea6/Nnw7iMcDZCegzv+A4Ms/kWM9oH2ctvV6V/edYyb
 0x7BgSOu69HztW5IveB6
 =6H8r
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull pstore changes from Tony Luck:
 "A big part of this is the addition of compression to the generic
  pstore layer so that all backends can use the pitiful amounts of
  storage they control more effectively.  Three other small
  fixes/cleanups too.

* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore/ram: (really) fix undefined usage of rounddown_pow_of_two
  pstore/ram: Read and write to the 'compressed' flag of pstore
  efi-pstore: Read and write to the 'compressed' flag of pstore
  erst: Read and write to the 'compressed' flag of pstore
  powerpc/pseries: Read and write to the 'compressed' flag of pstore
  pstore: Add file extension to pstore file if compressed
  pstore: Add decompression support to pstore
  pstore: Introduce new argument 'compressed' in the read callback
  pstore: Add compression support to pstore
  pstore/Kconfig: Select ZLIB_DEFLATE and ZLIB_INFLATE when PSTORE is selected
  pstore: Add new argument 'compressed' in pstore write callback
  powerpc/pseries: Remove (de)compression in nvram with pstore enabled
  pstore: d_alloc_name() doesn't return an ERR_PTR
  acpi/apei/erst: Add missing iounmap() on error in erst_exec_move_data()
2013-09-03 21:14:06 -07:00
Al Viro 173c84012a switch fchmod() to fdget
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 23:04:45 -04:00
Al Viro 7e3fb5842e switch epoll_ctl() to fdget
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 23:04:44 -04:00
Al Viro e95c311e17 git simplify nilfs check for busy subtree
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 22:52:50 -04:00
Al Viro badcf2b7b8 constify touch_atime()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 22:52:45 -04:00
Jeff Layton 8033426e6b vfs: allow umount to handle mountpoints without revalidating them
Christopher reported a regression where he was unable to unmount a NFS
filesystem where the root had gone stale. The problem is that
d_revalidate handles the root of the filesystem differently from other
dentries, but d_weak_revalidate does not. We could simply fix this by
making d_weak_revalidate return success on IS_ROOT dentries, but there
are cases where we do want to revalidate the root of the fs.

A umount is really a special case. We generally aren't interested in
anything but the dentry and vfsmount that's attached at that point. If
the inode turns out to be stale we just don't care since the intent is
to stop using it anyway.

Try to handle this situation better by treating umount as a special
case in the lookup code. Have it resolve the parent using normal
means, and then do a lookup of the final dentry without revalidating
it. In most cases, the final lookup will come out of the dcache, but
the case where there's a trailing symlink or !LAST_NORM entry on the
end complicates things a bit.

Cc: Neil Brown <neilb@suse.de>
Reported-by: Christopher T Vogan <cvogan@us.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 22:50:29 -04:00
Yan, Zheng 590fb51f1c vfs: call d_op->d_prune() before unhashing dentry
The d_prune dentry operation is used to notify filesystem when VFS
about to prune a hashed dentry from the dcache. There are three
code paths that prune dentries: shrink_dcache_for_umount_subtree(),
prune_dcache_sb() and d_prune_aliases(). For the d_prune_aliases()
case, VFS unhashes the dentry first, then call the d_prune dentry
operation. This confuses ceph_d_prune() (ceph uses the d_prune
dentry operation to maintain a flag indicating whether the complete
contents of a directory are in the dcache, pruning unhashed dentry
does not affect dir's completeness)

This patch fixes the issue by calling the d_prune dentry operation
in d_prune_aliases(), before unhashing the dentry. Also make VFS
only call the d_prune dentry operation for hashed dentry, to avoid
calling the d_prune dentry operation twice when dentry is pruned
by d_prune_aliases().

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 22:50:28 -04:00
Al Viro 184cacabe2 only regular files with FMODE_WRITE need to be on s_files
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 22:50:28 -04:00
Al Viro 301f0268b6 nfsd: racy access to ->d_name in nsfd4_encode_path()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 22:50:28 -04:00
Linus Torvalds 32dad03d16 Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "A lot of activities on the cgroup front.  Most changes aren't visible
  to userland at all at this point and are laying foundation for the
  planned unified hierarchy.

   - The biggest change is decoupling the lifetime management of css
     (cgroup_subsys_state) from that of cgroup's.  Because controllers
     (cpu, memory, block and so on) will need to be dynamically enabled
     and disabled, css which is the association point between a cgroup
     and a controller may come and go dynamically across the lifetime of
     a cgroup.  Till now, css's were created when the associated cgroup
     was created and stayed till the cgroup got destroyed.

     Assumptions around this tight coupling permeated through cgroup
     core and controllers.  These assumptions are gradually removed,
     which consists bulk of patches, and css destruction path is
     completely decoupled from cgroup destruction path.  Note that
     decoupling of creation path is relatively easy on top of these
     changes and the patchset is pending for the next window.

   - cgroup has its own event mechanism cgroup.event_control, which is
     only used by memcg.  It is overly complex trying to achieve high
     flexibility whose benefits seem dubious at best.  Going forward,
     new events will simply generate file modified event and the
     existing mechanism is being made specific to memcg.  This pull
     request contains prepatory patches for such change.

   - Various fixes and cleanups"

Fixed up conflict in kernel/cgroup.c as per Tejun.

* 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits)
  cgroup: fix cgroup_css() invocation in css_from_id()
  cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp()
  cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup
  cgroup: implement CFTYPE_NO_PREFIX
  cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys
  cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax
  cgroup: fix cgroup_write_event_control()
  cgroup: fix subsystem file accesses on the root cgroup
  cgroup: change cgroup_from_id() to css_from_id()
  cgroup: use css_get() in cgroup_create() to check CSS_ROOT
  cpuset: remove an unncessary forward declaration
  cgroup: RCU protect each cgroup_subsys_state release
  cgroup: move subsys file removal to kill_css()
  cgroup: factor out kill_css()
  cgroup: decouple cgroup_subsys_state destruction from cgroup destruction
  cgroup: replace cgroup->css_kill_cnt with ->nr_css
  cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item
  cgroup: move cgroup->subsys[] assignment to online_css()
  cgroup: reorganize css init / exit paths
  cgroup: add __rcu modifier to cgroup->subsys[]
  ...
2013-09-03 18:25:03 -07:00
Dave Chinner 1d03c6fa88 xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
So move it to a header file shared with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-03 15:00:06 -05:00
Dave Chinner 50fc5f7acc xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
So fix up the export in xfs_dir2.h that is needed by userspace.

<sigh>

Now xfs_dir3_sfe_put_ino has been made static. Revert 98f7462 ("xfs:
xfs_dir3_sfe_put_ino can be static") to being non static so that the
code shared with userspace is identical again.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-03 14:51:16 -05:00
Chuck Lever 2cf8bca8b9 NFS: Update session draining barriers for NFSv4.0 transport blocking
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:38 -04:00
Chuck Lever be05c860d7 NFS: Add nfs4_sequence calls for OPEN_CONFIRM
Ensure OPEN_CONFIRM is not emitted while the transport is plugged.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:37 -04:00
Chuck Lever fbd4bfd1d9 NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER
Ensure RELEASE_LOCKOWNER is not emitted while the transport is
plugged.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:37 -04:00
Chuck Lever 160881e33d NFS: Enable nfs4_setup_sequence() for DELEGRETURN
When CONFIG_NFS_V4_1 is disabled, the calls to nfs4_setup_sequence()
and nfs4_sequence_done() are compiled out for the DELEGRETURN
operation.  To allow NFSv4.0 transport blocking to work for
DELEGRETURN, these call sites have to be present all the time.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:36 -04:00
Chuck Lever 3bd2384a77 NFS: NFSv4.0 transport blocking
Plumb in a mechanism for plugging an NFSv4.0 mount, using the
same infrastructure as NFSv4.1 sessions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:35 -04:00
Chuck Lever abf79bb341 NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking
Anchor an nfs4_slot_table in the nfs_client for use with NFSv4.0
transport blocking.  It is initialized only for NFSv4.0 nfs_client's.

Introduce appropriate minor version ops to handle nfs_client
initialization and shutdown requirements that differ for each minor
version.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:35 -04:00
Chuck Lever eb2a1cd3c9 NFS: Add global helper for releasing slot table resources
The nfs4_destroy_slot_tables() function is renamed to avoid
confusion with the new helper.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:34 -04:00
Chuck Lever 744aa52530 NFS: Add global helper to set up a stand-along nfs4_slot_table
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:34 -04:00
Chuck Lever 9d33059c1b NFS: Enable slot table helpers for NFSv4.0
I'd like to re-use NFSv4.1's slot table machinery for NFSv4.0
transport blocking.  Re-organize some of nfs4session.c so the slot
table code is built even when NFS_V4_1 is disabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:33 -04:00
Chuck Lever 220e09ccd3 NFS: Remove unused call_sync minor version op
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:33 -04:00
Chuck Lever 9915ea7e0a NFS: Add RPC callouts to start NFSv4.0 synchronous requests
Refactor nfs4_call_sync_sequence() so it is used for NFSv4.0 now.
The RPC callouts will house transport blocking logic similar to
NFSv4.1 sessions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:32 -04:00
Chuck Lever a9c92d6b85 NFS: Common versions of sequence helper functions
NFSv4.0 will have need for this functionality when I add the ability
to block NFSv4.0 traffic before migration recovery.

I'm not really clear on why nfs4_set_sequence_privileged() gets a
generic name, but nfs41_init_sequence() gets a minor
version-specific name.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:32 -04:00
Chuck Lever 5a580e0ae2 NFS: Clean up nfs4_setup_sequence()
Clean up: Both the NFSv4.0 and NFSv4.1 version of
nfs4_setup_sequence() are used only in fs/nfs/nfs4proc.c.  No need
to keep global header declarations for either version.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:31 -04:00
Chuck Lever 2a3eb2b97b NFS: Rename nfs41_call_sync_data as a common data structure
Clean up: rename nfs41_call_sync_data for use as a data structure
common to all NFSv4 minor versions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:31 -04:00
Chuck Lever e8d92382dd NFS: When displaying session slot numbers, use "%u" consistently
Clean up, since slot and sequence numbers are all unsigned anyway.

Among other things, squelch compiler warnings:

linux/fs/nfs/nfs4proc.c: In function ‘nfs4_setup_sequence’:
linux/fs/nfs/nfs4proc.c:703:2: warning: signed and unsigned type in
	conditional expression [-Wsign-compare]

and

linux/fs/nfs/nfs4session.c: In function ‘nfs4_alloc_slot’:
linux/fs/nfs/nfs4session.c:151:31: warning: signed and unsigned type in
	conditional expression [-Wsign-compare]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:30 -04:00
Trond Myklebust ba6c05928d NFS: Ensure that rmdir() waits for sillyrenames to complete
If an NFS client does

	mkdir("dir");
	fd = open("dir/file");
	unlink("dir/file");
	close(fd);
	rmdir("dir");

then the asynchronous nature of the sillyrename operation means that
we can end up getting EBUSY for the rmdir() in the above test. Fix
that by ensuring that we wait for any in-progress sillyrenames
before sending the rmdir() to the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:26:29 -04:00
Weston Andros Adamson a5250def7c NFSv4: use the mach cred for SECINFO w/ integrity
Commit 5ec16a8500 introduced a regression
that causes SECINFO to fail without actualy sending an RPC if:

 1) the nfs_client's rpc_client was using KRB5i/p (now tried by default)
 2) the current user doesn't have valid kerberos credentials

This situation is quite common - as of now a sec=sys mount would use
krb5i for the nfs_client's rpc_client and a user would hardly be faulted
for not having run kinit.

The solution is to use the machine cred when trying to use an integrity
protected auth flavor for SECINFO.

Older servers may not support using the machine cred or an integrity
protected auth flavor for SECINFO in every circumstance, so we fall back
to using the user's cred and the filesystem's auth flavor in this case.

We run into another problem when running against linux nfs servers -
they return NFS4ERR_WRONGSEC when using integrity auth flavor (unless the
mount is also that flavor) even though that is not a valid error for
SECINFO*.  Even though it's against spec, handle WRONGSEC errors on SECINFO
by falling back to using the user cred and the filesystem's auth flavor.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:25:10 -04:00
Andy Adamson dc24826bfc NFS avoid expired credential keys for buffered writes
We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
context key) that is about to expire or has expired.  We currently will
paint ourselves into a corner by returning success to the applciation
for such a buffered WRITE, only to discover that we do not have permission when
we attempt to flush the WRITE (and potentially associated COMMIT) to disk.

Use the RPC layer credential key timeout and expire routines which use a
a watermark, gss_key_expire_timeo. We test the key in nfs_file_write.

If a WRITE is using a credential with a key that will expire within
watermark seconds, flush the inode in nfs_write_end and send only
NFS_FILE_SYNC WRITEs by adding nfs_ctx_key_to_expire to nfs_need_sync_write.
Note that this results in single page NFS_FILE_SYNC WRITEs.

Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: removed a pr_warn_ratelimited() for now]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03 15:25:09 -04:00
Linus Torvalds 542a086ac7 Driver core patches for 3.12-rc1
Here's the big driver core pull request for 3.12-rc1.
 
 Lots of tiny changes here fixing up the way sysfs attributes are
 created, to try to make drivers simpler, and fix a whole class race
 conditions with creations of device attributes after the device was
 announced to userspace.
 
 All the various pieces are acked by the different subsystem maintainers.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.21 (GNU/Linux)
 
 iEYEABECAAYFAlIlIPcACgkQMUfUDdst+ynUMwCaAnITsxyDXYQ4DqEsz8EcOtMk
 718AoLrgnUZs3B+70AT34DVktg4HSThk
 =USl9
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg KH:
 "Here's the big driver core pull request for 3.12-rc1.

  Lots of tiny changes here fixing up the way sysfs attributes are
  created, to try to make drivers simpler, and fix a whole class race
  conditions with creations of device attributes after the device was
  announced to userspace.

  All the various pieces are acked by the different subsystem
  maintainers"

* tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits)
  firmware loader: fix pending_fw_head list corruption
  drivers/base/memory.c: introduce help macro to_memory_block
  dynamic debug: line queries failing due to uninitialized local variable
  sysfs: sysfs_create_groups returns a value.
  debugfs: provide debugfs_create_x64() when disabled
  rbd: convert bus code to use bus_groups
  firmware: dcdbas: use binary attribute groups
  sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled
  driver core: add #include <linux/sysfs.h> to core files.
  HID: convert bus code to use dev_groups
  Input: serio: convert bus code to use drv_groups
  Input: gameport: convert bus code to use drv_groups
  driver core: firmware: use __ATTR_RW()
  driver core: core: use DEVICE_ATTR_RO
  driver core: bus: use DRIVER_ATTR_WO()
  driver core: create write-only attribute macros for devices and drivers
  sysfs: create __ATTR_WO()
  driver-core: platform: convert bus code to use dev_groups
  workqueue: convert bus code to use dev_groups
  MEI: convert bus code to use dev_groups
  ...
2013-09-03 11:37:15 -07:00
Linus Torvalds fc6d0b0376 Merge branch 'lockref' (locked reference counts)
Merge lockref infrastructure code by me and Waiman Long.

I already merged some of the preparatory patches that didn't actually do
any semantic changes earlier, but this merges the actual _reason_ for
those preparatory patches.

The "lockref" structure is a combination "spinlock and reference count"
that allows optimized reference count accesses.  In particular, it
guarantees that the reference count will be updated AS IF the spinlock
was held, but using atomic accesses that cover both the reference count
and the spinlock words, we can often do the update without actually
having to take the lock.

This allows us to avoid the nastiest cases of spinlock contention on
large machines under heavy pathname lookup loads.  When updating the
dentry reference counts on a large system, we'll still end up with the
cache line bouncing around, but that's much less noticeable than
actually having to spin waiting for the lock.

* lockref:
  lockref: implement lockless reference count updates using cmpxchg()
  lockref: uninline lockref helper functions
  vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()
  vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()
  lockref: add 'lockref_get_or_lock() helper
2013-09-03 08:08:21 -07:00
Miklos Szeredi efeb9e60d4 fuse: readdir: check for slash in names
Userspace can add names containing a slash character to the directory
listing.  Don't allow this as it could cause all sorts of trouble.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
2013-09-03 14:28:38 +02:00