Commit graph

34648 commits

Author SHA1 Message Date
Linus Torvalds 1d32bdafaa xfs: update for v3.14-rc1
For 3.14-rc1 there are fixes in the areas of remote attributes, discard,
 growfs, memory leaks in recovery, directory v2, quotas, the MAINTAINERS
 file, allocation alignment, extent list locking, and in
 xfs_bmapi_allocate.  There are cleanups in xfs_setsize_buftarg, removing
 unused macros, quotas, setattr, and freeing of inode clusters.  The
 in-memory and on-disk log format have been decoupled, a common helper to
 calculate the number of blocks in an inode cluster has been added, and
 handling of i_version has been pulled into the filesystems that use it.
 
 - cleanup in xfs_setsize_buftarg
 - removal of remaining unused flags for vop toss/flush/flushinval
 - fix for memory corruption in xfs_attrlist_by_handle
 - fix for out-of-date comment in xfs_trans_dqlockedjoin
 - fix for discard if range length is less than one block
 - fix for overrun of agfl buffer using growfs on v4 superblock filesystems
 - pull i_version handling out into the filesystems that use it
 - don't leak recovery items on error
 - fix for memory leak in xfs_dir2_node_removename
 - several cleanups for quotas
 - fix bad assertion in xfs_qm_vop_create_dqattach
 - cleanup for xfs_setattr_mode, and add xfs_setattr_time
 - fix quota assert in xfs_setattr_nonsize
 - fix an infinite loop when turning off group/project quota before user
   quota
 - fix for temporary buffer allocation failure in xfs_dir2_block_to_sf
   with large directory block sizes
 - fix Dave's email address in MAINTAINERS
 - cleanup calculation of freed inode cluster blocks
 - fix alignment of initial file allocations to match filesystem geometry
 - decouple in-memory and on-disk log format
 - introduce a common helper to calculate the number of filesystem
   blocks in an inode cluster
 - fixes for extent list locking
 - fix for off-by-one in xfs_attr3_rmt_verify
 - fix for missing destroy_work_on_stack in xfs_bmapi_allocate
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJS4C2sAAoJENaLyazVq6ZOF/UP/A/FmscsEbgz+KHtsg1UXP+g
 EMkrD8WOOzLnm7/GhkfvmmFLQrWrrNPmiP54MPJ/rxpfDb7rV60HgS0iHm13U+IR
 0uPyk2NIQKG/Sj6FHCrgn4oh19B0tmer6lLE32UPrlNM7w/0NRm32Ms/jKz7WNvc
 9Tqw3qNdtmP7leHG8EyD1ISzqUz6FNSC+qGyOTuBQUqG24LqsmZ1qD2Nw/HEz0ir
 1YCqS+cjj8c3WaWMsdjEFdCvEIKS/sMYM+ZihATIl/5ggwtVobP7PH/4cG8Biby3
 5wvcl3/jM+xjgGDC1YMQQeSADieus6ER4aZTjCvGLn9RajQTOBjP0QZTu2OHntTI
 kD/52DfYErthGijS2qJcqXy11AaYtWQU2yTZXuMpaACIM1DDSD4NyGFP99BzbBaX
 D4BgnC+vmfpbXO37PmophkeZwAvj+9K2BBG7X+g4sLynj/BtmvFCxIyK+SLHE3hN
 kn+Pn03yTw0VGgvj0krXSllbYKE0LyLElaA6h0LejQgcOZM0W6Q8qaj+RODLitbw
 HkQNKYCOMCzbdNu8rKAfup9UPZloiMukyMdMaw1MeCgCQSQLkZVxTegmVW9gzpIh
 QJEARDaOnKB/G/CLCwbPZR09DBpg3yBt/eHjt0VnV+z2x+u9DTInAp6f013g7E/W
 gU+vnBBPY1AYI8iiDraU
 =t8nB
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-v3.14-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "This is primarily bug fixes, many of which you already have.  New
  stuff includes a series to decouple the in-memory and on-disk log
  format, helpers in the area of inode clusters, and i_version handling.

  We decided to try to use more topic branches this release, so there
  are some merge commits in there on account of that.  I'm afraid I
  didn't do a good job of putting meaningful comments in the first
  couple of merges.  Sorry about that.  I think I have the hang of it
  now.

  For 3.14-rc1 there are fixes in the areas of remote attributes,
  discard, growfs, memory leaks in recovery, directory v2, quotas, the
  MAINTAINERS file, allocation alignment, extent list locking, and in
  xfs_bmapi_allocate.  There are cleanups in xfs_setsize_buftarg,
  removing unused macros, quotas, setattr, and freeing of inode
  clusters.  The in-memory and on-disk log format have been decoupled, a
  common helper to calculate the number of blocks in an inode cluster
  has been added, and handling of i_version has been pulled into the
  filesystems that use it.

   - cleanup in xfs_setsize_buftarg
   - removal of remaining unused flags for vop toss/flush/flushinval
   - fix for memory corruption in xfs_attrlist_by_handle
   - fix for out-of-date comment in xfs_trans_dqlockedjoin
   - fix for discard if range length is less than one block
   - fix for overrun of agfl buffer using growfs on v4 superblock
     filesystems
   - pull i_version handling out into the filesystems that use it
   - don't leak recovery items on error
   - fix for memory leak in xfs_dir2_node_removename
   - several cleanups for quotas
   - fix bad assertion in xfs_qm_vop_create_dqattach
   - cleanup for xfs_setattr_mode, and add xfs_setattr_time
   - fix quota assert in xfs_setattr_nonsize
   - fix an infinite loop when turning off group/project quota before
     user quota
   - fix for temporary buffer allocation failure in xfs_dir2_block_to_sf
     with large directory block sizes
   - fix Dave's email address in MAINTAINERS
   - cleanup calculation of freed inode cluster blocks
   - fix alignment of initial file allocations to match filesystem
     geometry
   - decouple in-memory and on-disk log format
   - introduce a common helper to calculate the number of filesystem
     blocks in an inode cluster
   - fixes for extent list locking
   - fix for off-by-one in xfs_attr3_rmt_verify
   - fix for missing destroy_work_on_stack in xfs_bmapi_allocate"

* tag 'xfs-for-linus-v3.14-rc1' of git://oss.sgi.com/xfs/xfs: (51 commits)
  xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
  xfs: fix off-by-one error in xfs_attr3_rmt_verify
  xfs: assert that we hold the ilock for extent map access
  xfs: use xfs_ilock_attr_map_shared in xfs_attr_list_int
  xfs: use xfs_ilock_attr_map_shared in xfs_attr_get
  xfs: use xfs_ilock_data_map_shared in xfs_qm_dqiterate
  xfs: use xfs_ilock_data_map_shared in xfs_qm_dqtobp
  xfs: take the ilock around xfs_bmapi_read in xfs_zero_remaining_bytes
  xfs: reinstate the ilock in xfs_readdir
  xfs: add xfs_ilock_attr_map_shared
  xfs: rename xfs_ilock_map_shared
  xfs: remove xfs_iunlock_map_shared
  xfs: no need to lock the inode in xfs_find_handle
  xfs: use xfs_icluster_size_fsb in xfs_imap
  xfs: use xfs_icluster_size_fsb in xfs_ifree_cluster
  xfs: use xfs_icluster_size_fsb in xfs_ialloc_inode_init
  xfs: use xfs_icluster_size_fsb in xfs_bulkstat
  xfs: introduce a common helper xfs_icluster_size_fsb
  xfs: get rid of XFS_IALLOC_BLOCKS macros
  xfs: get rid of XFS_INODE_CLUSTER_SIZE macros
  ...
2014-01-23 09:16:20 -08:00
Linus Torvalds bb1281f2aa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual rocket science stuff from trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  neighbour.h: fix comment
  sched: Fix warning on make htmldocs caused by wait.h
  slab: struct kmem_cache is protected by slab_mutex
  doc: Fix typo in USB Gadget Documentation
  of/Kconfig: Spelling s/one/once/
  mkregtable: Fix sscanf handling
  lp5523, lp8501: comment improvements
  thermal: rcar: comment spelling
  treewide: fix comments and printk msgs
  IXP4xx: remove '1 &&' from a condition check in ixp4xx_restart()
  Documentation: update /proc/uptime field description
  Documentation: Fix size parameter for snprintf
  arm: fix comment header and macro name
  asm-generic: uaccess: Spelling s/a ny/any/
  mtd: onenand: fix comment header
  doc: driver-model/platform.txt: fix a typo
  drivers: fix typo in DEVTMPFS_MOUNT Kconfig help text
  doc: Fix typo (acces_process_vm -> access_process_vm)
  treewide: Fix typos in printk
  drivers/gpu/drm/qxl/Kconfig: reformat the help text
  ...
2014-01-22 21:21:55 -08:00
Jaegeuk Kim bf39c00a9a f2fs: drop obsolete node page when it is truncated
If a node page is trucated, we'd better drop the page in the node_inode's page
cache for better memory footprint.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-23 08:04:21 +09:00
Andrew Gallagher 7678ac5061 fuse: support clients that don't implement 'open'
open/release operations require userspace transitions to keep track
of the open count and to perform any FS-specific setup.  However,
for some purely read-only FSs which don't need to perform any setup
at open/release time, we can avoid the performance overhead of
calling into userspace for open/release calls.

This patch adds the necessary support to the fuse kernel modules to prevent
open/release operations from hitting in userspace. When the client returns
ENOSYS, we avoid sending the subsequent release to userspace, and also
remember this so that future opens also don't trigger a userspace
operation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-01-22 19:36:59 +01:00
Andrew Gallagher 451418fc92 fuse: don't invalidate attrs when not using atime
Various read operations (e.g. readlink, readdir) invalidate the cached
attrs for atime changes.  This patch adds a new function
'fuse_invalidate_atime', which checks for a read-only super block and
avoids the attr invalidation in that case.

Signed-off-by: Andrew Gallagher <andrewjcg@fb.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-01-22 19:36:58 +01:00
Miklos Szeredi 063ec1e595 fuse: fix SetPageUptodate() condition in STORE
As noticed by Coverity the "num != 0" condition never triggers.  Instead it
should check for a complete page.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-01-22 19:36:58 +01:00
Miklos Szeredi 28a625cbc2 fuse: fix pipe_buf_operations
Having this struct in module memory could Oops when if the module is
unloaded while the buffer still persists in a pipe.

Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal
merge them into nosteal_pipe_buf_ops (this is the same as
default_pipe_buf_ops except stealing the page from the buffer is not
allowed).

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
2014-01-22 19:36:57 +01:00
Jaegeuk Kim 4ef51a8fcc f2fs: introduce NODE_MAPPING for code consistency
This patch adds NODE_MAPPING which is similar as META_MAPPING introduced by
Gu Zheng.

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:08 +09:00
Gu Zheng 63f5384c9a f2fs: remove the orphan block page array
As the orphan_blocks may be max to 504, so it is not security
and rigorous to store such a large array in the kernel stack
as Dan Carpenter said.
In fact, grab_meta_page has locked the page in the page cache,
and we can use find_get_page() to fetch the page safely in the
downstream, so we can remove the page array directly.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:08 +09:00
Gu Zheng 9df27d982d f2fs: add help function META_MAPPING
Introduce help function META_MAPPING() to get the cache meta blocks'
address space.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:07 +09:00
Jaegeuk Kim e8dae60458 f2fs: move a branch for code redability
This patch moves a function in f2fs_delete_entry for code readability.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:07 +09:00
Jaegeuk Kim a18ff06340 f2fs: call mark_inode_dirty to flush dirty pages
If a dentry page is updated, we should call mark_inode_dirty to add the inode
into the dirty list, so that its dentry pages are flushed to the disk.
Otherwise, the inode can be evicted without flush.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:40:34 +09:00
Linus Torvalds df32e43a54 Merge branch 'akpm' (incoming from Andrew)
Merge first patch-bomb from Andrew Morton:

 - a couple of misc things

 - inotify/fsnotify work from Jan

 - ocfs2 updates (partial)

 - about half of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  mm/migrate: remove unused function, fail_migrate_page()
  mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages
  mm/migrate: correct failure handling if !hugepage_migration_support()
  mm/migrate: add comment about permanent failure path
  mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure
  mm: compaction: reset scanner positions immediately when they meet
  mm: compaction: do not mark unmovable pageblocks as skipped in async compaction
  mm: compaction: detect when scanners meet in isolate_freepages
  mm: compaction: reset cached scanner pfn's before reading them
  mm: compaction: encapsulate defer reset logic
  mm: compaction: trace compaction begin and end
  memcg, oom: lock mem_cgroup_print_oom_info
  sched: add tracepoints related to NUMA task migration
  mm: numa: do not automatically migrate KSM pages
  mm: numa: trace tasks that fail migration due to rate limiting
  mm: numa: limit scope of lock for NUMA migrate rate limiting
  mm: numa: make NUMA-migrate related functions static
  lib/show_mem.c: show num_poisoned_pages when oom
  mm/hwpoison: add '#' to hwpoison_inject
  mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter
  ...
2014-01-21 19:05:45 -08:00
wangweidong 048ed4b626 sctp: remove macros sctp_{lock|release}_sock
Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:41:36 -08:00
Linus Torvalds ff0bc6cc7f dlm for 3.14
This set includes a single change to speed up
 recovery times when using SCTP connections.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJS3rFtAAoJEDgbc8f8gGmqLDcP/RHy6yJD25Tqcww8CrE+wZj5
 b8sf9zsZR0tbT/L9LiB7F3gwoHoS5BUBs7vq8PUX/qAWLgw5dIL9ZqWBujN0M9pr
 AOIyAwQtY1U9jINunfYXc4EXfLHYO9pbW7ZJuieg+Tls5meb+werBEbnmNsETht8
 NYLcVYQ1VnxR6ohnTzUsheQ5lV8J5AKBYC53ZH32gU1Ua1d6xnxl0Fvq2gwM5uOg
 Xxo5ugpes5NZLwcUB+Smsf6Fqq3z3oyJkWWh3Vq8BJSduOcNDVd45HXiDM7D/LT9
 S6TJAbUH+nTesb4FOP7ew8OewxNJN0TbygJ4CVHvbiM3+7jGJi6esfurNyuATGQ5
 VJ18el9T46k5D9s309aa3oNdd1CUbAUJ4eEJU+5/nssuVaAt8H0ETfvme7uIXlCp
 Pyws08NDT3U91q6Xe1R1FTPFdu0DmqyM9uvMFrtNGMXQxqxXbwco70Y/3HJGhHxb
 Uk8mOtzjCvVN0gAvB5fB71qNBWBmVsO8M10TsCErbjXb7HJkBKQIq/O5eVRDIVGZ
 jBenSj6nK3ouQUTNFRas+6pQRR9Pa3m0WH4o9Gj2aCMp48XAc1TKd1ySxRuIplWQ
 JnJg3BOFpILOqwCNVPXX6TRiWJKdtFrCAm/Q484AChh5GQjhyPWeQg/hiNS8SxG7
 UXmHPpAUuoG7LcPfuabr
 =/Wic
 -----END PGP SIGNATURE-----

Merge tag 'dlm-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm update from David Teigland:
 "A single change to speed up recovery times when using SCTP
  connections"

* tag 'dlm-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: set zero linger time on sctp socket
2014-01-21 17:42:55 -08:00
Linus Torvalds 2182c815f3 The main topics this time are allocation, in the form of Bob's
improvements when searching resource groups and several updates
 to quotas which should increase scalability. The quota changes
 follow on from those in the last merge window, and there will
 likely be further work to come in this area in due course.
 
 There are also a few patches which help to improve efficiency
 of adding entries into directories, and clean up some of that
 code.
 
 One on-disk change is included this time, which is to write some
 additional information which should be useful to fsck and
 also potentially for debugging.
 
 Other than that, its just a few small random bug fixes and
 clean ups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJS3RZUAAoJEMrg3m4a/8jS/C4QAKAoGbGH8UxG2fNRUWK94UBL
 lxxn0nUaQ7bwKpIbysEO6HHg2rqcfjRSdO/H6kfD51Fgg39RfIfvXF9woJw/5Is6
 1xcn3faBl3dQhIq81+lSKnEJnJbQ49dbIcgxt1FGqPI8ZY6st/CBvnvqh0TRAcvN
 7t0TYSPxmA/7BnwynstTbpK2l9uUisau8z12QzihJ6lRiR7QlfFSUq6Xs/ICSmkE
 LE2dC6gXMIIDyMpPX5Fg9NQNEAPKOzMwduUaHnv2vpuh1jkk3ObHg4mUkYtdoUZx
 Hbs1Ey5oqRiMs03I5wm1ayZg+6i/pT7HTM5vbrunTa21AMmJ0NTFdZ4ihVZqZwu4
 ynpXY0OT/U7CpFyx6FSKtlq0kzwx5A51mL9bpSlLU9pggJY6NHQdmSJ9fdLW9cdS
 Wwi2Br/0rBvIhm0ejy8kxK904DhqCGsH+RXbfOc8J4VBE4R/z5eMmBG9UqMm7c+7
 hO6FhpJo9QsNMe19A8IANzG2gT+PcpgrXs2GTS1QFGhFgn08+k3RLDT/SrGRxU6r
 MaCM9rms0SgMr2LX6MEPZdo3LJgBTNr1rKDEIzQ7BUXs3FHJ7U0swuHOTDXFVm51
 gQanPYmVvFV7aiom3ExUhegDefBu1jMnY26SVxxLAbWN1Ek2Ikg6rO6wx3ckXvYo
 SKY3qAewE5pNpHe7u6SS
 =/tv0
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw

Pull GFS2 updates from Steven Whitehouse:
 "The main topics this time are allocation, in the form of Bob's
  improvements when searching resource groups and several updates to
  quotas which should increase scalability.  The quota changes follow on
  from those in the last merge window, and there will likely be further
  work to come in this area in due course.

  There are also a few patches which help to improve efficiency of
  adding entries into directories, and clean up some of that code.

  One on-disk change is included this time, which is to write some
  additional information which should be useful to fsck and also
  potentially for debugging.

  Other than that, its just a few small random bug fixes and clean ups"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits)
  GFS2: revert "GFS2: d_splice_alias() can't return error"
  GFS2: Small cleanup
  GFS2: Don't use ENOBUFS when ENOMEM is the correct error code
  GFS2: Fix kbuild test robot reported warning
  GFS2: Move quota bitmap operations under their own lock
  GFS2: Clean up quota slot allocation
  GFS2: Only run logd and quota when mounted read/write
  GFS2: Use RCU/hlist_bl based hash for quotas
  GFS2: No need to invalidate pages for a dio read
  GFS2: Add initialization for address space in super block
  GFS2: Add hints to directory leaf blocks
  GFS2: For exhash conversion, only one block is needed
  GFS2: Increase i_writecount during gfs2_setattr_chown
  GFS2: Remember directory insert point
  GFS2: Consolidate transaction blocks calculation for dir add
  GFS2: Add directory addition info structure
  GFS2: Use only a single address space for rgrps
  GFS2: Use range based functions for rgrp sync/invalidation
  GFS2: Remove test which is always true
  GFS2: Remove gfs2_quota_change_host structure
  ...
2014-01-21 17:42:00 -08:00
Rik van Riel 34e431b0ae /proc/meminfo: provide estimated available memory
Many load balancing and workload placing programs check /proc/meminfo to
estimate how much free memory is available.  They generally do this by
adding up "free" and "cached", which was fine ten years ago, but is
pretty much guaranteed to be wrong today.

It is wrong because Cached includes memory that is not freeable as page
cache, for example shared memory segments, tmpfs, and ramfs, and it does
not include reclaimable slab memory, which can take up a large fraction
of system memory on mostly idle systems with lots of files.

Currently, the amount of memory that is available for a new workload,
without pushing the system into swap, can be estimated from MemFree,
Active(file), Inactive(file), and SReclaimable, as well as the "low"
watermarks from /proc/zoneinfo.

However, this may change in the future, and user space really should not
be expected to know kernel internals to come up with an estimate for the
amount of free memory.

It is more convenient to provide such an estimate in /proc/meminfo.  If
things change in the future, we only have to change it in one place.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Erik Mouw <erik.mouw_2@nxp.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:43 -08:00
Paul Gortmaker af52b040eb fs/ramfs: don't use module_init for non-modular core code
The ramfs is always built in.  It will never be modular, so using
module_init as an alias for __initcall is rather misleading.

Fix this up now, so that we can relocate module_init from init.h into
module.h in the future.  If we don't do this, we'd have to add module.h
to obviously non-modular code, and that would be a worse thing.

Note that direct use of __initcall is discouraged, vs.  one of the
priority categorized subgroups.  As __initcall gets mapped onto
device_initcall, our use of fs_initcall (which makes sense for fs code)
will thus change this registration from level 6-device to level 5-fs
(i.e. slightly earlier).  However no observable impact of that small
difference has been observed during testing, or is expected.

Also note that this change uncovers a missing semicolon bug in the
registration of the initcall.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Vladimir Davydov b5bd856a0c fs/super.c: fix WARN on alloc_super() fail path
On fail path alloc_super() calls destroy_super(), which issues a warning
if the sb's s_mounts list is not empty, in particular if it has not been
initialized.  That said s_mounts must be initialized in alloc_super()
before any possible failure, but currently it is initialized close to
the end of the function leading to a useless warning dumped to log if
either percpu_counter_init() or list_lru_init() fails.  Let's fix this.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Corey Minyard 4e4f9e66a7 fs/read_write.c:compat_readv(): remove bogus area verify
The compat_do_readv_writev() function was doing a verify_area on the
incoming iov, but the nr_segs value is not checked.  If someone passes
in a -1 for nr_segs, for instance, the function should return an EINVAL.
However, it returns a EFAULT because the verify_area fails because it is
checking an array of size MAX_UINT.  The check is bogus, anyway, because
the next check, compat_rw_copy_check_uvector(), will do all the
necessary checking, anyway.  The non-compat do_readv_writev() function
doesn't do this check, so I think it's safe to just remove the code.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Dan Carpenter 38316c8ab7 fs/compat_ioctl.c: fix an underflow issue (harmless)
We cap "nmsgs" at I2C_RDRW_IOCTL_MAX_MSGS (42) but the current code
allows negative values.  It's harmless but it makes my static checker
upset so I've made nsmgs unsigned.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Andrew Morton 0afaa12047 posix_acl: uninlining
Uninline vast tracts of nested inline functions in
include/linux/posix_acl.h.

This reduces the text+data+bss size of x86_64 allyesconfig vmlinux by
8026 bytes.

The patch also regularises the positioning of the EXPORT_SYMBOLs in
posix_acl.c.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Benny Halevy <bhalevy@primarydata.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Yiwen Jiang 75f82eaa50 ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously
2 nodes cluster, say Node A and Node B, mount the same ocfs2 volume, and
create a file 1.

Node A			Node B
open 1, get open lock
                        rm 1, and then add 1 to orphan_dir
storage link down,
o2hb_write_timeout
->o2quo_disk_timeout
->emergency_restart
                        at the moment, Node B dismount and do
			ocfs2rec simultaneously
                        1) ocfs2_dismount_volume
			->ocfs2_recovery_exit
			->wait_event(osb->recovery_event)
			->flush_workqueue(ocfs2_wq)
			2) ocfs2rec
			->queue_work(&journal->j_recovery_work)
                        ->ocfs2_recover_orphans
			->ocfs2_commit_truncate
                        ->queue_delayed_work(&osb->osb_truncate_log_wq)

In ocfs2_recovery_exit, it flushes workqueue and then releases system
inodes.  When doing ocfs2rec, it will call ocfs2_flush_truncate_log
which will try to get sys_root_inode, and NULL pointer dereference
occurs.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: joyce <xuejiufei@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Tariq Saeed a2a3b39824 ocfs2: punch hole should return EINVAL if the length argument in ioctl is negative
An unreserve space ioctl OCFS2_IOC_UNRESVSP/64 should reject a negative
length.

Orabug:14789508

Signed-off-by: Tariq Saseed <tariq.x.saeed@oracle.com>
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Wei Yongjun 16eac4be46 ocfs2: fix sparse non static symbol warning
Fixes the following sparse warning:

  fs/ocfs2/stack_user.c:930:32: warning:
   symbol 'ocfs2_ls_ops' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Jie Liu 1ba2212bb3 ocfs2: adjust minlen with discard_granularity in the FITRIM ioctl
Adjust minlen with discard_granularity for FITRIM ioctl(2) if the given
minimum size in bytes is less than it because, discard granularity is
used to tell us that the minimum size of extent that can be discarded by
the storage device.

This is inspired by ext4 commit 5c2ed62fd4 ("ext4: Adjust minlen with
discard_granularity in the FITRIM ioctl") from Lukas Czerner.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Jie Liu aa89762c54 ocfs2: return EINVAL if the given range to discard is less than block size
For FITRIM ioctl(2), we should not keep silence if the given range
length ls less than a block size as there is no data blocks would be
discareded.  Hence it should return EINVAL instead.  This issue can be
verified via xfstests/generic/288 which is used for FITRIM argument
handling tests.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Jie Liu 19e8ac2721 ocfs2: return EOPNOTSUPP if the device does not support discard
For FITRIM ioctl(2), we should return EOPNOTSUPP to inform the user that
the storage device does not support discard if it is, otherwise return
success would confuse the user even though there is no free blocks were
trimmed at all.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Younger Liu 0a2fcd8988 ocfs2: remove redundant ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits()
ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() are
already provided in suballoc.c.  So, the same functions in
move_extents.c are not needed any more.

Declare the functions in suballoc.h and remove redundant functions in
move_extents.c.

Signed-off-by: Younger Liu <liuyiyang@hisense.com>
Cc: Younger Liu <younger.liucn@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Goldwyn Rodrigues c994c2ebdb ocfs2: use the new DLM operation callbacks while requesting new lockspace
Attempt to use the new DLM operations.  If it is not supported, use the
traditional ocfs2_controld.

To exchange ocfs2 versioning, we use the LVB of the version dlm lock.
It first attempts to take the lock in EX mode (non-blocking).  If
successful (which means it is the first mount), it writes the version
number and downconverts to PR lock.  If it is unsuccessful, it reads the
version from the lock.

If this becomes the standard (with o2cb as well), it could simplify
userspace tools to check if the filesystem is mounted on other nodes.

Dan: Since ocfs2_protocol_version are two u8 values, the additional
checks with LONG* don't make sense.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Goldwyn Rodrigues 4150363033 ocfs2: framework for version LVB
Use the native DLM locks for version control negotiation.  Most of the
framework is taken from gfs2/lock_dlm.c

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Goldwyn Rodrigues 3e83415164 ocfs2: pass ocfs2_cluster_connection to ocfs2_this_node
This is done to differentiate between using and not using controld and
use the connection information accordingly.

We need to be backward compatible.  So, we use a new enum
ocfs2_connection_type to identify when controld is used and when it is
not.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Goldwyn Rodrigues 24aa338611 ocfs2: shift allocation ocfs2_live_connection to user_connect()
We perform this because the DLM recovery callbacks will require the
ocfs2_live_connection structure to record the node information when
dlm_new_lockspace() is updated (in the last patch of the series).

Before calling dlm_new_lockspace(), we need the structure ready for the
.recover_done() callback, which would set oc_this_node.  This is the
reason we allocate ocfs2_live_connection beforehand in user_connect().

[AKPM] rc initialization is not required because it assigned in case of
errors.  It will be cleared by compiler anyways.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reveiwed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Goldwyn Rodrigues 66e188fc31 ocfs2: add DLM recovery callbacks
These are the callbacks called by the fs/dlm code in case the membership
changes.  If there is a failure while/during calling any of these, the
DLM creates a new membership and relays to the rest of the nodes.

 - recover_prep() is called when DLM understands a node is down.
 - recover_slot() is called once all nodes have acknowledged
   recover_prep and recovery can begin.
 - recover_done() is called once the recovery is complete.  It returns
   the new membership.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Goldwyn Rodrigues c74a3bdd9b ocfs2: add clustername to cluster connection
This is an effort of removing ocfs2_controld.pcmk and getting ocfs2 DLM
handling up to the times with respect to DLM (>=4.0.1) and corosync
(2.3.x).  AFAIK, cman also is being phased out for a unified corosync
cluster stack.

fs/dlm performs all the functions with respect to fencing and node
management and provides the API's to do so for ocfs2.  For all future
references, DLM stands for fs/dlm code.

The advantages are:
 + No need to run an additional userspace daemon (ocfs2_controld)
 + No controld device handling and controld protocol
 + Shifting responsibilities of node management to DLM layer

For backward compatibility, we are keeping the controld handling code.
Once enough time has passed we can remove a significant portion of the
code.  This was tested by using the kernel with changes on older
unmodified tools.  The kernel used ocfs2_controld as expected, and
displayed the appropriate warning message.

This feature requires modification in the userspace ocfs2-tools.  The
changes can be found at: https://github.com/goldwynr/ocfs2-tools branch:
nocontrold Currently, not many checks are present in the userspace code,
but that would change soon.

This patch (of 6):

Add clustername to cluster connection.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Goldwyn Rodrigues ff8fb33522 ocfs2: remove versioning information
The versioning information is confusing for end-users.  The numbers are
stuck at 1.5.0 when the tools version have moved to 1.8.2.  Remove the
versioning system in the OCFS2 modules and let the kernel version be the
guide to debug issues.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Jan Kara 56b27cf603 fsnotify: remove pointless NULL initializers
We usually rely on the fact that struct members not specified in the
initializer are set to NULL.  So do that with fsnotify function pointers
as well.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Jan Kara 83c4c4b0a3 fsnotify: remove .should_send_event callback
After removing event structure creation from the generic layer there is
no reason for separate .should_send_event and .handle_event callbacks.
So just remove the first one.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Jan Kara 7053aee26a fsnotify: do not share events between notification groups
Currently fsnotify framework creates one event structure for each
notification event and links this event into all interested notification
groups.  This is done so that we save memory when several notification
groups are interested in the event.  However the need for event
structure shared between inotify & fanotify bloats the event structure
so the result is often higher memory consumption.

Another problem is that fsnotify framework keeps path references with
outstanding events so that fanotify can return open file descriptors
with its events.  This has the undesirable effect that filesystem cannot
be unmounted while there are outstanding events - a regression for
inotify compared to a situation before it was converted to fsnotify
framework.  For fanotify this problem is hard to avoid and users of
fanotify should kind of expect this behavior when they ask for file
descriptors from notified files.

This patch changes fsnotify and its users to create separate event
structure for each group.  This allows for much simpler code (~400 lines
removed by this patch) and also smaller event structures.  For example
on 64-bit system original struct fsnotify_event consumes 120 bytes, plus
additional space for file name, additional 24 bytes for second and each
subsequent group linking the event, and additional 32 bytes for each
inotify group for private data.  After the conversion inotify event
consumes 48 bytes plus space for file name which is considerably less
memory unless file names are long and there are several groups
interested in the events (both of which are uncommon).  Fanotify event
fits in 56 bytes after the conversion (fanotify doesn't care about file
names so its events don't have to have it allocated).  A win unless
there are four or more fanotify groups interested in the event.

The conversion also solves the problem with unmount when only inotify is
used as we don't have to grab path references for inotify events.

[hughd@google.com: fanotify: fix corruption preventing startup]
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Jan Kara e9fe69045b inotify: provide function for name length rounding
Rounding of name length when passing it to userspace was done in several
places.  Provide a function to do it and use it in all places.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Linus Torvalds d3bad75a6d Driver core / sysfs patches for 3.14-rc1
Here's the big driver core and sysfs patch set for 3.14-rc1.
 
 There's a lot of work here moving sysfs logic out into a "kernfs" to
 allow other subsystems to also have a virtual filesystem with the same
 attributes of sysfs (handle device disconnect, dynamic creation /
 removal  as needed / unneeded, etc.  This is primarily being done for
 the cgroups filesystem, but the goal is to also move debugfs to it when
 it is ready, solving all of the known issues in that filesystem as well.
 The code isn't completed yet, but all should be stable now (there is a
 big section that was reverted due to problems found when testing.)
 
 There's also some other smaller fixes, and a driver core addition that
 allows for a "collection" of objects, that the DRM people will be using
 soon (it's in this tree to make merges after -rc1 easier.)
 
 All of this has been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlLdh0cACgkQMUfUDdst+ylv4QCfeDKDgLo4LsaBIIrFSxLoH/c7
 UUsAoMPRwA0h8wy+BQcJAg4H4J4maKj3
 =0pc0
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core / sysfs patches from Greg KH:
 "Here's the big driver core and sysfs patch set for 3.14-rc1.

  There's a lot of work here moving sysfs logic out into a "kernfs" to
  allow other subsystems to also have a virtual filesystem with the same
  attributes of sysfs (handle device disconnect, dynamic creation /
  removal as needed / unneeded, etc)

  This is primarily being done for the cgroups filesystem, but the goal
  is to also move debugfs to it when it is ready, solving all of the
  known issues in that filesystem as well.  The code isn't completed
  yet, but all should be stable now (there is a big section that was
  reverted due to problems found when testing)

  There's also some other smaller fixes, and a driver core addition that
  allows for a "collection" of objects, that the DRM people will be
  using soon (it's in this tree to make merges after -rc1 easier)

  All of this has been in linux-next with no reported issues"

* tag 'driver-core-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (113 commits)
  kernfs: associate a new kernfs_node with its parent on creation
  kernfs: add struct dentry declaration in kernfs.h
  kernfs: fix get_active failure handling in kernfs_seq_*()
  Revert "kernfs: fix get_active failure handling in kernfs_seq_*()"
  Revert "kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq"
  Revert "kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()"
  Revert "kernfs: remove KERNFS_REMOVED"
  Revert "kernfs: restructure removal path to fix possible premature return"
  Revert "kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()"
  Revert "kernfs: remove kernfs_addrm_cxt"
  Revert "kernfs: make kernfs_get_active() block if the node is deactivated but not removed"
  Revert "kernfs: implement kernfs_{de|re}activate[_self]()"
  Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers"
  Revert "pci: use device_remove_file_self() instead of device_schedule_callback()"
  Revert "scsi: use device_remove_file_self() instead of device_schedule_callback()"
  Revert "s390: use device_remove_file_self() instead of device_schedule_callback()"
  Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
  Revert "kernfs: remove unnecessary NULL check in __kernfs_remove()"
  kernfs: remove unnecessary NULL check in __kernfs_remove()
  drivers/base: provide an infrastructure for componentised subsystems
  ...
2014-01-20 15:49:44 -08:00
Chris Fries 6c311ec6c2 f2fs: clean checkpatch warnings
Fixed a variety of trivial checkpatch warnings.  The only delta should
be some minor formatting on log strings that were split / too long.

Signed-off-by: Chris Fries <cfries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-20 10:27:12 +09:00
J. Bruce Fields d57b9c9a99 GFS2: revert "GFS2: d_splice_alias() can't return error"
0d0d110720 asserts that "d_splice_alias()
can't return error unless it was given an IS_ERR(inode)".

That was true of the implementation of d_splice_alias, but this is
really a problem with d_splice_alias: at a minimum it should be able to
return -ELOOP in the case where inserting the given dentry would cause a
directory loop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-18 09:50:53 +00:00
Linus Torvalds 48ba620aab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
 "This is a set of 3 regression fixes.

  This fixes /proc/mounts when using "ip netns add <netns>" to display
  the actual mount point.

  This fixes a regression in clone that broke lxc-attach.

  This fixes a regression in the permission checks for mounting /proc
  that made proc unmountable if binfmt_misc was in use.  Oops.

  My apologies for sending this pull request so late.  Al Viro gave
  interesting review comments about the d_path fix that I wanted to
  address in detail before I sent this pull request.  Unfortunately a
  bad round of colds kept from addressing that in detail until today.
  The executive summary of the review was:

  Al: Is patching d_path really sufficient?
      The prepend_path, d_path, d_absolute_path, and __d_path family of
      functions is a really mess.

  Me: Yes, patching d_path is really sufficient.  Yes, the code is mess.
      No it is not appropriate to rewrite all of d_path for a regression
      that has existed for entirely too long already, when a two line
      change will do"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  vfs: Fix a regression in mounting proc
  fork:  Allow CLONE_PARENT after setns(CLONE_NEWPID)
  vfs: In d_path don't call d_dname on a mount point
2014-01-17 17:29:36 -08:00
Tejun Heo db4aad209b kernfs: associate a new kernfs_node with its parent on creation
Once created, a kernfs_node is always destroyed by kernfs_put().
Since ba7443bc65 ("sysfs, kernfs: implement
kernfs_create/destroy_root()"), kernfs_put() depends on kernfs_root()
to locate the ino_ida.  kernfs_root() in turn depends on
kernfs_node->parent being set for !dir nodes.  This means that
kernfs_put() of a !dir node requires its ->parent to be initialized.

This leads to oops when a newly created !dir node is destroyed without
going through kernfs_add_one() or after failing kernfs_add_one()
before ->parent is set.  kernfs_root() invoked from kernfs_put() will
try to dereference NULL parent.

Fix it by moving parent association to kernfs_new_node() from
kernfs_add_one().  kernfs_new_node() now takes @parent instead of
@root and determines the root from the parent and also sets the new
node's parent properly.  @parent parameter is removed from
kernfs_add_one().  As there's no parent when creating the root node,
__kernfs_new_node() which takes @root as before and doesn't set the
parent is used in that case.

This ensures that a kernfs_node in any stage in its life has its
parent associated and thus can be put.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-17 11:50:07 -08:00
Bob Peterson 8b127d0494 GFS2: Small cleanup
This is a small cleanup to function gfs2_rgrp_go_lock so that it
uses rgd instead of its more complicated twin.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-16 14:22:12 +00:00
Steven Whitehouse ac3beb6a5d GFS2: Don't use ENOBUFS when ENOMEM is the correct error code
Al Viro has tactfully pointed out that we are using the incorrect
error code in some cases. This patch fixes that, and also removes
the (unused) return value for glock dumping.

>        * gfs2_iget() - ENOBUFS instead of ENOMEM.  ENOBUFS is
> "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
> we don't support STREAMS it's probably fair game, but... what the hell?

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
2014-01-16 10:31:13 +00:00
Changman Lee c434cbc0ed f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH)
Doing sync_meta_pages with META_FLUSH when checkpoint, we overide rw
using WRITE_FLUSH_FUA. At this time, we also should set
REQ_META|REQ_PRIO.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-16 17:28:35 +09:00
Jaegeuk Kim c33ec32692 f2fs: avoid f2fs_balance_fs call during pageout
This patch should resolve the following bug.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.13.0-rc5.f2fs+ #6 Not tainted
---------------------------------------------------------
kswapd0/41 just changed the state of lock:
 (&sbi->gc_mutex){+.+.-.}, at: [<ffffffffa030503e>] f2fs_balance_fs+0xae/0xd0 [f2fs]
but this lock took another, RECLAIM_FS-READ-unsafe lock in the past:
 (&sbi->cp_rwsem){++++.?}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
Chain exists of:
  &sbi->gc_mutex --> &sbi->cp_mutex --> &sbi->cp_rwsem

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sbi->cp_rwsem);
                               local_irq_disable();
                               lock(&sbi->gc_mutex);
                               lock(&sbi->cp_mutex);
  <Interrupt>
    lock(&sbi->gc_mutex);

 *** DEADLOCK ***

This bug is due to the f2fs_balance_fs call in f2fs_write_data_page.
If f2fs_write_data_page is triggered by wbc->for_reclaim via kswapd, it should
not call f2fs_balance_fs which tries to get a mutex grabbed by original syscall
flow.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-16 16:20:40 +09:00
Linus Torvalds 70b23ce347 fix data corruption on NFS writeback
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJS080OAAoJECvKgwp+S8JaIdUQAJKNZTzXKylUjUZty42t57Jh
 1qRrQeJ6ha+JVSpYX4jJz/mSzUdJdjoFg7J3O54OnVFj/CnlcY7GRZj3VMel9ijf
 uhlf8DcU6JsThcFK4Q6mqXtdAHDPkQ1jkQHLNCe7bow9AjCzHymAZWJix4YvEsXF
 zeJJURMqSaJeo/44MynnXyn/h5RRhg+5HWErhoFiVUzDzHR3RoQqmt3lPVVJkdj1
 iokHLMzGui2vs52vUJj2yx7m9kaoDx/6bJpqR61qHfk5S4wjLkUI+1ID8dsTNVF2
 4O3THb0nUDWx4wuJIxrAKoPiYjiemX1KmQXlUVr3IsfhDiiBbLyviVyn4aRaFIxV
 IRCVXCj1CWw+cFLeCA5E+/WvpxjLfKs4WNBxIqjes5YRPM4PLpU3MDiabssaUzHI
 0VPbU8TQ05hqH0wbs0hIgXyvED6yNn9d3sPHS2Lb5i2tp3E0FzVEoh2EH2jn8lmQ
 1DAdi+ezk9EiJs8AFiN6MSIBpAZosX3Nq+RTmYGKqLZMGnxlJ30YspNlipiBPFpC
 4xokkMZAZ0+wzpVabOMie36Rc/AaOAqiOjS1C6UIoOSrBTgtwWL7Ft2Da3SKb0KX
 XQhNWCHNYcgOn9/DDDmGxzwt6HsEzOIYinMwrG37LSass5KEvopssmiLCXn8wry+
 QXUoiFFFAPpg8iXaqj4X
 =AHdo
 -----END PGP SIGNATURE-----

Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback fix from Wu Fengguang:
 "Fix data corruption on NFS writeback.

  It has been in linux-next for one month"

* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Fix data corruption on NFS
2014-01-16 08:23:34 +07:00
Steven Whitehouse 1e3d36206b GFS2: Fix kbuild test robot reported warning
Well I don't get the same warning locally as the kbuild
robot, but I guess this should fix the problem, anyway.
Here is the warning:

head:   2d9e72303d
commit: ee2411a8db [19/20] GFS2: Clean up quota slot allocation
config: make ARCH=powerpc allmodconfig

All error/warnings:

   fs/gfs2/quota.c: In function 'gfs2_quota_init':
>> fs/gfs2/quota.c:1246:3: error: implicit declaration of function '__vmalloc' [-Werror=implicit-function-declaration]
      sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL);
      ^
>> fs/gfs2/quota.c:1246:24: warning: assignment makes pointer from integer without a cast [enabled by default]
      sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL);
                           ^
   fs/gfs2/quota.c: In function 'gfs2_quota_cleanup':
>> fs/gfs2/quota.c:1361:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
       vfree(sdp->sd_quota_bitmap);

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-15 12:57:25 +00:00
Andreas Rohner 70f2fe3a26 nilfs2: fix segctor bug that causes file system corruption
There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean.  It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

  nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

  NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
  NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.

This is what happens:

 1. The cleaner starts the segment construction
 2. nilfs_segctor_collect is called
 3. sc_stage is on NILFS_ST_SUFILE and segments are freed
 4. sc_stage is on NILFS_ST_DAT current segment is full
 5. nilfs_segctor_extend_segments is called, which
    allocates a new segment
 6. The new segment is one of the segments freed in step 3
 7. nilfs_sufile_cancel_freev is called and produces an error message
 8. Loop around and the collection starts again
 9. sc_stage is on NILFS_ST_SUFILE and segments are freed
    including the newly allocated segment, which will contain active
    data and can be allocated at a later time
10. A few hours later another segment construction allocates the
    segment and causes file system corruption

This can be prevented by simply reordering the statements.  If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-15 14:19:42 +07:00
Steven Whitehouse 2d9e72303d GFS2: Move quota bitmap operations under their own lock
Gradually, the global qd_lock is being used for less and less.
After this patch it will only be used for the per super block
list whose purpose is to allow syncing of changes back to the
master quota file from the local quota changes file. Fixing
up that process to make it more efficient will be the subject
of a later patch, however this patch removes another barrier
to doing that.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2014-01-14 19:29:06 +00:00
Steven Whitehouse ee2411a8db GFS2: Clean up quota slot allocation
Quota slot allocation has historically used a vector of pages
and a set of homegrown find/test/set/clear bit functions. Since
the size of the bitmap is likely to be based on the default
qc file size, thats a couple of pages at most. So we ought
to be able to allocate that as a single chunk, with a vmalloc
fallback, just in case of memory fragmentation.

We are then able to use the kernel's own find/test/set/clear
bit functions, rather than rolling our own.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2014-01-14 19:28:49 +00:00
Steven Whitehouse 8ad151c2ac GFS2: Only run logd and quota when mounted read/write
While investigating a rather strange bit of code in the quota
clean up function, I spotted that the reason for its existence
was that when remounting read only, we were not stopping the
quotad thread, and thus it was possible for it to still have
a reference to some of the quotas in that case.

This patch moves the logd and quota thread start and stop into
the make_fs_rw/ro functions, so that we now stop those threads
when mounted read only.

This means that quotad will always be stopped before we call
the quota clean up function, and we can thus dispose of the
(rather hackish) code that waits for it to give up its
reference on the quotas.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2014-01-14 19:28:25 +00:00
Steven Whitehouse c754fbbb1b GFS2: Use RCU/hlist_bl based hash for quotas
Prior to this patch, GFS2 kept all the quotas for each
super block in a single linked list. This is rather slow
when there are large numbers of quotas.

This patch introduces a hlist_bl based hash table, similar
to the one used for glocks. The initial look up of the quota
is now lockless in the case where it is already cached,
although we still have to take the per quota spinlock in
order to bump the ref count. Either way though, this is a
big improvement on what was there before.

The qd_lock and the per super block list is preserved, for
the time being. However it is intended that since this is no
longer used for its original role, it should be possible to
shrink the number of items on that list in due course and
remove the requirement to take qd_lock in qd_get.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-01-14 19:27:56 +00:00
Steven Whitehouse 086352f1aa GFS2: No need to invalidate pages for a dio read
We recently fixed the writeback of pages prior to performing
direct i/o, however the initial fix was perhaps a bit heavy
handed. There is no need to invalidate pages if the direct i/o
is only a read, since they will be identical to what has been
flushed to disk anyway.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-14 19:20:49 +00:00
Tejun Heo bb305947bd kernfs: fix get_active failure handling in kernfs_seq_*()
When kernfs_seq_start() fails to obtain an active reference, it
returns ERR_PTR(-ENODEV).  kernfs_seq_stop() is then invoked with the
error pointer value; however, it still proceeds to invoke
kernfs_put_active() on the node leading to unbalanced put.

If kernfs_seq_stop() is called even after active ref failure, it
should skip invocation of @ops->seq_stop() and put_active.
Unfortunately, this is a bit complicated because active ref failure
isn't the only thing which may fail with ERR_PTR(-ENODEV).
@ops->seq_start/next() may also fail with the error value and
kernfs_seq_stop() doesn't have a way to tell apart those failures.

Work it around by factoring out the active part of kernfs_seq_stop()
into kernfs_seq_stop_active() and invoking it directly if
@ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
kernfs_seq_stop() to skip kernfs_seq_stop_active() on
ERR_PTR(-ENODEV).  This is a bit nasty but ensures that the active put
is skipped iff get_active failed in kernfs_seq_start().

tj: This was originally committed as d92d2e6bd7 but got reverted by
    683bb2761f along with other kernfs self removal patches.
    However, this one is an independent fix and shouldn't have been
    reverted together.  Reinstate the change.  Sorry about the mess.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-14 08:49:22 -08:00
Changman Lee 499046ab2c f2fs: add delimiter to seperate name and value in debug phrase
Support for f2fs-tools/tools/f2stat to monitor
/sys/kernel/debug/f2fs/status

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 18:22:17 +09:00
Gu Zheng 17b692f60e f2fs: use spinlock rather than mutex for better speed
With the 2 previous changes, all the long time operations are moved out
of the protection region, so here we can use spinlock rather than mutex
(orphan_inode_mutex) for lower overhead.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 18:12:05 +09:00
Gu Zheng c1ef372572 f2fs: move alloc new orphan node out of lock protection region
Move alloc new orphan node out of lock protection region.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 18:12:04 +09:00
Gu Zheng 4531929e39 f2fs: move grabing orphan pages out of protection region
Move grabing orphan block page out of protection region, and grab all
the orphan block pages ahead.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove unnecessary code pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 18:11:20 +09:00
Yuan Zhong 5514f0aadd f2fs: remove the needless parameter of f2fs_wait_on_page_writeback
"boo sync" parameter is never referenced in f2fs_wait_on_page_writeback.
We should remove this parameter.

Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 17:45:54 +09:00
Greg Kroah-Hartman 683bb2761f Revert "kernfs: fix get_active failure handling in kernfs_seq_*()"
This reverts commit d92d2e6bd7.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:49:01 -08:00
Greg Kroah-Hartman 87da149343 Revert "kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq"
This reverts commit ea1c472dfe.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:43:11 -08:00
Greg Kroah-Hartman 0890147fe0 Revert "kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()"
This reverts commit a69d001cfc.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:39:52 -08:00
Greg Kroah-Hartman 798c75a0d4 Revert "kernfs: remove KERNFS_REMOVED"
This reverts commit ae34372eb8.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:36:03 -08:00
Greg Kroah-Hartman 4f4b1b6471 Revert "kernfs: restructure removal path to fix possible premature return"
This reverts commit 45a140e587.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:30:47 -08:00
Greg Kroah-Hartman 55f6e30d0a Revert "kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()"
This reverts commit f601f9a2bf.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:27:16 -08:00
Greg Kroah-Hartman 7653fe9d6c Revert "kernfs: remove kernfs_addrm_cxt"
This reverts commit 99177a3411.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:20:56 -08:00
Greg Kroah-Hartman f4b3e631b3 Revert "kernfs: make kernfs_get_active() block if the node is deactivated but not removed"
This reverts commit 895a068a52.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:13:39 -08:00
Greg Kroah-Hartman 9b0925a6ff Revert "kernfs: implement kernfs_{de|re}activate[_self]()"
This reverts commit 9f010c2ad5.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:09:38 -08:00
Greg Kroah-Hartman a9f138b0e5 Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers"
This reverts commit 1ae06819c7.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:05:13 -08:00
Greg Kroah-Hartman a30f82b7eb Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
This reverts commit d1ba277e79.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 13:51:36 -08:00
Greg Kroah-Hartman ce9b499c9f Revert "kernfs: remove unnecessary NULL check in __kernfs_remove()"
This reverts commit 88533f990c.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 13:50:31 -08:00
Tejun Heo 88533f990c kernfs: remove unnecessary NULL check in __kernfs_remove()
895a068a52 ("kernfs: make kernfs_get_active() block if the node is
deactivated but not removed") added "struct kernfs_root *root =
kernfs_root(kn);" at the head of the function; however, the parameter
@kn is checked for later implying that the function may be called with
NULL.  This means that we may end up invoking kernfs_root() with NULL
which will oops.  None of the existing users invokes removal with NULL
@kn, so this bug doesn't actually trigger.

We can relocate kernfs_root() invocation after NULL check; however,
allowing NULL param tends to cause more confusion than actually
helping anything.  As there's no existing user, let's remove the
spurious NULL check.

This bug was detected by smatch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-11 15:40:19 -08:00
Tejun Heo d1ba277e79 sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
All device_schedule_callback_owner() users are converted to use
device_remove_file_self().  Remove now unused
{sysfs|device}_schedule_callback_owner().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 16:03:19 -08:00
Linus Torvalds e2bc44706f xfs: bugfixes for 3.13-rc8
- fix off-by-one in xfs_attr3_rmt_verify
 - fix missing destroy_work_on_stack() in xfs_bmapi_allocate
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJS0ECWAAoJENaLyazVq6ZOgn0QAKSC/pkP4km+QbmL0R7SqSJH
 ZSSj16gIjR5lHlwI3PQzv5BgyEC9BcRDKWXN6dy+GHHuMtP4qYK8cLWFcyl7EysH
 HAyDBnaJVphXt23C5iIzk+iseNfRYXA2LOpYSH6qfhZ5bxEeYzQS42zL4YhxZrXq
 kzLHojcTLUx0IzJ+4oHn5AXSgPt+PXxNz3s+TU9virFnfSMlw2qYukxQtG49nbQr
 kQjNHgeTIBKzeHdlnxmv5Rd2bD//397w5aWXxmaUh8fk6Z7VJi40ALAG4Pks81HF
 +TEgMtF9/xTXdlwrYJDoHp++vUs6HANCX+wSAb4MdrBQvjh/USytK2WFwOeMyyR6
 L/iogfPXHHizTkoYSzPwPdEmCCFhzidvBEqNX68+ojlJnDtoart7IgkOcm9LvaQI
 j//u76CPRcd8tFh+1fDNaXn1ykJ6/CepSY13/yOnbpc7JoDbtqK2R8HFxdSlkDDg
 UooLF2AfQ6lX280cUWwV0flqGO6iTIM3Fw1mIq3z8X4usNn+bMnlOu/DUnCbF5bB
 YJCV4uT7f04w7oJqin9a7LHaHKRD56tWQun/OCEd7ZV/hJ1YRYlhhLfSdWdX7+SX
 oIawXJy7NvCPQLaTwycD3h2gDlaxw17GAc9rA3AcCknxBsgNosv1ETQnEPC4iIAq
 QsVal7p6oMLZ/qx6mvX7
 =Xpq3
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
 "Here we have a bugfix for an off-by-one in the remote attribute
  verifier that results in a forced shutdown which you can hit with v5
  superblock by creating a 64k xattr, and a fix for a missing
  destroy_work_on_stack() in the allocation worker.

  It's a bit late, but they are both fairly straightforward"

* tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs:
  xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
  xfs: fix off-by-one error in xfs_attr3_rmt_verify
2014-01-11 06:33:03 +07:00
Tejun Heo 1ae06819c7 kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers
Sometimes it's necessary to implement a node which wants to delete
nodes including itself.  This isn't straightforward because of kernfs
active reference.  While a file operation is in progress, an active
reference is held and kernfs_remove() waits for all such references to
drain before completing.  For a self-deleting node, this is a deadlock
as kernfs_remove() ends up waiting for an active reference that itself
is sitting on top of.

This currently is worked around in the sysfs layer using
sysfs_schedule_callback() which makes such removals asynchronous.
While it works, it's rather cumbersome and inherently breaks
synchronicity of the operation - the file operation which triggered
the operation may complete before the removal is finished (or even
started) and the removal may fail asynchronously.  If a removal
operation is immmediately followed by another operation which expects
the specific name to be available (e.g. removal followed by rename
onto the same name), there's no way to make the latter operation
reliable.

The thing is there's no inherent reason for this to be asynchrnous.
All that's necessary to do this synchronous is a dedicated operation
which drops its own active ref and deactivates self.  This patch
implements kernfs_remove_self() and its wrappers in sysfs and driver
core.  kernfs_remove_self() is to be called from one of the file
operations, drops the active ref and deactivates using
__kernfs_deactivate_self(), removes the self node, and restores active
ref to the dead node using __kernfs_reactivate_self() so that the ref
is balanced afterwards.  __kernfs_remove() is updated so that it takes
an early exit if the target node is already fully removed so that the
active ref restored by kernfs_remove_self() after removal doesn't
confuse the deactivation path.

This makes implementing self-deleting nodes very easy.  The normal
removal path doesn't even need to be changed to use
kernfs_remove_self() for the self-deleting node.  The method can
invoke kernfs_remove_self() on itself before proceeding the normal
removal path.  kernfs_remove() invoked on the node by the normal
deletion path will simply be ignored.

This will replace sysfs_schedule_callback().  A subtle feature of
sysfs_schedule_callback() is that it collapses multiple invocations -
even if multiple removals are triggered, the removal callback is run
only once.  An equivalent effect can be achieved by testing the return
value of kernfs_remove_self() - only the one which gets %true return
value should proceed with actual deletion.  All other instances of
kernfs_remove_self() will wait till the enclosing kernfs operation
which invoked the winning instance of kernfs_remove_self() finishes
and then return %false.  This trivially makes all users of
kernfs_remove_self() automatically show correct synchronous behavior
even when there are multiple concurrent operations - all "echo 1 >
delete" instances will finish only after the whole operation is
completed by one of the instances.

v2: For !CONFIG_SYSFS, dummy version kernfs_remove_self() was missing
    and sysfs_remove_file_self() had incorrect return type.  Fix it.
    Reported by kbuild test bot.

v3: Updated to use __kernfs_{de|re}activate_self().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 14:01:05 -08:00
Tejun Heo 9f010c2ad5 kernfs: implement kernfs_{de|re}activate[_self]()
This patch implements four functions to manipulate deactivation state
- deactivate, reactivate and the _self suffixed pair.  A new fields
kernfs_node->deact_depth is added so that concurrent and nested
deactivations are handled properly.  kernfs_node->hash is moved so
that it's paired with the new field so that it doesn't increase the
size of kernfs_node.

A kernfs user's lock would normally nest inside active ref but during
removal the user may want to perform kernfs_remove() while holding the
said lock, which would introduce a reverse locking dependency.  This
function can be used to break such reverse dependency by allowing
deactivation step to performed separately outside user's critical
section.

This will also be used implement kernfs_remove_self().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:51:21 -08:00
Tejun Heo 895a068a52 kernfs: make kernfs_get_active() block if the node is deactivated but not removed
Currently, kernfs_get_active() fails if the target node is
deactivated.  This is fine as a node always gets removed after
deactivation; however, we're gonna add reactivation so the assumption
won't hold.  It'd be incorrect for kernfs_get_active() to fail for a
node which was deactivated only temporarily.

This patch makes kernfs_get_active() block if the node is deactivated
but not removed.  If the node gets reactivated (not yet implemented),
it will be retried and succeed.  If the node gets removed, it will be
woken up and fail.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:48:08 -08:00
Tejun Heo 99177a3411 kernfs: remove kernfs_addrm_cxt
kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
added because there were operations which should be performed outside
kernfs_mutex after adding and removing kernfs_nodes.  The necessary
operations were recorded in kernfs_addrm_cxt and performed by
kernfs_addrm_finish(); however, after the recent changes which
relocated deactivation and unmapping so that they're performed
directly during removal, the only operation kernfs_addrm_finish()
performs is kernfs_put(), which can be moved inside the removal path
too.

This patch moves the kernfs_put() of the base ref to __kernfs_remove()
and remove kernfs_addrm_cxt and kernfs_addrm_start/finish().

* kernfs_add_one() is updated to grab and release the parent's active
  ref and kernfs_mutex itself.  kernfs_get/put_active() and
  kernfs_addrm_start/finish() invocations around it are removed from
  all users.

* __kernfs_remove() puts an unlinked node directly instead of chaining
  it to kernfs_addrm_cxt.  Its callers are updated to grab and release
  kernfs_mutex instead of calling kernfs_addrm_start/finish() around
  it.

v2: Updated to fit the v2 restructuring of removal path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:48:08 -08:00
Tejun Heo f601f9a2bf kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()
kernfs_unmap_bin_file() is supposed to unmap all memory mappings of
the target file before kernfs_remove() finishes; however, it currently
is being called from kernfs_addrm_finish() and has the same race
problem as the original implementation of deactivation when there are
multiple removers - only the remover which snatches the node to its
addrm_cxt->removed list is guaranteed to wait for its completion
before returning.

It can be fixed by moving kernfs_unmap_bin_file() invocation from
kernfs_addrm_finish() to __kernfs_remove().  The function may be
called multiple times but that shouldn't do any harm.

We end up dropping kernfs_mutex in the removal loop and the node may
be removed inbetween by someone else.  kernfs_unlink_sibling() is
updated to test whether the node has already been removed and return
accordingly.  __kernfs_remove() in turn performs post-unlinking
cleanup only if it actually unlinked the node.

KERNFS_HAS_MMAP test is moved out of the unmap function into
__kernfs_remove() so that we don't unlock kernfs_mutex unnecessarily.
While at it, drop the now meaningless "bin" qualifier from the
function name.

v2: Rewritten to fit the v2 restructuring of removal path.  HAS_MMAP
    test relocated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:48:08 -08:00
Tejun Heo 45a140e587 kernfs: restructure removal path to fix possible premature return
The recursive nature of kernfs_remove() means that, even if
kernfs_remove() is not allowed to be called multiple times on the same
node, there may be race conditions between removal of parent and its
descendants.  While we can claim that kernfs_remove() shouldn't be
called on one of the descendants while the removal of an ancestor is
in progress, such rule is unnecessarily restrictive and very difficult
to enforce.  It's better to simply allow invoking kernfs_remove() as
the caller sees fit as long as the caller ensures that the node is
accessible.

The current behavior in such situations is broken.  Whoever enters
removal path first takes the node off the hierarchy and then
deactivates.  Following removers either return as soon as it notices
that it's not the first one or can't even find the target node as it
has already been removed from the hierarchy.  In both cases, the
following removers may finish prematurely while the nodes which should
be removed and drained are still being processed by the first one.

This patch restructures so that multiple removers, whether through
recursion or direction invocation, always follow the following rules.

* When there are multiple concurrent removers, only one puts the base
  ref.

* Regardless of which one puts the base ref, all removers are blocked
  until the target node is fully deactivated and removed.

To achieve the above, removal path now first deactivates the subtree,
drains it and then unlinks one-by-one.  __kernfs_deactivate() is
called directly from __kernfs_removal() and drops and regrabs
kernfs_mutex for each descendant to drain active refs.  As this means
that multiple removers can enter __kernfs_deactivate() for the same
node, the function is updated so that it can handle multiple
deactivators of the same node - only one actually deactivates but all
wait till drain completion.

The restructured removal path guarantees that a removed node gets
unlinked only after the node is deactivated and drained.  Combined
with proper multiple deactivator handling, this guarantees that any
invocation of kernfs_remove() returns only after the node itself and
all its descendants are deactivated, drained and removed.

v2: Draining separated into a separate loop (used to be in the same
    loop as unlink) and done from __kernfs_deactivate().  This is to
    allow exposing deactivation as a separate interface later.

    Root node removal was broken in v1 patch.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:48:08 -08:00
Tejun Heo ae34372eb8 kernfs: remove KERNFS_REMOVED
KERNFS_REMOVED is used to mark half-initialized and dying nodes so
that they don't show up in lookups and deny adding new nodes under or
renaming it; however, its role overlaps those of deactivation and
removal from rbtree.

It's necessary to deny addition of new children while removal is in
progress; however, this role considerably intersects with deactivation
- KERNFS_REMOVED prevents new children while deactivation prevents new
file operations.  There's no reason to have them separate making
things more complex than necessary.

KERNFS_REMOVED is also used to decide whether a node is still visible
to vfs layer, which is rather redundant as equivalent determination
can be made by testing whether the node is on its parent's children
rbtree or not.

This patch removes KERNFS_REMOVED.

* Instead of KERNFS_REMOVED, each node now starts its life
  deactivated.  This means that we now use both atomic_add() and
  atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN.  The compiler
  generates an overflow warnings when negating INT_MIN as the negation
  can't be represented as a positive number.  Nothing is actually
  broken but let's bump BIAS by one to avoid the warnings for archs
  which negates the subtrahend..

* KERNFS_REMOVED tests in add and rename paths are replaced with
  kernfs_get/put_active() of the target nodes.  Due to the way the add
  path is structured now, active ref handling is done in the callers
  of kernfs_add_one().  This will be consolidated up later.

* kernfs_remove_one() is updated to deactivate instead of setting
  KERNFS_REMOVED.  This removes deactivation from kernfs_deactivate(),
  which is now renamed to kernfs_drain().

* kernfs_dop_revalidate() now tests RB_EMPTY_NODE(&kn->rb) instead of
  KERNFS_REMOVED and KERNFS_REMOVED test in kernfs_dir_pos() is
  dropped.  A node which is removed from the children rbtree is not
  included in the iteration in the first place.  This means that a
  node may be visible through vfs a bit longer - it's now also visible
  after deactivation until the actual removal.  This slightly enlarged
  window difference doesn't make any difference to the userland.

* Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
  checks on the active ref.

* Some comment style updates in the affected area.

v2: Reordered before removal path restructuring.  kernfs_active()
    dropped and kernfs_get/put_active() used instead.  RB_EMPTY_NODE()
    used in the lookup paths.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:44:25 -08:00
Tejun Heo a69d001cfc kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()
There currently are two mechanisms gating active ref lockdep
annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask.
The former disables lockdep annotations in kernfs_get/put_active()
while the latter disables all of kernfs_deactivate().

While KERNFS_ACTIVE_REF also behaves as an optimization to skip the
deactivation step for non-file nodes, the benefit is marginal and it
needlessly diverges code paths.  Let's drop KERNFS_ACTIVE_REF and use
KERNFS_LOCKDEP in kernfs_deactivate() too.

While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP
flag so that it's more convenient and the related code can be compiled
out when not enabled.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:44:25 -08:00
Tejun Heo ea1c472dfe kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq
kernfs_node->u.completion is used to notify deactivation completion
from kernfs_put_active() to kernfs_deactivate().  We now allow
multiple racing removals of the same node and the current removal
scheme is no longer correct - kernfs_remove() invocation may return
before the node is properly deactivated if it races against another
removal.  The removal path will be restructured to address the issue.

To help such restructure which requires supporting multiple waiters,
this patch replaces kernfs_node->u.completion with
kernfs_root->deactivate_waitq.  This makes deactivation event
notifications share a per-root waitqueue_head; however, the wait path
is quite cold and this will also allow shaving one pointer off
kernfs_node.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:44:25 -08:00
Tejun Heo d92d2e6bd7 kernfs: fix get_active failure handling in kernfs_seq_*()
When kernfs_seq_start() fails to obtain an active reference, it
returns ERR_PTR(-ENODEV).  kernfs_seq_stop() is then invoked with the
error pointer value; however, it still proceeds to invoke
kernfs_put_active() on the node leading to unbalanced put.

If kernfs_seq_stop() is called even after active ref failure, it
should skip invocation of @ops->seq_stop() and put_active.
Unfortunately, this is a bit complicated because active ref failure
isn't the only thing which may fail with ERR_PTR(-ENODEV).
@ops->seq_start/next() may also fail with the error value and
kernfs_seq_stop() doesn't have a way to tell apart those failures.

Work it around by factoring out the active part of kernfs_seq_stop()
into kernfs_seq_stop_active() and invoking it directly if
@ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
kernfs_seq_stop() to skip kernfs_seq_stop_active() on
ERR_PTR(-ENODEV).  This is a bit nasty but ensures that the active put
is skipped iff get_active failed in kernfs_seq_start().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:44:25 -08:00
Chuansheng Liu 1f4a63bf01 xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 6f96b3063c)
2014-01-10 12:39:38 -06:00
Jie Liu bba719b500 xfs: fix off-by-one error in xfs_attr3_rmt_verify
With CRC check is enabled, if trying to set an attributes value just
equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
attr write verification procedure failure, which would yield the back
trace like below:

<snip>
XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
<snip>
Call Trace:
[<ffffffff816f0042>] dump_stack+0x45/0x56
[<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
[<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
[<ffffffff81097230>] ? wake_up_state+0x20/0x20
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
[<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
[<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
[<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
[<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
[<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
[<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
[<ffffffff811df9b2>] generic_setxattr+0x62/0x80
[<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
[<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
[<ffffffff811e054e>] setxattr+0x12e/0x1c0
[<ffffffff811c6e82>] ? final_putname+0x22/0x50
[<ffffffff811c708b>] ? putname+0x2b/0x40
[<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
[<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
[<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
[<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f

Tests:
    setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile

This patch fix it to check the remote EA size is greater than the
XATTR_SIZE_MAX rather than more than or equal to it, because it's
valid if the specified EA value size is equal to the limitation as
per VFS setxattr interface.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 85dd0707f0)
2014-01-10 12:38:41 -06:00
Dominique Martinet fb89b45cdf 9P: introduction of a new cache=mmap model.
- Add cache=mmap option
 - Make mmap read-write while keeping it as synchronous as possible
 - Build writeback fid on mmap creation if it is writable

Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-01-10 09:20:51 -06:00
Ben Myers bf3964c188 Merge branch 'xfs-extent-list-locking-fixes' into for-next
A set of fixes which makes sure we are taking the ilock whenever accessing the
extent list.  This was associated with "Access to block zero" messages which
may result in extent list corruption.
2014-01-09 16:03:18 -06:00
Ben Myers dc16b186bb Merge branch 'xfs-misc' into for-next
A bugfix for an off-by-one in the remote attribute verifier, and a fix for a
missing destroy_work_on_stack() in the allocation worker.
2014-01-09 15:58:59 -06:00
Chuansheng Liu 6f96b3063c xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-01-09 15:50:31 -06:00
Steven Whitehouse 39849d6946 GFS2: Add initialization for address space in super block
Spotted by Andy Price. This should fix the odd messages from
lockdep caused by 70d4ee94b3

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Price <anprice@redhat.com>
2014-01-09 16:34:04 +00:00
Steven Whitehouse 01bcb0dedb GFS2: Add hints to directory leaf blocks
This patch adds four new fields to directory leaf blocks.
The intent is not to use them in the kernel itself, although
perhaps we may be able to use them as hints at some later date,
but instead to provide more information for debug/fsck use.

One new field adds a pointer to the inode to which the leaf
belongs. This can be useful if the pointer to the leaf block
has become corrupt, as it will allow us to know which inode
this block should be associated with. This field is set when
the leaf is created and never changed over its lifetime.

The second field is a "distance from the hash table" field.
The meaning is as follows:
 0  = An old leaf in which this value has not been set
 1  = This leaf is pointed to directly from the hash table
 2+ = This leaf is part of a chain, pointed to by another leaf
      block, the value gives the position in the chain.

The third and fourth fields combine to give a time stamp of
the most recent directory insertion or deletion from this
leaf block. The time stamp is not updated when a new leaf
block is chained from the current one. The code is currently
written such that the timestamp on the dir inode will match
that of the leaf block for the most recent insertion/deletion.

For backwards compatibility, any of these new fields which is
zero should be considered to be "unknown".

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-08 12:14:57 +00:00
Steven Whitehouse 22b5a6c0c0 GFS2: For exhash conversion, only one block is needed
For most cases, only a single new block is needed when we reach
the point of converting from stuffed to exhash directory. The
exception being when the file name is so long that it will not
fit within the new leaf block.

So this patch adds a simple test for that situation so that we
do not need to request the full reservation size in this case.

Potentially we could calculate more accurately the value to use
in other cases too, but that is much more complicated to do and
it is doubtful that the benefit would outweigh the extra cost
in code complexity.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-08 11:05:29 +00:00
Jaegeuk Kim b1c57c1caa f2fs: add a sysfs entry to control max_victim_search
Previously during SSR and GC, the maximum number of retrials to find a victim
segment was hard-coded by MAX_VICTIM_SEARCH, 4096 by default.

This number makes an effect on IO locality, when SSR mode is activated, which
results in performance fluctuation on some low-end devices.

If max_victim_search = 4, the victim will be searched like below.
("D" represents a dirty segment, and "*" indicates a selected victim segment.)

 D1 D2 D3 D4 D5 D6 D7 D8 D9
[   *       ]
      [   *    ]
            [         * ]
	                [ ....]

This patch adds a sysfs entry to control the number dynamically through:
  /sys/fs/f2fs/$dev/max_victim_search

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-08 13:45:08 +09:00
Jaegeuk Kim fb5566da91 f2fs: improve write performance under frequent fsync calls
When considering a bunch of data writes with very frequent fsync calls, we
are able to think the following performance regression.

N: Node IO, D: Data IO, IO scheduler: cfq

Issue    pending IOs
	 D1 D2 D3 D4
 D1         D2 D3 D4 N1
 D2            D3 D4 N1 N2
 N1            D3 D4 N2 D1
 --> N1 can be selected by cfq becase of the same priority of N and D.
     Then D3 and D4 would be delayed, resuling in performance degradation.

So, when processing the fsync call, it'd better give higher priority to data IOs
than node IOs by assigning WRITE and WRITE_SYNC respectively.
This patch improves the random wirte performance with frequent fsync calls by up
to 10%.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-08 11:16:20 +09:00
Masanari Iida 8faaaead62 treewide: fix comments and printk msgs
This patch fixed several typo in printk from various
part of kernel source.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-01-07 15:06:07 +01:00