alistair23-linux/fs
Eric Ren 439a36b8ef ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a precess
already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that are
invoked directly by vfs code.  For instance:

  const struct inode_operations ocfs2_file_iops = {
      .permission     = ocfs2_permission,
      .get_acl        = ocfs2_iop_get_acl,
      .set_acl        = ocfs2_iop_set_acl,
  };

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):

  do_sys_open
   may_open
    inode_permission
     ocfs2_permission
      ocfs2_inode_lock() <=== first time
       generic_permission
        get_acl
         ocfs2_iop_get_acl
  	ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two of
ocfs2_inode_lock().  Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of
the remote EX lock request.  Another hand, the recursive cluster lock
(the second one) will be blocked in in __ocfs2_cluster_lock() because of
OCFS2_LOCK_BLOCKED.  But, the downconvert never complete, why? because
there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.

1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track
   of the processes' pid who has taken the cluster lock of this lock
   resource;

2. introduce a new flag for ocfs2_inode_lock_full:
   OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for
   us if we've got cluster lock.

3. export a helper: ocfs2_is_locked_by_me() is used to check if we have
   got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs
routines can call into each other.

The performance penalty of processing the holder list should only be
seen at a few cases where the tracking logic is used, such as get/set
acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real
world, as far as I can see, including permission check,
(get|set)_(acl|attr), and the gfs2 code also do so.

[sfr@canb.auug.org.au remove some inlines]
Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:27 -08:00
..
9p Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
adfs
affs vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
afs afs: Use core kernel UUID generation 2017-02-10 16:34:17 +00:00
autofs4 Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
befs befs: add NFS export support 2016-12-22 11:25:24 +00:00
bfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
btrfs Merge branch 'for-4.11/next' into for-4.11/linus-merge 2017-02-17 14:08:19 -07:00
cachefiles
ceph ceph: fix bad endianness handling in parse_reply_info_extra 2017-01-18 17:58:45 +01:00
cifs CIFS: Allow to switch on encryption with seal mount option 2017-02-01 16:46:37 -06:00
coda vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
configfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
cramfs
crypto fscrypt: properly declare on-stack completion 2017-02-06 23:45:28 -05:00
debugfs debugfs: add debugfs_lookup() 2017-02-02 10:20:16 -07:00
devpts
dlm ktime: Get rid of ktime_equal() 2016-12-25 17:21:23 +01:00
ecryptfs vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
efivarfs
efs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
exofs locking/atomic, kref: Add kref_read() 2017-01-14 11:37:18 +01:00
exportfs
ext2 dax: fix build warnings with FS_DAX and !FS_IOMAP 2017-01-24 16:26:14 -08:00
ext4 mm, dax: change pmd_fault() to take only vmf parameter 2017-02-22 16:41:26 -08:00
f2fs Various cleanups for the file system encryption feature. 2017-02-20 18:22:31 -08:00
fat
freevxfs
fscache fscache: Fix dead object requeue 2017-01-31 13:23:09 -05:00
fuse Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-02-20 13:23:30 -08:00
gfs2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-02-22 10:15:09 -08:00
hfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
hfsplus Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
hostfs vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
hpfs
hugetlbfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
isofs Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
jbd2 For this cycle we add support for the shutdown ioctl, which is 2017-02-20 18:24:39 -08:00
jffs2 vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
jfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
kernfs kernfs: handle null pointers while printing node name and path 2017-02-10 16:02:26 +01:00
lockd netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
minix vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
ncpfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
nfs pNFS: Fix a reference leak in _pnfs_return_layout 2017-01-26 15:50:41 -05:00
nfs_common netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
nfsd driver core patches for 4.11-rc1 2017-02-22 11:44:32 -08:00
nilfs2 block: Use pointer to backing_dev_info from request_queue 2017-02-02 08:20:48 -07:00
nls
notify Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-02-21 07:44:03 -08:00
ntfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ocfs2 ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock 2017-02-22 16:41:27 -08:00
omfs
openpromfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
orangefs Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
overlayfs ovl: fix possible use after free on redirect dir lookup 2017-01-18 15:19:54 +01:00
proc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2017-02-21 12:49:56 -08:00
pstore pstore: Check for prz allocation in walker 2017-02-13 10:25:52 -08:00
qnx4
qnx6
quota Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2016-12-19 08:23:53 -08:00
ramfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
reiserfs Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
romfs romfs: use different way to generate fsid for BLOCK or MTD 2017-01-24 16:26:14 -08:00
squashfs Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
sysfs
sysv vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
tracefs
ubifs Various cleanups for the file system encryption feature. 2017-02-20 18:22:31 -08:00
udf udf: simplify udf_ioctl() 2017-02-03 16:24:18 +01:00
ufs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
xfs mm, dax: change pmd_fault() to take only vmf parameter 2017-02-22 16:41:26 -08:00
aio.c aio: fix lock dep warning 2017-01-14 19:31:40 -05:00
anon_inodes.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
attr.c
bad_inode.c bad_inode: add missing i_op initializers 2016-12-09 11:57:43 +01:00
binfmt_aout.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
binfmt_elf.c fs/binfmt: Convert obsolete cputime type to nsecs 2017-02-01 09:13:51 +01:00
binfmt_elf_fdpic.c fs/binfmt: Convert obsolete cputime type to nsecs 2017-02-01 09:13:51 +01:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c Merge branch 'for-4.11/next' into for-4.11/linus-merge 2017-02-17 14:08:19 -07:00
buffer.c clean_bdev_aliases: Prevent cleaning blocks that are not in block range 2017-01-02 09:35:14 -07:00
char_dev.c
compat.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
compat_binfmt_elf.c fs/binfmt: Convert obsolete cputime type to nsecs 2017-02-01 09:13:51 +01:00
compat_ioctl.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
coredump.c coredump: Ensure proper size of sparse core files 2017-01-14 19:32:40 -05:00
dax.c mm, dax: change pmd_fault() to take only vmf parameter 2017-02-22 16:41:26 -08:00
dcache.c mnt: Protect the mountpoint hashtable with mount_lock 2017-01-10 13:34:43 +13:00
dcookies.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
direct-io.c do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned 2017-01-10 13:29:54 -07:00
drop_caches.c
eventfd.c
eventpoll.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
exec.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
fcntl.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
fhandle.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
file.c
file_table.c constify alloc_file() 2016-12-05 19:01:16 -05:00
filesystems.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
fs-writeback.c fs/fs-writeback.c: remove redundant if check 2016-12-12 18:55:08 -08:00
fs_pin.c
fs_struct.c
inode.c
internal.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-17 18:44:00 -08:00
ioctl.c vfs: call vfs_clone_file_range() under freeze protection 2016-12-16 11:02:54 +01:00
iomap.c fs: break out of iomap_file_buffered_write on fatal signals 2017-02-03 14:13:19 -08:00
Kconfig dax: fix build warnings with FS_DAX and !FS_IOMAP 2017-01-24 16:26:14 -08:00
Kconfig.binfmt docs: fix locations of several documents that got moved 2016-10-24 08:12:35 -02:00
libfs.c libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount 2017-01-10 13:34:55 +13:00
locks.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
Makefile logfs: remove from tree 2016-12-14 23:48:11 -05:00
mbcache.c mbcache: document that "find" functions only return reusable entries 2016-12-03 15:55:01 -05:00
mount.h vfs: add path_is_mountpoint() helper 2016-12-03 20:51:35 -05:00
mpage.c fs: Add helper to clean bdev aliases under a bh and use it 2016-11-04 14:34:47 -06:00
namei.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
namespace.c mnt: Protect the mountpoint hashtable with mount_lock 2017-01-10 13:34:43 +13:00
no-block.c
nsfs.c net: add an ioctl to get a socket network namespace 2016-10-31 10:56:36 -04:00
open.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
pipe.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
pnode.c reorganize do_make_slave() 2016-12-16 16:30:49 -05:00
pnode.h
posix_acl.c tmpfs: clear S_ISGID when setting posix ACLs 2017-01-10 01:29:48 -05:00
proc_namespace.c
read_write.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
readdir.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
select.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
seq_file.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
signalfd.c
splice.c vfs: fix uninitialized flags in splice_to_pipe() 2017-02-16 09:09:02 -08:00
stack.c
stat.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
statfs.c vfs: misc struct path constification 2016-12-05 19:03:49 -05:00
super.c block: Use pointer to backing_dev_info from request_queue 2017-02-02 08:20:48 -07:00
sync.c
timerfd.c timerfd: Protect the might cancel mechanism proper 2017-02-10 11:15:09 +01:00
userfaultfd.c userfaultfd: fix SIGBUS resulting from false rwsem wakeups 2017-01-24 16:26:14 -08:00
utimes.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
xattr.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00