1
0
Fork 0
remarkable-linux/fs/ocfs2
Gang He 9cb1674008 ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
commit ff26cc10ae upstream.

If we can't get inode lock immediately in the function
ocfs2_inode_lock_with_page() when reading a page, we should not return
directly here, since this will lead to a softlockup problem when the
kernel is configured with CONFIG_PREEMPT is not set.  The method is to
get a blocking lock and immediately unlock before returning, this can
avoid CPU resource waste due to lots of retries, and benefits fairness
in getting lock among multiple nodes, increase efficiency in case
modifying the same file frequently from multiple nodes.

The softlockup crash (when set /proc/sys/kernel/softlockup_panic to 1)
looks like:

  Kernel panic - not syncing: softlockup: hung tasks
  CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Call Trace:
    <IRQ>
    dump_stack+0x5c/0x82
    panic+0xd5/0x21e
    watchdog_timer_fn+0x208/0x210
    __hrtimer_run_queues+0xcc/0x200
    hrtimer_interrupt+0xa6/0x1f0
    smp_apic_timer_interrupt+0x34/0x50
    apic_timer_interrupt+0x96/0xa0
    </IRQ>
   RIP: 0010:unlock_page+0x17/0x30
   RSP: 0000:ffffaf154080bc88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
   RAX: dead000000000100 RBX: fffff21e009f5300 RCX: 0000000000000004
   RDX: dead0000000000ff RSI: 0000000000000202 RDI: fffff21e009f5300
   RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaf154080bb00
   R10: ffffaf154080bc30 R11: 0000000000000040 R12: ffff993749a39518
   R13: 0000000000000000 R14: fffff21e009f5300 R15: fffff21e009f5300
    ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
    ocfs2_readpage+0x41/0x2d0 [ocfs2]
    filemap_fault+0x12b/0x5c0
    ocfs2_fault+0x29/0xb0 [ocfs2]
    __do_fault+0x1a/0xa0
    __handle_mm_fault+0xbe8/0x1090
    handle_mm_fault+0xaa/0x1f0
    __do_page_fault+0x235/0x4b0
    trace_do_page_fault+0x3c/0x110
    async_page_fault+0x28/0x30
   RIP: 0033:0x7fa75ded638e
   RSP: 002b:00007ffd6657db18 EFLAGS: 00010287
   RAX: 000055c7662fb700 RBX: 0000000000000001 RCX: 000055c7662fb700
   RDX: 0000000000001770 RSI: 00007fa75e909000 RDI: 000055c7662fb700
   RBP: 0000000000000003 R08: 000000000000000e R09: 0000000000000000
   R10: 0000000000000483 R11: 00007fa75ded61b0 R12: 00007fa75e90a770
   R13: 000000000000000e R14: 0000000000001770 R15: 0000000000000000

About performance improvement, we can see the testing time is reduced,
and CPU utilization decreases, the detailed data is as follows.  I ran
multi_mmap test case in ocfs2-test package in a three nodes cluster.

Before applying this patch:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2754 ocfs2te+  20   0  170248   6980   4856 D 80.73 0.341   0:18.71 multi_mmap
   1505 root      rt   0  222236 123060  97224 S 2.658 6.015   0:01.44 corosync
      5 root      20   0       0      0      0 S 1.329 0.000   0:00.19 kworker/u8:0
     95 root      20   0       0      0      0 S 1.329 0.000   0:00.25 kworker/u8:1
   2728 root      20   0       0      0      0 S 0.997 0.000   0:00.24 jbd2/sda1-33
   2721 root      20   0       0      0      0 S 0.664 0.000   0:00.07 ocfs2dc-3C8CFD4
   2750 ocfs2te+  20   0  142976   4652   3532 S 0.664 0.227   0:00.28 mpirun

  ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
  Tests with "-b 4096 -C 32768"
  Thu Dec 28 14:44:52 CST 2017
  multi_mmap..................................................Passed.
  Runtime 783 seconds.

After apply this patch:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2508 ocfs2te+  20   0  170248   6804   4680 R 54.00 0.333   0:55.37 multi_mmap
    155 root      20   0       0      0      0 S 2.667 0.000   0:01.20 kworker/u8:3
     95 root      20   0       0      0      0 S 2.000 0.000   0:01.58 kworker/u8:1
   2504 ocfs2te+  20   0  142976   4604   3480 R 1.667 0.225   0:01.65 mpirun
      5 root      20   0       0      0      0 S 1.000 0.000   0:01.36 kworker/u8:0
   2482 root      20   0       0      0      0 S 1.000 0.000   0:00.86 jbd2/sda1-33
    299 root       0 -20       0      0      0 S 0.333 0.000   0:00.13 kworker/2:1H
    335 root       0 -20       0      0      0 S 0.333 0.000   0:00.17 kworker/1:1H
    535 root      20   0   12140   7268   1456 S 0.333 0.355   0:00.34 haveged
   1282 root      rt   0  222284 123108  97224 S 0.333 6.017   0:01.33 corosync

  ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
  Tests with "-b 4096 -C 32768"
  Thu Dec 28 15:04:12 CST 2017
  multi_mmap..................................................Passed.
  Runtime 487 seconds.

Link: http://lkml.kernel.org/r/1514447305-30814-1-git-send-email-ghe@suse.com
Fixes: 1cce4df04f ("ocfs2: do not lock/unlock() inode DLM lock")
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Eric Ren <zren@suse.com>
Acked-by: alex chen <alex.chen@huawei.com>
Acked-by: piaojun <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22 15:43:52 +01:00
..
cluster ocfs2: o2hb: revert hb threshold to keep compatible 2017-07-05 14:40:29 +02:00
dlm ocfs2: fix cluster hang after a node dies 2017-11-24 08:33:42 +01:00
dlmfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
Kconfig
Makefile ocfs2: disable BUG assertions in reading blocks 2016-06-24 17:23:52 -07:00
acl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
acl.h ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00
alloc.c ocfs2: fstrim: Fix start offset of first cluster group during fstrim 2017-11-08 10:08:32 +01:00
alloc.h ocfs2: retry on ENOSPC if sufficient space in truncate log 2016-08-02 17:31:41 -04:00
aops.c fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
aops.h ocfs2: fix ip_unaligned_aio deadlock with dio work queue 2016-03-25 16:37:42 -07:00
blockcheck.c ocfs2: kill endianness abuses in blockcheck.c 2012-05-29 23:28:35 -04:00
blockcheck.h
buffer_head_io.c Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block 2016-07-26 15:03:07 -07:00
buffer_head_io.h
dcache.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dcache.h ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock 2014-04-03 16:20:55 -07:00
dir.c ocfs2: fix not enough credit panic 2016-11-11 08:12:37 -08:00
dir.h VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dlmglue.c ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE 2018-02-22 15:43:52 +01:00
dlmglue.h ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock 2017-10-21 17:21:36 +02:00
export.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
export.h
extent_map.c ocfs2: neaten do_error, ocfs2_error and ocfs2_abort 2015-09-04 16:54:41 -07:00
extent_map.h
file.c ocfs2: should wait dio before inode lock in ocfs2_setattr() 2017-11-24 08:33:42 +01:00
file.h ocfs2: prepare some interfaces used in append direct io 2015-02-16 17:56:04 -08:00
filecheck.c ocfs2: sysfile interfaces for online file check 2016-03-22 15:36:02 -07:00
filecheck.h ocfs2: sysfile interfaces for online file check 2016-03-22 15:36:02 -07:00
heartbeat.c
heartbeat.h
inode.c ocfs2: fix improper handling of return errno 2016-05-26 15:35:44 -07:00
inode.h ocfs2: fix undefined struct variable in inode.h 2016-10-07 18:46:26 -07:00
ioctl.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
ioctl.h
journal.c ocfs2: improve recovery performance 2016-07-26 16:19:19 -07:00
journal.h jbd2: add support for avoiding data writes during transaction commits 2016-04-24 00:56:07 -04:00
localalloc.c ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local 2016-03-25 16:37:42 -07:00
localalloc.h ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters 2014-02-06 13:48:51 -08:00
locks.c ocfs2: fix flock panic issue 2015-12-29 17:45:49 -08:00
locks.h
mmap.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
mmap.h
move_extents.c fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
move_extents.h
namei.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
namei.h ocfs2: do not include dio entry in case of orphan scan 2015-11-05 19:34:48 -08:00
ocfs1_fs_compat.h
ocfs2.h ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock 2017-10-21 17:21:36 +02:00
ocfs2_fs.h ocfs2: fix comment in struct ocfs2_extended_slot 2016-05-19 19:12:14 -07:00
ocfs2_ioctl.h
ocfs2_lockid.h
ocfs2_lockingver.h
ocfs2_trace.h switch generic_file_splice_read() to use of ->read_iter() 2016-10-05 18:23:56 -04:00
quota.h quota: constify qtree_fmt_operations structures 2016-01-04 10:58:35 +01:00
quota_global.c quota: use time64_t internally 2016-06-19 18:09:31 +02:00
quota_local.c ocfs2: neaten do_error, ocfs2_error and ocfs2_abort 2015-09-04 16:54:41 -07:00
refcounttree.c fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
refcounttree.h ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page 2013-08-13 17:57:49 -07:00
reservations.c ocfs2: make resv_lock spinlock static 2015-02-10 14:30:29 -08:00
reservations.h
resize.c ocfs2: solve a problem of crossing the boundary in updating backups 2016-03-25 16:37:42 -07:00
resize.h
slot_map.c ocfs2: clean up an unneeded goto in ocfs2_put_slot() 2016-05-19 19:12:14 -07:00
slot_map.h
stack_o2cb.c ocfs2: avoid a pointless delay in o2cb_cluster_check() 2015-04-14 16:48:57 -07:00
stack_user.c ocfs2: ensure that dlm lockspace is created by kernel module 2016-08-02 17:31:41 -04:00
stackglue.c ocfs2: fix crash caused by stale lvb with fsdlm plugin 2017-01-19 20:17:59 +01:00
stackglue.h ocfs2: fix crash caused by stale lvb with fsdlm plugin 2017-01-19 20:17:59 +01:00
suballoc.c ocfs2: fix double unlock in case retry after free truncate log 2016-09-19 15:36:17 -07:00
suballoc.h ocfs2: rollback alloc_dinode counts when ocfs2_block_group_set_bits() failed 2014-04-03 16:20:56 -07:00
super.c fs/ocfs2/super: remove deprecated create_singlethread_workqueue() 2016-10-07 18:46:26 -07:00
super.h ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local 2016-03-25 16:37:42 -07:00
symlink.c vfs: Remove {get,set,remove}xattr inode operations 2016-10-07 21:48:36 -04:00
symlink.h ocfs: simplify symlink handling 2012-05-29 23:28:40 -04:00
sysfile.c ocfs2: avoid system inode ref confusion by adding mutex lock 2014-04-03 16:20:57 -07:00
sysfile.h
uptodate.c ocfs2: remove NULL assignments on static 2014-06-04 16:53:53 -07:00
uptodate.h
xattr.c fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
xattr.h ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00