1
0
Fork 0
alistair23-linux/fs
Shakeel Butt d46eb14b73 fs: fsnotify: account fsnotify metadata to kmemcg
Patch series "Directed kmem charging", v8.

The Linux kernel's memory cgroup allows limiting the memory usage of the
jobs running on the system to provide isolation between the jobs.  All
the kernel memory allocated in the context of the job and marked with
__GFP_ACCOUNT will also be included in the memory usage and be limited
by the job's limit.

The kernel memory can only be charged to the memcg of the process in
whose context kernel memory was allocated.  However there are cases
where the allocated kernel memory should be charged to the memcg
different from the current processes's memcg.  This patch series
contains two such concrete use-cases i.e.  fsnotify and buffer_head.

The fsnotify event objects can consume a lot of system memory for large
or unlimited queues if there is either no or slow listener.  The events
are allocated in the context of the event producer.  However they should
be charged to the event consumer.  Similarly the buffer_head objects can
be allocated in a memcg different from the memcg of the page for which
buffer_head objects are being allocated.

To solve this issue, this patch series introduces mechanism to charge
kernel memory to a given memcg.  In case of fsnotify events, the memcg
of the consumer can be used for charging and for buffer_head, the memcg
of the page can be charged.  For directed charging, the caller can use
the scope API memalloc_[un]use_memcg() to specify the memcg to charge
for all the __GFP_ACCOUNT allocations within the scope.

This patch (of 2):

A lot of memory can be consumed by the events generated for the huge or
unlimited queues if there is either no or slow listener.  This can cause
system level memory pressure or OOMs.  So, it's better to account the
fsnotify kmem caches to the memcg of the listener.

However the listener can be in a different memcg than the memcg of the
producer and these allocations happen in the context of the event
producer.  This patch introduces remote memcg charging API which the
producer can use to charge the allocations to the memcg of the listener.

There are seven fsnotify kmem caches and among them allocations from
dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
inotify_inode_mark_cachep happens in the context of syscall from the
listener.  So, SLAB_ACCOUNT is enough for these caches.

The objects from fsnotify_mark_connector_cachep are not accounted as
they are small compared to the notification mark or events and it is
unclear whom to account connector to since it is shared by all events
attached to the inode.

The allocations from the event caches happen in the context of the event
producer.  For such caches we will need to remote charge the allocations
to the listener's memcg.  Thus we save the memcg reference in the
fsnotify_group structure of the listener.

This patch has also moved the members of fsnotify_group to keep the size
same, at least for 64 bit build, even with additional member by filling
the holes.

[shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
  Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:30 -07:00
..
9p get rid of 'opened' argument of ->atomic_open() - part 3 2018-07-12 10:04:20 -04:00
adfs adfs: don't put inodes into icache 2018-08-03 16:03:33 -04:00
affs affs: fix potential memory leak when parsing option 'prefix' 2018-05-28 12:36:41 +02:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-08-15 15:04:25 -07:00
autofs autofs: fix slab out of bounds read in getname_kernel() 2018-07-14 11:11:09 -07:00
befs fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
bfs bfs_add_entry: pass name/len as qstr pointer 2018-05-22 14:27:50 -04:00
btrfs btrfs: readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
cachefiles cachefiles: Wait rather than BUG'ing on "Unexpected object collision" 2018-07-25 14:49:00 +01:00
ceph Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 19:58:36 -07:00
cifs smb3/cifs fixes (including 8 for stable). Improved tracing, stats, snapshots (previous version mounts work now), performance (compounding enabled for statfs) 2018-08-13 22:32:11 -07:00
coda vfs: change inode times to use struct timespec64 2018-06-05 16:57:31 -07:00
configfs configfs: fix registered group removal 2018-07-17 06:14:07 -07:00
cramfs vfs/y2038: inode timestamps conversion to timespec64 2018-06-15 07:31:07 +09:00
crypto f2fs-for-4.18-rc1 2018-06-11 10:16:13 -07:00
debugfs Revert "debugfs: inode: debugfs_create_dir uses mode permission from parent" 2018-06-12 20:52:16 -07:00
devpts devpts: comment devpts_mntget() 2018-03-14 13:31:23 +01:00
dlm treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
ecryptfs Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into aio-base 2018-05-26 09:16:25 +02:00
efivarfs efivars: Call guid_parse() against guid_t type of variable 2018-07-22 14:13:44 +02:00
efs
exofs exofs: use bio_clone_fast in _write_mirror 2018-07-24 14:43:20 -06:00
exportfs ovl: do not try to reconnect a disconnected origin dentry 2018-04-12 12:04:49 +02:00
ext2 dax: remove VM_MIXEDMAP for fsdax and device dax 2018-08-17 16:20:27 -07:00
ext4 ext4: readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
f2fs mpage: mpage_readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
fat fat: fix memory allocation failure handling of match_strdup() 2018-07-21 12:50:46 -07:00
freevxfs freevxfs_lookup(): use d_splice_alias() 2018-05-22 14:27:51 -04:00
fscache fscache: Fix reference overput in fscache_attach_object() error handling 2018-07-25 14:49:00 +01:00
fuse Merge branch 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 20:25:58 -07:00
gfs2 gfs2 4.19 merge 2018-08-15 22:40:03 -07:00
hfs new helper: inode_fake_hash() 2018-08-03 16:03:32 -04:00
hfsplus vfs/y2038: inode timestamps conversion to timespec64 2018-06-15 07:31:07 +09:00
hostfs vfs: discard ATTR_ATTR_FLAG 2018-08-17 16:20:28 -07:00
hpfs fs/hpfs: extend gmt_to_local() conversion to 64-bit times 2018-08-17 16:20:27 -07:00
hugetlbfs Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 19:58:36 -07:00
isofs isofs: fix potential memory leak in mount option parsing 2018-04-16 09:47:41 +02:00
jbd2 jbd2: replace current_kernel_time64 with ktime equivalent 2018-07-29 15:51:47 -04:00
jffs2 jffs2: use unsigned 32-bit timstamps consistently 2018-07-18 16:44:01 +02:00
jfs Just one jfs patch for 4.19 2018-08-15 22:47:23 -07:00
kernfs kernfs: allow creating kernfs objects with arbitrary uid/gid 2018-07-20 23:44:35 -07:00
lockd net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
minix minix_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
nfs Merge branch 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 20:25:58 -07:00
nfs_common net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
nfsd IMA: don't propagate opened through the entire thing 2018-07-12 10:04:19 -04:00
nilfs2 do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
nls
notify fs: fsnotify: account fsnotify metadata to kmemcg 2018-08-17 16:20:30 -07:00
ntfs ntfs: mft: remove VLA usage 2018-08-17 16:20:27 -07:00
ocfs2 ocfs2: make several functions and variables static (and some const) 2018-08-17 16:20:28 -07:00
omfs omfs_lookup(): report IO errors, use d_splice_alias() 2018-05-22 14:27:58 -04:00
openpromfs openpromfs: switch to d_splice_alias() 2018-05-22 14:27:57 -04:00
orangefs orangefs: remove redundant pointer orangefs_inode 2018-08-14 12:07:14 -04:00
overlayfs vfs/y2038: inode timestamps conversion to timespec64 2018-06-15 07:31:07 +09:00
proc fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps* 2018-07-14 11:11:09 -07:00
pstore pstore: add zstd compression support 2018-08-03 18:12:18 -07:00
qnx4 qnx4_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
qnx6 qnx6_lookup: switch to d_splice_alias() 2018-05-22 14:27:54 -04:00
quota quota: Cleanup list iteration in dqcache_shrink_scan() 2018-06-20 11:04:26 +02:00
ramfs
reiserfs reiserfs: fix buffer overflow with long warning messages 2018-07-14 11:11:10 -07:00
romfs romfs_lookup: switch to d_splice_alias() 2018-05-22 14:27:55 -04:00
squashfs Squashfs: Compute expected length from inode size rather than block length 2018-08-02 09:34:02 -07:00
sysfs SCSI misc on 20180815 2018-08-15 22:06:26 -07:00
sysv sysv_lookup: use d_splice_alias() 2018-05-22 14:27:53 -04:00
tracefs
ubifs vfs/y2038: inode timestamps conversion to timespec64 2018-06-15 07:31:07 +09:00
udf Merge branch 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 20:25:58 -07:00
ufs fs/ufs: use ktime_get_real_seconds for sb and cg timestamps 2018-08-17 16:20:27 -07:00
xfs dax: remove VM_MIXEDMAP for fsdax and device dax 2018-08-17 16:20:27 -07:00
Kconfig autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
Kconfig.binfmt kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
Makefile autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
aio.c Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 20:56:23 -07:00
anon_inodes.c anon_inode_getfile(): switch to alloc_file_pseudo() 2018-07-12 10:04:27 -04:00
attr.c fs: Fix attr.c kernel-doc 2018-07-03 16:44:45 -04:00
bad_inode.c get rid of 'opened' argument of ->atomic_open() - part 3 2018-07-12 10:04:20 -04:00
binfmt_aout.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_elf.c Here are the main MIPS changes for 4.19. 2018-08-13 19:24:32 -07:00
binfmt_elf_fdpic.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
binfmt_em86.c
binfmt_flat.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_misc.c turn filp_clone_open() into inline wrapper for dentry_open() 2018-07-10 23:29:03 -04:00
binfmt_script.c
block_dev.c for-4.19/block-20180812 2018-08-14 10:23:25 -07:00
buffer.c iomap: mark newly allocated buffer heads as new 2018-06-19 15:10:55 -07:00
char_dev.c block, char_dev: Use correct format specifier for unsigned ints 2018-03-15 17:59:24 +01:00
compat.c ncpfs: remove compat functionality 2018-06-05 19:23:26 +02:00
compat_binfmt_elf.c
compat_ioctl.c media: dvb/audio.h: get rid of unused APIs 2018-07-30 16:21:49 -04:00
coredump.c
d_path.c split d_path() and friends into a separate file 2018-03-29 15:07:46 -04:00
dax.c dax: dax_layout_busy_page() warn on !exceptional 2018-07-29 16:59:16 -04:00
dcache.c fs/dcache.c: fix kmemcheck splat at take_dentry_name_snapshot() 2018-08-17 16:20:28 -07:00
dcookies.c fs: add do_lookup_dcookie() helper; remove in-kernel call to syscall 2018-04-02 20:15:39 +02:00
direct-io.c block: consistently use GFP_NOIO instead of __GFP_NORECLAIM 2018-05-14 08:55:18 -06:00
drop_caches.c
eventfd.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
eventpoll.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
exec.c mm: fix vma_is_anonymous() false-positives 2018-07-26 19:38:03 -07:00
fcntl.c mm: restructure memfd code 2018-06-07 17:34:35 -07:00
fhandle.c vfs: Copy struct mount.mnt_id to userspace using put_user() 2018-01-15 12:07:51 -08:00
file.c fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() 2018-04-02 20:16:00 +02:00
file_table.c make alloc_file() static 2018-07-12 10:04:29 -04:00
filesystems.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
fs-writeback.c bdi: Fix oops in wb_workfn() 2018-05-03 16:11:37 -06:00
fs_pin.c
fs_struct.c
inode.c Merge branch 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 20:25:58 -07:00
internal.h Merge branch 'iomap-4.19-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux 2018-08-13 22:29:03 -07:00
ioctl.c fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems 2018-05-24 12:04:28 -05:00
iomap.c Changes for 4.19: 2018-08-14 08:56:02 -07:00
libfs.c fs, dax: prepare for dax-specific address_space_operations 2018-03-30 11:34:55 -07:00
locks.c File locking fixes and enhancements for v4.19 2018-08-13 21:56:50 -07:00
mbcache.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
mount.h
mpage.c mpage: mpage_readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
namei.c Merge branches 'work.misc' and 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 21:28:25 -07:00
namespace.c fix __legitimize_mnt()/mntput() race 2018-08-09 17:51:32 -04:00
no-block.c
nsfs.c net: Export open_related_ns() 2018-02-15 15:34:42 -05:00
open.c ->atomic_open(): return 0 in all success cases 2018-07-12 10:04:21 -04:00
pipe.c Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 19:58:36 -07:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
read_write.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
readdir.c fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64() 2018-04-02 20:16:02 +02:00
select.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
seq_file.c fs/seq_file.c: simplify seq_file iteration code and interface 2018-08-17 16:20:28 -07:00
signalfd.c Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-16 16:21:50 +09:00
splice.c Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-16 16:21:50 +09:00
stack.c
stat.c fs: add do_readlinkat() helper; remove internal call to sys_readlinkat() 2018-04-02 20:15:34 +02:00
statfs.c kernel: add kcompat_sys_{f,}statfs64() 2018-07-12 14:49:48 +01:00
super.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:14:28 -07:00
sync.c Changes for this release: 2018-04-04 12:44:02 -07:00
timerfd.c Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 20:56:23 -07:00
userfaultfd.c userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK fails 2018-08-02 16:03:40 -07:00
utimes.c fs: add do_compat_futimesat() helper; remove in-kernel call to compat syscall 2018-04-02 20:15:44 +02:00
xattr.c vfs: delete unnecessary assignment in vfs_listxattr 2018-05-29 13:22:41 -04:00