Commit graph

112 commits

Author SHA1 Message Date
Richard Weinberger f74a14e870 um: Remove hppfs
hppfs (honeypot procfs) was an attempt to use UML as honeypot.
It was never stable nor in heavy use.

As Al Viro and Christoph Hellwig pointed some major issues out
it is better to let it die.

Signed-off-by: Richard Weinberger <richard@nod.at>
2015-05-31 13:23:08 +02:00
Linus Torvalds 3f3c73de77 This adds the new tracefs file system. This has been in linux-next for
more than one release, as I had it ready for the 4.0 merge window, but
 a last minute thing that needed to go into Linux first had to be done.
 That was that perf hard coded the file system number when reading
 /sys/kernel/debugfs/tracing directory making sure that the path had
 the debugfs mount # before it would parse the tracing file. This broke
 other use cases of perf, and the check is removed.
 
 Now when mounting /sys/kernel/debug, tracefs is automatically mounted
 in /sys/kernel/debug/tracing such that old tools will still see that
 path as expected. But now system admins can mount tracefs directly
 and not need to mount debugfs, which can expose security issues.
 A new directory is created when tracefs is configured such that
 system admins can now mount it separately (/sys/kernel/tracing).
 
 This branch is based off of Al Viro's vfs debugfs_automount branch
 at commit 163f9eb95a
 debugfs: Provide a file creation function that also takes an initial size
 to get the debugfs_create_automount() operation.
 I just noticed that Al rebased the pull to add his Signed-off-by to
 that commit, and the commit is now e59b4e9187.
 I did a git diff of those two and see they are the same. Only the
 latter has Al's SOB.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVLA6YAAoJEEjnJuOKh9ldv6AH/1JUINDQwV+M0VTwzbLogloo
 Sco0byLhskmx5KLVD7Vs8BJAGrgHTdit32kzBGmLGJvVCKBa+c8lwmRw6rnXg3uX
 K4kGp7BIyn1/geXoGpCmDKaLGXhDcw49hRzejKDg/OqFtxKTsSeQtG8fo29ps9Do
 0VaF6UDp8gYplC2N2BfpB59LVndrITQ3mSsBBeFPvS7IxFJXAhDBOq2yi0aI6HyJ
 ICo2L/bA9HLxMuceWrXbsun+RP68+AQlnFfAtok7AcuBzUYPCKY0shT2VMOUtpTt
 1dGxMxq6q1ACfmY7gbp47WMX9aKjWcSEr0V+IYx/xex6Maf0Xsujsy99bKYUWvs=
 =OcgU
 -----END PGP SIGNATURE-----

Merge tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracefs from Steven Rostedt:
 "This adds the new tracefs file system.

  This has been in linux-next for more than one release, as I had it
  ready for the 4.0 merge window, but a last minute thing that needed to
  go into Linux first had to be done.  That was that perf hard coded the
  file system number when reading /sys/kernel/debugfs/tracing directory
  making sure that the path had the debugfs mount # before it would
  parse the tracing file.  This broke other use cases of perf, and the
  check is removed.

  Now when mounting /sys/kernel/debug, tracefs is automatically mounted
  in /sys/kernel/debug/tracing such that old tools will still see that
  path as expected.  But now system admins can mount tracefs directly
  and not need to mount debugfs, which can expose security issues.  A
  new directory is created when tracefs is configured such that system
  admins can now mount it separately (/sys/kernel/tracing)"

* tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Have mkdir and rmdir be part of tracefs
  tracefs: Add directory /sys/kernel/tracing
  tracing: Automatically mount tracefs on debugfs/tracing
  tracing: Convert the tracing facility over to use tracefs
  tracefs: Add new tracefs file system
  tracing: Create cmdline tracer options on tracing fs init
  tracing: Only create tracer options files if directory exists
  debugfs: Provide a file creation function that also takes an initial size
2015-04-14 10:22:29 -07:00
Linus Torvalds 9cd77374f0 Merge branch 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc update from Helge Deller:
 "The major change in here is the removal of the old HP-UX compat code
  which should have made it possible to load and execute 32-bit HP-UX
  binaries on PA-RISC Linux.  Since it was never functional and since
  nobody cares about old 32-bit HPUX binaries any longer, it's now time
  to free up 3200 lines of kernel code (CONFIG_HPUX and
  CONFIG_BINFMT_SOM).

  Other than that we wire up the execveat() syscall, fix sparse errors
  and have some whitespace cleanups"

* 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  fs/binfmt_som: Drop kernel support for HP-UX SOM binaries
  parisc: Remove unused function
  parisc: macro whitespace fixes
  parisc/uaccess: fix sparse errors
  parisc: hpux - Remove HPUX syscall numbers
  parisc: hpux - Remove hpux gateway page
  parisc: hpux - Delete files in hpux subdirectory
  parisc: hpux - Do not compile hpux subdirectory
  parisc: hpux - Drop support for HP-UX binaries
  parisc: Add error checks when building up signal trampoline handler
  parisc: Wire up execveat syscall
2015-02-17 14:25:58 -08:00
Helge Deller 35e88d5c22 fs/binfmt_som: Drop kernel support for HP-UX SOM binaries
The parisc arch has been the only user of HP-UX SOM binaries.

Support for HP-UX executables was never finished and since we now drop support
for the HP-UX compat layer anyway, it does not makes sense to keep the
BINFMT_SOM support.

Cc: linux-fsdevel@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
2015-02-17 16:29:36 +01:00
Matthew Wilcox 6cd176a51e vfs,ext2: remove CONFIG_EXT2_FS_XIP and rename CONFIG_FS_XIP to CONFIG_FS_DAX
The fewer Kconfig options we have the better.  Use the generic
CONFIG_FS_DAX to enable XIP support in ext2 as well as in the core.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:04 -08:00
Matthew Wilcox d475c6346a dax,ext2: replace XIP read and write with DAX I/O
Use the generic AIO infrastructure instead of custom read and write
methods.  In addition to giving us support for AIO, this adds the missing
locking between read() and truncate().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:03 -08:00
Steven Rostedt (Red Hat) 4282d60689 tracefs: Add new tracefs file system
Add a separate file system to handle the tracing directory. Currently it
is part of debugfs, but that is starting to show its limits.

One thing is that in order to access the tracing infrastructure, you need
to mount debugfs. As that includes debugging from all sorts of sub systems
in the kernel, it is not considered advisable to mount such an all
encompassing debugging system.

Having the tracing system in its own file systems gives access to the
tracing sub system without needing to include all other systems.

Another problem with tracing using the debugfs system is that the
instances use mkdir to create sub buffers. debugfs does not support mkdir
from userspace so to implement it, special hacks were used. By controlling
the file system that the tracing infrastructure uses, this can be properly
done without hacks.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-03 12:48:40 -05:00
Al Viro 707c5960f1 Merge branch 'nsfs' into for-next 2014-12-10 21:31:59 -05:00
Al Viro e149ed2b80 take the targets of /proc/*/ns/* symlinks to separate fs
New pseudo-filesystem: nsfs.  Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.).  Files on it *are* bindable - we explicitly permit that in do_loopback().

This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot.  The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present.  See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.

As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
from ns_get_path().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-10 21:30:20 -05:00
Miklos Szeredi ef94b1864d ovl: rename filesystem type to "overlay"
Some distributions carry an "old" format of overlayfs while mainline has a
"new" format.

The distros will possibly want to keep the old overlayfs alongside the new
for compatibility reasons.

To make it possible to differentiate the two versions change the name of
the new one from "overlayfs" to "overlay".

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Andy Whitcroft <apw@canonical.com>
2014-11-20 16:39:59 +01:00
Miklos Szeredi e9be9d5e76 overlay filesystem
Overlayfs allows one, usually read-write, directory tree to be
overlaid onto another, read-only directory tree.  All modifications
go to the upper, writable layer.

This type of mechanism is most often used for live CDs but there's a
wide variety of other uses.

The implementation differs from other "union filesystem"
implementations in that after a file is opened all operations go
directly to the underlying, lower or upper, filesystems.  This
simplifies the implementation and allows native performance in these
cases.

The dentry tree is duplicated from the underlying filesystems, this
enables fast cached lookups without adding special support into the
VFS.  This uses slightly more memory than union mounts, but dentries
are relatively small.

Currently inodes are duplicated as well, but it is a possible
optimization to share inodes for non-directories.

Opening non directories results in the open forwarded to the
underlying filesystem.  This makes the behavior very similar to union
mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
descriptors).

Usage:

  mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay

The following cotributions have been folded into this patch:

Neil Brown <neilb@suse.de>:
 - minimal remount support
 - use correct seek function for directories
 - initialise is_real before use
 - rename ovl_fill_cache to ovl_dir_read

Felix Fietkau <nbd@openwrt.org>:
 - fix a deadlock in ovl_dir_read_merged
 - fix a deadlock in ovl_remove_whiteouts

Erez Zadok <ezk@fsl.cs.sunysb.edu>
 - fix cleanup after WARN_ON

Sedat Dilek <sedat.dilek@googlemail.com>
 - fix up permission to confirm to new API

Robin Dong <hao.bigrat@gmail.com>
 - fix possible leak in ovl_new_inode
 - create new inode in ovl_link

Andy Whitcroft <apw@canonical.com>
 - switch to __inode_permission()
 - copy up i_uid/i_gid from the underlying inode

AV:
 - ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits
 - ovl_clear_empty() - one failure exit forgetting to do unlock_rename(),
   lack of check for udir being the parent of upper, dropping and regaining
   the lock on udir (which would require _another_ check for parent being
   right).
 - bogus d_drop() in copyup and rename [fix from your mail]
 - copyup/remove and copyup/rename races [fix from your mail]
 - ovl_dir_fsync() leaving ERR_PTR() in ->realfile
 - ovl_entry_free() is pointless - it's just a kfree_rcu()
 - fold ovl_do_lookup() into ovl_lookup()
 - manually assigning ->d_op is wrong.  Just use ->s_d_op.
 [patches picked from Miklos]:
 * copyup/remove and copyup/rename races
 * bogus d_drop() in copyup and rename

Also thanks to the following people for testing and reporting bugs:

  Jordi Pujol <jordipujolp@gmail.com>
  Andy Whitcroft <apw@canonical.com>
  Michal Suchanek <hramrach@centrum.cz>
  Felix Fietkau <nbd@openwrt.org>
  Erez Zadok <ezk@fsl.cs.sunysb.edu>
  Randy Dunlap <rdunlap@xenotime.net>

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-24 00:14:38 +02:00
Al Viro efb170c228 take fs_pin stuff to fs/*
Add a new field to fs_pin - kill(pin).  That's what umount and r/o remount
will be calling for all pins attached to vfsmount and superblock resp.
Called after bumping the refcount, so it won't go away under us.  Dropping
the refcount is responsibility of the instance.  All generic stuff moved to
fs/fs_pin.c; the next step will rip all the knowledge of kernel/acct.c from
fs/super.c and fs/namespace.c.  After that - death to mnt_pin(); it was
intended to be usable as generic mechanism for code that wants to attach
objects to vfsmount, so that they would not make the sucker busy and
would get killed on umount.  Never got it right; it remained acct.c-specific
all along.  Now it's very close to being killable.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Jens Axboe 2667bcbbd5 block: move ioprio.c from fs/ to block/
Like commit f9c78b2b, move this block related file outside
of fs/ and into the core block directory, block/.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-19 11:02:18 -06:00
Jens Axboe f9c78b2be2 block: move bio.c and bio-integrity.c from fs/ to block/
They really belong in block/, especially now since it's not in
drivers/block/ anymore. Additionally, the get_maintainer script
gets it wrong when in fs/.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-19 08:34:46 -06:00
Tejun Heo ba341d55a4 kernfs: add CONFIG_KERNFS
As sysfs was kernfs's only user, kernfs has been piggybacking on
CONFIG_SYSFS; however, kernfs is scheduled to grow a new user very
soon.  Introduce a separate config option CONFIG_KERNFS which is to be
selected by kernfs users.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 16:08:57 -08:00
Linus Torvalds bf3d846b78 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and ->set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
2014-01-28 08:38:04 -08:00
Christoph Hellwig feda821e76 fs: remove generic_acl
And instead convert tmpfs to use the new generic ACL code, with two stub
methods provided for in-memory filesystems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-26 08:26:40 -05:00
Christoph Hellwig 5c8ebd57b6 fs: merge xattr_acl.c into posix_acl.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:16 -05:00
Tejun Heo b8441ed279 sysfs, kernfs: add skeletons for kernfs
Core sysfs implementation will be separated into kernfs so that it can
be used by other non-kobject users.

This patch creates fs/kernfs/ directory and makes boilerplate changes.
kernfs interface will be directly based on sysfs_dirent and its
forward declaration is moved to include/linux/kernfs.h which is
included from include/linux/sysfs.h.  sysfs core implementation will
be gradually separated out and moved to kernfs.

This patch doesn't introduce any functional changes.

v2: mount.c added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 13:28:24 -08:00
Linus Torvalds 20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Linus Torvalds b9394d8a65 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/efi changes from Peter Anvin:
 "The bulk of these changes are cleaning up the efivars handling and
  breaking it up into a tree of files.  There are a number of fixes as
  well.

  The entire changeset is pretty big, but most of it is code movement.

  Several of these commits are quite new; the history got very messed up
  due to a mismerge with the urgent changes for rc8 which completely
  broke IA64, and so Ingo requested that we rebase it to straighten it
  out."

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: remove "kfree(NULL)"
  efi: locking fix in efivar_entry_set_safe()
  efi, pstore: Read data from variable store before memcpy()
  efi, pstore: Remove entry from list when erasing
  efi, pstore: Initialise 'entry' before iterating
  efi: split efisubsystem from efivars
  efivarfs: Move to fs/efivarfs
  efivars: Move pstore code into the new EFI directory
  efivars: efivar_entry API
  efivars: Keep a private global pointer to efivars
  efi: move utf16 string functions to efi.h
  x86, efi: Make efi_memblock_x86_reserve_range more readable
  efivarfs: convert to use simple_open()
2013-05-01 15:51:46 -07:00
Josh Triplett 2535e0d723 fs: make binfmt support for #! scripts modular and removable
Add a new configuration option CONFIG_BINFMT_SCRIPT to configure support
for interpreted scripts starting with "#!"; allow compiling out that
support, or building it as a module.  Embedded systems running exclusively
compiled binaries could leave this support out, and systems that don't
need scripts before mounting the root filesystem can build this as a
module.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:04 -07:00
Josh Triplett 146732ce10 fs: don't compile in drop_caches.c when CONFIG_SYSCTL=n
drop_caches.c provides code only invokable via sysctl, so don't compile it
in when CONFIG_SYSCTL=n.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:33 -07:00
Matt Fleming d68772b7c8 efivarfs: Move to fs/efivarfs
Now that efivarfs uses the efivar API, move it out of efivars.c and
into fs/efivarfs where it belongs. This move will eventually allow us
to enable the efivarfs code without having to also enable
CONFIG_EFI_VARS built, and vice versa.

Furthermore, things like,

    mount -t efivarfs none /sys/firmware/efi/efivars

will now work if efivarfs is built as a module without requiring the
use of MODULE_ALIAS(), which would have been necessary when the
efivarfs code was part of efivars.c.

Cc: Matthew Garrett <matthew.garrett@nebula.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Tested-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-17 13:25:09 +01:00
Al Viro f776c73888 fold fifo.c into pipe.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:12:58 -04:00
Jaegeuk Kim a14d53937c f2fs: update Kconfig and Makefile
This adds Makefile and Kconfig for f2fs, and updates Makefile and Kconfig files
in the fs directory.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-11 13:43:42 +09:00
Alex Kelly 046d662f48 coredump: make core dump functionality optional
Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
core dump.  This saves approximately 2.6k in the compiled kernel, and
complements CONFIG_ELF_CORE, which now depends on it.

CONFIG_COREDUMP also disables coredump-related sysctls, except for
suid_dumpable and related functions, which are necessary for ptrace.

[akpm@linux-foundation.org: fix binfmt_aout.c build]
Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:15 +09:00
Alex Kelly 10c28d937e coredump: move core dump functionality into its own file
This prepares for making core dump functionality optional.

The variable "suid_dumpable" and associated functions are left in fs/exec.c
because they're used elsewhere, such as in ptrace.

Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02 21:35:55 -04:00
Kai Bankett 5d026c7242 fs: initial qnx6fs addition
Adds support for qnx6fs readonly support to the linux kernel.

* Mount option
  The option mmi_fs can be used to mount Harman Becker/Audi MMI 3G
  HDD qnx6fs filesystems.

* Documentation
  A high level filesystem stucture description can be found in the
  Documentation/filesystems directory. (qnx6.txt)

* Additional features
  - Active (stable) superblock selection
  - Superblock checksum check (enforced)
  - Supports mount of qnx6 filesystems with to host different endianess
  - Automatic endianess detection
  - Longfilename support (with non-enfocing crc check)
  - All blocksizes (512, 1024, 2048 and 4096 supported)

Signed-off-by: Kai Bankett <chaosman@ontika.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:38 -04:00
Al Viro ece2ccb668 Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z 2012-01-06 23:15:54 -05:00
Al Viro 0226f4923f vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
rationale: that stuff is far tighter bound to fs/namespace.c than to
the guts of procfs proper.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:13 -05:00
Al Viro 9be96f3fd1 move fs/partitions to block/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:06 -05:00
Boaz Harrosh 60325f0c6e fs/Makefile: Stupid typo breakage of exofs inclusion
In my last patch I did a stupid mistake and broke the exofs
compilation completely. Fix it ASAP.

Instead of obj-y I did obj-$(y)

Really Really sorry. Me totally blushing :-{|

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-27 08:36:51 +02:00
Boaz Harrosh 3e335672e0 fs/Makefile: Always inspect exofs/
fs/exofs directory has multiple targets now, of which the
ore.ko will be needed by the pnfs-objects-layout-driver
(fs/nfs/objlayout).

As suggested by: Michal Marek <mmarek@suse.cz>  convert
inclusion of exofs/ from obj-$(CONFIG_EXOFS_FS) => obj-$(y).
So ORE can be selected also from fs/nfs/Kconfig

CC: Michal Marek <mmarek@suse.cz>
CC: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-24 16:36:33 -07:00
NeilBrown 49b28684fd nfsd: Remove deprecated nfsctl system call and related code.
As promised in feature-removal-schedule.txt it is time to
remove the nfsctl system call.

Userspace has perferred to not use this call throughout 2.6 and it has been
excluded in the default configuration since 2.6.36 (9 months ago).

So this patch removes all the code that was being compiled out.

There are still references to sys_nfsctl in various arch systemcall tables
and related code.  These should be cleaned out too, probably in the next
merge window.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-15 18:58:42 -04:00
Linus Torvalds 242e5d06be Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] tioca: Fix assignment from incompatible pointer warnings
  [IA64] mca.c: Fix cast from integer to pointer warning
  [IA64] setup.c Typo fix "Architechtuallly"
  [IA64] Add CONFIG_MISC_DEVICES=y to configs that need it.
  [IA64] disable interrupts at end of ia64_mca_cpe_int_handler()
  [IA64] Add DMA_ERROR_CODE define.
  pstore: fix build warning for unused return value from sysfs_create_file
  pstore: X86 platform interface using ACPI/APEI/ERST
  pstore: new filesystem interface to platform persistent storage
2011-03-16 19:01:29 -07:00
Aneesh Kumar K.V 990d6c2d7a vfs: Add name to file handle conversion support
The syscall also return mount id which can be used
to lookup file system specific information such as uuid
in /proc/<pid>/mountinfo

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:37 -04:00
Tony Luck ca01d6dd2d pstore: new filesystem interface to platform persistent storage
Some platforms have a small amount of non-volatile storage that
can be used to store information useful to diagnose the cause of
a system crash.  This is the generic part of a file system interface
that presents information from the crash as a series of files in
/dev/pstore.  Once the information has been seen, the underlying
storage is freed by deleting the files.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2010-12-28 14:25:21 -08:00
Greg Kroah-Hartman e4c5bf8e3d Merge 'staging-next' to Linus's tree
This merges the staging-next tree to Linus's tree and resolves
some conflicts that were present due to changes in other trees that were
affected by files here.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28 09:44:56 -07:00
Arnd Bergmann 2116b7a473 smbfs: move to drivers/staging
smbfs has been scheduled for removal in 2.6.27, so
maybe we can now move it to drivers/staging on the
way out.

smbfs still uses the big kernel lock and nobody
is going to fix that, so we should be getting
rid of it soon.

This removes the 32 bit compat mount and ioctl
handling code, which is implemented in common fs
code, and moves all smbfs related files into
drivers/staging/smbfs.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-05 09:08:21 -07:00
Arnd Bergmann db7bee24d2 autofs3: move to drivers/staging
Nobody appears to be interested in fixing autofs3 bugs
any more and it uses the BKL, which is going away.

Move this to staging for retirement. Unless someone
complains until 2.6.38, we can remove it for good.

The include/linux/auto_fs.h header file is still used
by autofs4, so it remains in place.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ian Kent <raven@themaw.net>
Cc: autofs@linux.kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-05 09:03:39 -07:00
NeilBrown 1e1405673e nfsd: allow deprecated interface to be compiled out.
Add CONFIG_NFSD_DEPRECATED, default to y.
Only include deprecated interface if this is defined.
This allows distros to remove this interface before the official
removal, and allows developers to test without it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-22 15:33:14 -04:00
Al Viro 7ed1ee6118 Take statfs variants to fs/statfs.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:17 -04:00
Linus Torvalds fc7f99cf36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (205 commits)
  ceph: update for write_inode API change
  ceph: reset osd after relevant messages timed out
  ceph: fix flush_dirty_caps race with caps migration
  ceph: include migrating caps in issued set
  ceph: fix osdmap decoding when pools include (removed) snaps
  ceph: return EBADF if waiting for caps on closed file
  ceph: set osd request message front length correctly
  ceph: reset front len on return to msgpool; BUG on mismatched front iov
  ceph: fix snaptrace decoding on cap migration between mds
  ceph: use single osd op reply msg
  ceph: reset bits on connection close
  ceph: remove bogus mds forward warning
  ceph: remove fragile __map_osds optimization
  ceph: fix connection fault STANDBY check
  ceph: invalidate_authorizer without con->mutex held
  ceph: don't clobber write return value when using O_SYNC
  ceph: fix client_request_forward decoding
  ceph: drop messages on unregistered mds sessions; cleanup
  ceph: fix comments, locking in destroy_inode
  ceph: move dereference after NULL test
  ...

Fix trivial conflicts in Documentation/ioctl/ioctl-number.txt
2010-03-19 09:43:06 -07:00
Joern Engel 5db53f3e80 [LogFS] add new flash file system
This is a new flash file system. See
Documentation/filesystems/logfs.txt

Signed-off-by: Joern Engel <joern@logfs.org>
2009-11-20 20:13:39 +01:00
Sage Weil 9030aaf9bf ceph: Kconfig, Makefile
Kconfig options and Makefile.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-06 11:31:15 -07:00
Ryusuke Konishi 0c4fb87764 nilfs2: update makefile and Kconfig
This adds a Makefile for the nilfs2 file system, and updates the
makefile and Kconfig file in the file system directory.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:16 -07:00
Linus Torvalds 3cc50ac0db Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (41 commits)
  NFS: Add mount options to enable local caching on NFS
  NFS: Display local caching state
  NFS: Store pages from an NFS inode into a local cache
  NFS: Read pages from FS-Cache into an NFS inode
  NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
  NFS: Add read context retention for FS-Cache to call back with
  NFS: FS-Cache page management
  NFS: Add some new I/O counters for FS-Cache doing things for NFS
  NFS: Invalidate FsCache page flags when cache removed
  NFS: Use local disk inode cache
  NFS: Define and create inode-level cache objects
  NFS: Define and create superblock-level objects
  NFS: Define and create server-level objects
  NFS: Register NFS for caching and retrieve the top-level index
  NFS: Permit local filesystem caching to be enabled for NFS
  NFS: Add FS-Cache option bit and debug bit
  NFS: Add comment banners to some NFS functions
  FS-Cache: Make kAFS use FS-Cache
  CacheFiles: A cache that backs onto a mounted filesystem
  CacheFiles: Export things for CacheFiles
  ...
2009-04-03 10:07:43 -07:00
Linus Torvalds 9b59f0316b Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  fs: Add exofs to Kernel build
  exofs: Documentation
  exofs: export_operations
  exofs: super_operations and file_system_type
  exofs: dir_inode and directory operations
  exofs: address_space_operations
  exofs: symlink_inode and fast_symlink_inode operations
  exofs: file and file_inode operations
  exofs: Kbuild, Headers and osd utils
2009-04-03 09:53:22 -07:00
David Howells 9ae326a690 CacheFiles: A cache that backs onto a mounted filesystem
Add an FS-Cache cache-backend that permits a mounted filesystem to be used as a
backing store for the cache.

CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling.  This is called cachefilesd and lives in
/sbin.  The source for the daemon can be downloaded from:

	http://people.redhat.com/~dhowells/cachefs/cachefilesd.c

And an example configuration from:

	http://people.redhat.com/~dhowells/cachefs/cachefilesd.conf

The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services.  Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.

CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
to communication with the daemon.  Only one thing may have this open at once,
and whilst it is open, a cache is at least partially in existence.  The daemon
opens this and sends commands down it to control the cache.

CacheFiles is currently limited to a single cache.

CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section.  This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.

============
REQUIREMENTS
============

The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:

	- dnotify.

	- extended attributes (xattrs).

	- openat() and friends.

	- bmap() support on files in the filesystem (FIBMAP ioctl).

	- The use of bmap() to detect a partial page at the end of the file.

It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.

=============
CONFIGURATION
=============

The cache is configured by a script in /etc/cachefilesd.conf.  These commands
set up cache ready for use.  The following script commands are available:

 (*) brun <N>%
 (*) bcull <N>%
 (*) bstop <N>%
 (*) frun <N>%
 (*) fcull <N>%
 (*) fstop <N>%

	Configure the culling limits.  Optional.  See the section on culling
	The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.

	The commands beginning with a 'b' are file space (block) limits, those
	beginning with an 'f' are file count limits.

 (*) dir <path>

	Specify the directory containing the root of the cache.  Mandatory.

 (*) tag <name>

	Specify a tag to FS-Cache to use in distinguishing multiple caches.
	Optional.  The default is "CacheFiles".

 (*) debug <mask>

	Specify a numeric bitmask to control debugging in the kernel module.
	Optional.  The default is zero (all off).  The following values can be
	OR'd into the mask to collect various information:

		1	Turn on trace of function entry (_enter() macros)
		2	Turn on trace of function exit (_leave() macros)
		4	Turn on trace of internal debug points (_debug())

	This mask can also be set through sysfs, eg:

		echo 5 >/sys/modules/cachefiles/parameters/debug

==================
STARTING THE CACHE
==================

The cache is started by running the daemon.  The daemon opens the cache device,
configures the cache and tells it to begin caching.  At that point the cache
binds to fscache and the cache becomes live.

The daemon is run as follows:

	/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]

The flags are:

 (*) -d

	Increase the debugging level.  This can be specified multiple times and
	is cumulative with itself.

 (*) -s

	Send messages to stderr instead of syslog.

 (*) -n

	Don't daemonise and go into background.

 (*) -f <configfile>

	Use an alternative configuration file rather than the default one.

===============
THINGS TO AVOID
===============

Do not mount other things within the cache as this will cause problems.  The
kernel module contains its own very cut-down path walking facility that ignores
mountpoints, but the daemon can't avoid them.

Do not create, rename or unlink files and directories in the cache whilst the
cache is active, as this may cause the state to become uncertain.

Renaming files in the cache might make objects appear to be other objects (the
filename is part of the lookup key).

Do not change or remove the extended attributes attached to cache files by the
cache as this will cause the cache state management to get confused.

Do not create files or directories in the cache, lest the cache get confused or
serve incorrect data.

Do not chmod files in the cache.  The module creates things with minimal
permissions to prevent random users being able to access them directly.

=============
CACHE CULLING
=============

The cache may need culling occasionally to make space.  This involves
discarding objects from the cache that have been used less recently than
anything else.  Culling is based on the access time of data objects.  Empty
directories are culled if not in use.

Cache culling is done on the basis of the percentage of blocks and the
percentage of files available in the underlying filesystem.  There are six
"limits":

 (*) brun
 (*) frun

     If the amount of free space and the number of available files in the cache
     rises above both these limits, then culling is turned off.

 (*) bcull
 (*) fcull

     If the amount of available space or the number of available files in the
     cache falls below either of these limits, then culling is started.

 (*) bstop
 (*) fstop

     If the amount of available space or the number of available files in the
     cache falls below either of these limits, then no further allocation of
     disk space or files is permitted until culling has raised things above
     these limits again.

These must be configured thusly:

	0 <= bstop < bcull < brun < 100
	0 <= fstop < fcull < frun < 100

Note that these are percentages of available space and available files, and do
_not_ appear as 100 minus the percentage displayed by the "df" program.

The userspace daemon scans the cache to build up a table of cullable objects.
These are then culled in least recently used order.  A new scan of the cache is
started as soon as space is made in the table.  Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.

===============
CACHE STRUCTURE
===============

The CacheFiles module will create two directories in the directory it was
given:

 (*) cache/

 (*) graveyard/

The active cache objects all reside in the first directory.  The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
to the graveyard from which the daemon will actually delete them.

The daemon uses dnotify to monitor the graveyard directory, and will delete
anything that appears therein.

The module represents index objects as directories with the filename "I..." or
"J...".  Note that the "cache/" directory is itself a special index.

Data objects are represented as files if they have no children, or directories
if they do.  Their filenames all begin "D..." or "E...".  If represented as a
directory, data objects will have a file in the directory called "data" that
actually holds the data.

Special objects are similar to data objects, except their filenames begin
"S..." or "T...".

If an object has children, then it will be represented as a directory.
Immediately in the representative directory are a collection of directories
named for hash values of the child object keys with an '@' prepended.  Into
this directory, if possible, will be placed the representations of the child
objects:

	INDEX     INDEX      INDEX                             DATA FILES
	========= ========== ================================= ================
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry

If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last one of which will be the objects
inside the last directory.  The names of the intermediate directories will have
'+' prepended:

	J1223/@23/+xy...z/+kl...m/Epqr

Note that keys are raw data, and not only may they exceed NAME_MAX in size,
they may also contain things like '/' and NUL characters, and so they may not
be suitable for turning directly into a filename.

To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable.  The two versions of
object filenames indicate the encoding:

	OBJECT TYPE	PRINTABLE	ENCODED
	===============	===============	===============
	Index		"I..."		"J..."
	Data		"D..."		"E..."
	Special		"S..."		"T..."

Intermediate directories are always "@" or "+" as appropriate.

Each object in the cache has an extended attribute label that holds the object
type ID (required to distinguish special objects) and the auxiliary data from
the netfs.  The latter is used to detect stale objects in the cache and update
or retire them.

Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).

==========================
SECURITY MODEL AND SELINUX
==========================

CacheFiles is implemented to deal properly with the LSM security features of
the Linux kernel and the SELinux facility.

One of the problems that CacheFiles faces is that it is generally acting on
behalf of a process, and running in that process's context, and that includes a
security context that is not appropriate for accessing the cache - either
because the files in the cache are inaccessible to that process, or because if
the process creates a file in the cache, that file may be inaccessible to other
processes.

The way CacheFiles works is to temporarily change the security context (fsuid,
fsgid and actor security label) that the process acts as - without changing the
security context of the process when it the target of an operation performed by
some other process (so signalling and suchlike still work correctly).

When the CacheFiles module is asked to bind to its cache, it:

 (1) Finds the security label attached to the root cache directory and uses
     that as the security label with which it will create files.  By default,
     this is:

	cachefiles_var_t

 (2) Finds the security label of the process which issued the bind request
     (presumed to be the cachefilesd daemon), which by default will be:

	cachefilesd_t

     and asks LSM to supply a security ID as which it should act given the
     daemon's label.  By default, this will be:

	cachefiles_kernel_t

     SELinux transitions the daemon's security ID to the module's security ID
     based on a rule of this form in the policy.

	type_transition <daemon's-ID> kernel_t : process <module's-ID>;

     For instance:

	type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;

The module's security ID gives it permission to create, move and remove files
and directories in the cache, to find and access directories and files in the
cache, to set and access extended attributes on cache objects, and to read and
write files in the cache.

The daemon's security ID gives it only a very restricted set of permissions: it
may scan directories, stat files and erase files and directories.  It may
not read or write files in the cache, and so it is precluded from accessing the
data cached therein; nor is it permitted to create new files in the cache.

There are policy source files available in:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2

and later versions.  In that tarball, see the files:

	cachefilesd.te
	cachefilesd.fc
	cachefilesd.if

They are built and installed directly by the RPM.

If a non-RPM based system is being used, then copy the above files to their own
directory and run:

	make -f /usr/share/selinux/devel/Makefile
	semodule -i cachefilesd.pp

You will need checkpolicy and selinux-policy-devel installed prior to the
build.

By default, the cache is located in /var/fscache, but if it is desirable that
it should be elsewhere, than either the above policy files must be altered, or
an auxiliary policy must be installed to label the alternate location of the
cache.

For instructions on how to add an auxiliary policy to enable the cache to be
located elsewhere when SELinux is in enforcing mode, please see:

	/usr/share/doc/cachefilesd-*/move-cache.txt

When the cachefilesd rpm is installed; alternatively, the document can be found
in the sources.

==================
A NOTE ON SECURITY
==================

CacheFiles makes use of the split security in the task_struct.  It allocates
its own task_security structure, and redirects current->act_as to point to it
when it acts on behalf of another process, in that process's context.

The reason it does this is that it calls vfs_mkdir() and suchlike rather than
bypassing security and calling inode ops directly.  Therefore the VFS and LSM
may deny the CacheFiles access to the cache data because under some
circumstances the caching code is running in the security context of whatever
process issued the original syscall on the netfs.

Furthermore, should CacheFiles create a file or directory, the security
parameters with that object is created (UID, GID, security label) would be
derived from that process that issued the system call, thus potentially
preventing other processes from accessing the cache - including CacheFiles's
cache management daemon (cachefilesd).

What is required is to temporarily override the security of the process that
issued the system call.  We can't, however, just do an in-place change of the
security data as that affects the process as an object, not just as a subject.
This means it may lose signals or ptrace events for example, and affects what
the process looks like in /proc.

So CacheFiles makes use of a logical split in the security between the
objective security (task->sec) and the subjective security (task->act_as).  The
objective security holds the intrinsic security properties of a process and is
never overridden.  This is what appears in /proc, and is what is used when a
process is the target of an operation by some other process (SIGKILL for
example).

The subjective security holds the active security properties of a process, and
may be overridden.  This is not seen externally, and is used whan a process
acts upon another object, for example SIGKILLing another process or opening a
file.

LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
for CacheFiles to run in a context of a specific security label, or to create
files and directories with another security label.

This documentation is added by the patch to:

	Documentation/filesystems/caching/cachefiles.txt

Signed-Off-By: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:41 +01:00