1
0
Fork 0
Commit Graph

45 Commits (a528d35e8bfcc521d7cb70aaf03e1bd296c8493f)

Author SHA1 Message Date
David Howells a528d35e8b statx: Add a system call to make enhanced file info available
Add a system call to make extended file information available, including
file creation and some attribute flags where available through the
underlying filesystem.

The getattr inode operation is altered to take two additional arguments: a
u32 request_mask and an unsigned int flags that indicate the
synchronisation mode.  This change is propagated to the vfs_getattr*()
function.

Functions like vfs_stat() are now inline wrappers around new functions
vfs_statx() and vfs_statx_fd() to reduce stack usage.

========
OVERVIEW
========

The idea was initially proposed as a set of xattrs that could be retrieved
with getxattr(), but the general preference proved to be for a new syscall
with an extended stat structure.

A number of requests were gathered for features to be included.  The
following have been included:

 (1) Make the fields a consistent size on all arches and make them large.

 (2) Spare space, request flags and information flags are provided for
     future expansion.

 (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
     __s64).

 (4) Creation time: The SMB protocol carries the creation time, which could
     be exported by Samba, which will in turn help CIFS make use of
     FS-Cache as that can be used for coherency data (stx_btime).

     This is also specified in NFSv4 as a recommended attribute and could
     be exported by NFSD [Steve French].

 (5) Lightweight stat: Ask for just those details of interest, and allow a
     netfs (such as NFS) to approximate anything not of interest, possibly
     without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
     Dilger] (AT_STATX_DONT_SYNC).

 (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
     its cached attributes are up to date [Trond Myklebust]
     (AT_STATX_FORCE_SYNC).

And the following have been left out for future extension:

 (7) Data version number: Could be used by userspace NFS servers [Aneesh
     Kumar].

     Can also be used to modify fill_post_wcc() in NFSD which retrieves
     i_version directly, but has just called vfs_getattr().  It could get
     it from the kstat struct if it used vfs_xgetattr() instead.

     (There's disagreement on the exact semantics of a single field, since
     not all filesystems do this the same way).

 (8) BSD stat compatibility: Including more fields from the BSD stat such
     as creation time (st_btime) and inode generation number (st_gen)
     [Jeremy Allison, Bernd Schubert].

 (9) Inode generation number: Useful for FUSE and userspace NFS servers
     [Bernd Schubert].

     (This was asked for but later deemed unnecessary with the
     open-by-handle capability available and caused disagreement as to
     whether it's a security hole or not).

(10) Extra coherency data may be useful in making backups [Andreas Dilger].

     (No particular data were offered, but things like last backup
     timestamp, the data version number and the DOS archive bit would come
     into this category).

(11) Allow the filesystem to indicate what it can/cannot provide: A
     filesystem can now say it doesn't support a standard stat feature if
     that isn't available, so if, for instance, inode numbers or UIDs don't
     exist or are fabricated locally...

     (This requires a separate system call - I have an fsinfo() call idea
     for this).

(12) Store a 16-byte volume ID in the superblock that can be returned in
     struct xstat [Steve French].

     (Deferred to fsinfo).

(13) Include granularity fields in the time data to indicate the
     granularity of each of the times (NFSv4 time_delta) [Steve French].

     (Deferred to fsinfo).

(14) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
     Note that the Linux IOC flags are a mess and filesystems such as Ext4
     define flags that aren't in linux/fs.h, so translation in the kernel
     may be a necessity (or, possibly, we provide the filesystem type too).

     (Some attributes are made available in stx_attributes, but the general
     feeling was that the IOC flags were to ext[234]-specific and shouldn't
     be exposed through statx this way).

(15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
     Michael Kerrisk].

     (Deferred, probably to fsinfo.  Finding out if there's an ACL or
     seclabal might require extra filesystem operations).

(16) Femtosecond-resolution timestamps [Dave Chinner].

     (A __reserved field has been left in the statx_timestamp struct for
     this - if there proves to be a need).

(17) A set multiple attributes syscall to go with this.

===============
NEW SYSTEM CALL
===============

The new system call is:

	int ret = statx(int dfd,
			const char *filename,
			unsigned int flags,
			unsigned int mask,
			struct statx *buffer);

The dfd, filename and flags parameters indicate the file to query, in a
similar way to fstatat().  There is no equivalent of lstat() as that can be
emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags.  There is
also no equivalent of fstat() as that can be emulated by passing a NULL
filename to statx() with the fd of interest in dfd.

Whether or not statx() synchronises the attributes with the backing store
can be controlled by OR'ing a value into the flags argument (this typically
only affects network filesystems):

 (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
     respect.

 (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
     its attributes with the server - which might require data writeback to
     occur to get the timestamps correct.

 (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
     network filesystem.  The resulting values should be considered
     approximate.

mask is a bitmask indicating the fields in struct statx that are of
interest to the caller.  The user should set this to STATX_BASIC_STATS to
get the basic set returned by stat().  It should be noted that asking for
more information may entail extra I/O operations.

buffer points to the destination for the data.  This must be 256 bytes in
size.

======================
MAIN ATTRIBUTES RECORD
======================

The following structures are defined in which to return the main attribute
set:

	struct statx_timestamp {
		__s64	tv_sec;
		__s32	tv_nsec;
		__s32	__reserved;
	};

	struct statx {
		__u32	stx_mask;
		__u32	stx_blksize;
		__u64	stx_attributes;
		__u32	stx_nlink;
		__u32	stx_uid;
		__u32	stx_gid;
		__u16	stx_mode;
		__u16	__spare0[1];
		__u64	stx_ino;
		__u64	stx_size;
		__u64	stx_blocks;
		__u64	__spare1[1];
		struct statx_timestamp	stx_atime;
		struct statx_timestamp	stx_btime;
		struct statx_timestamp	stx_ctime;
		struct statx_timestamp	stx_mtime;
		__u32	stx_rdev_major;
		__u32	stx_rdev_minor;
		__u32	stx_dev_major;
		__u32	stx_dev_minor;
		__u64	__spare2[14];
	};

The defined bits in request_mask and stx_mask are:

	STATX_TYPE		Want/got stx_mode & S_IFMT
	STATX_MODE		Want/got stx_mode & ~S_IFMT
	STATX_NLINK		Want/got stx_nlink
	STATX_UID		Want/got stx_uid
	STATX_GID		Want/got stx_gid
	STATX_ATIME		Want/got stx_atime{,_ns}
	STATX_MTIME		Want/got stx_mtime{,_ns}
	STATX_CTIME		Want/got stx_ctime{,_ns}
	STATX_INO		Want/got stx_ino
	STATX_SIZE		Want/got stx_size
	STATX_BLOCKS		Want/got stx_blocks
	STATX_BASIC_STATS	[The stuff in the normal stat struct]
	STATX_BTIME		Want/got stx_btime{,_ns}
	STATX_ALL		[All currently available stuff]

stx_btime is the file creation time, stx_mask is a bitmask indicating the
data provided and __spares*[] are where as-yet undefined fields can be
placed.

Time fields are structures with separate seconds and nanoseconds fields
plus a reserved field in case we want to add even finer resolution.  Note
that times will be negative if before 1970; in such a case, the nanosecond
fields will also be negative if not zero.

The bits defined in the stx_attributes field convey information about a
file, how it is accessed, where it is and what it does.  The following
attributes map to FS_*_FL flags and are the same numerical value:

	STATX_ATTR_COMPRESSED		File is compressed by the fs
	STATX_ATTR_IMMUTABLE		File is marked immutable
	STATX_ATTR_APPEND		File is append-only
	STATX_ATTR_NODUMP		File is not to be dumped
	STATX_ATTR_ENCRYPTED		File requires key to decrypt in fs

Within the kernel, the supported flags are listed by:

	KSTAT_ATTR_FS_IOC_FLAGS

[Are any other IOC flags of sufficient general interest to be exposed
through this interface?]

New flags include:

	STATX_ATTR_AUTOMOUNT		Object is an automount trigger

These are for the use of GUI tools that might want to mark files specially,
depending on what they are.

Fields in struct statx come in a number of classes:

 (0) stx_dev_*, stx_blksize.

     These are local system information and are always available.

 (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino,
     stx_size, stx_blocks.

     These will be returned whether the caller asks for them or not.  The
     corresponding bits in stx_mask will be set to indicate whether they
     actually have valid values.

     If the caller didn't ask for them, then they may be approximated.  For
     example, NFS won't waste any time updating them from the server,
     unless as a byproduct of updating something requested.

     If the values don't actually exist for the underlying object (such as
     UID or GID on a DOS file), then the bit won't be set in the stx_mask,
     even if the caller asked for the value.  In such a case, the returned
     value will be a fabrication.

     Note that there are instances where the type might not be valid, for
     instance Windows reparse points.

 (2) stx_rdev_*.

     This will be set only if stx_mode indicates we're looking at a
     blockdev or a chardev, otherwise will be 0.

 (3) stx_btime.

     Similar to (1), except this will be set to 0 if it doesn't exist.

=======
TESTING
=======

The following test program can be used to test the statx system call:

	samples/statx/test-statx.c

Just compile and run, passing it paths to the files you want to examine.
The file is built automatically if CONFIG_SAMPLES is enabled.

Here's some example output.  Firstly, an NFS directory that crosses to
another FSID.  Note that the AUTOMOUNT attribute is set because transiting
this directory will cause d_automount to be invoked by the VFS.

	[root@andromeda ~]# /tmp/test-statx -A /warthog/data
	statx(/warthog/data) = 0
	results=7ff
	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
	Device: 00:26           Inode: 1703937     Links: 125
	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
	Access: 2016-11-24 09:02:12.219699527+0000
	Modify: 2016-11-17 10:44:36.225653653+0000
	Change: 2016-11-17 10:44:36.225653653+0000
	Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

Secondly, the result of automounting on that directory.

	[root@andromeda ~]# /tmp/test-statx /warthog/data
	statx(/warthog/data) = 0
	results=7ff
	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
	Device: 00:27           Inode: 2           Links: 125
	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
	Access: 2016-11-24 09:02:12.219699527+0000
	Modify: 2016-11-17 10:44:36.225653653+0000
	Change: 2016-11-17 10:44:36.225653653+0000

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-02 20:51:15 -05:00
Boris Brezillon 7554769641 UBI: provide an helper to check whether a LEB is mapped or not
This is part of the process of hiding UBI EBA's internal to other part of
the UBI implementation, so that we can add new information to the EBA
table without having to patch different places in the UBI code.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02 22:48:14 +02:00
Boris Brezillon 9a5f09ac0a UBI: add an helper to check lnum validity
ubi_leb_valid() is here to replace the
lnum < 0 || lnum >= vol->reserved_pebs checks.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02 22:48:14 +02:00
Richard Weinberger 61edc3f3b5 ubi: Don't bypass ->getattr()
Directly accessing inode fields bypasses ->getattr()
and can cause problems when the underlying filesystem
does not have the default ->getattr() implementation.

So instead of obtaining the backing inode via d_backing_inode()
use vfs_getattr() and obtain what we need from the kstat struct.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-06-14 10:51:42 +02:00
Richard Weinberger ad022c8718 Revert "mtd: switch ubi_open_volume_path() to vfs_stat()"
This reverts commit 322ea0bbf3.

vfs_stat() can only be used on user supplied buffers.
UBI's kapi.c is the API to the kernel and therefore vfs_stat()
is inappropriate.

This solves the problem that mounting any UBIFS will immediately
fail with -EINVAL.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-06-14 10:51:42 +02:00
Linus Torvalds 23a3e178b9 This pull request contains mostly cleanups and minor
improvements of UBI and UBIFS.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXSJl6AAoJEEtJtSqsAOnWbbAP/2ls5KpGsRSoOP4NsKo5+8k9
 GsZ8hb0iN+UGDxMB5Jlj6O1ISNPz/8o7iGBuWa5OWFdh4Nn38sA1Qv996Rg3Ca3O
 8KYEGAD7POWANfxCTmyQX/L8AsOP62B3diFktPyGrtYmLsVzx/AFN9/nM27ticdF
 PQpUtLwvL/m7/oE6ymFCB4x34+DkTa7Jx4e+chVlF61CUipGc5I60VVX1Do+/DWt
 S37ajjIxMJx0BO7Zs6vrcH9uGFSbvSpVBr9/sC16t0Fa1/vAa9pXieg3y5XZmtOl
 +6NlwUxXVu5IjHkYYGZP8jsrA7bKJlflgCo/w6WKC6V0KECnBx+oSG9uMjDVPBSM
 rc4VF4nevnNLNqFGBRWhxiYrRUCdB1TcWVDOzM16peNeUNJiWj/+4PbsiJQBwECH
 ZnE/dBrjU7v+J8UmHsUJSveW6/5PluJf1loDzrip7gk5QD13wdpPo3VOycOj1vMD
 iS6H/TBjtcEzWWh7qcBbtgkbHQH0g+w9YSzVy7ODFweuoq+4qCZ1davqzzZaRv3q
 hylor1gea3b4GnH1EDjmXyQOOuIf2rLdJhou8WoZhmy2q3FrnE+89/CCOCMkeQMQ
 qQr1lAbxD7A+8bYwSZ7Nvv6lrBUrat59DV6FHFc69MdkGT7J6z85ZOyd3lxRvJc/
 XiZZkKhrYSKgV/iLFhHW
 =L16y
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.7-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:
 "This contains mostly cleanups and minor improvements of UBI and UBIFS"

* tag 'upstream-4.7-rc1' of git://git.infradead.org/linux-ubifs:
  ubifs: ubifs_dump_inode: Fix dumping field bulk_read
  UBI: Fix static volume checks when Fastmap is used
  UBI: Set free_count to zero before walking through erase list
  UBI: Silence an unintialized variable warning
  UBI: Clean up return in ubi_remove_volume()
  UBI: Modify wrong comment in ubi_leb_map function.
  UBI: Don't read back all data in ubi_eba_copy_leb()
  UBI: Add ro-mode sysfs attribute
2016-05-27 18:49:29 -07:00
z00189512 960b35d06b UBI: Modify wrong comment in ubi_leb_map function.
Signed-off-by: z00189512 <abc.zhangliang@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-05-24 15:24:20 +02:00
Al Viro 322ea0bbf3 mtd: switch ubi_open_volume_path() to vfs_stat()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-27 23:49:17 -04:00
David Howells bb668734c4 VFS: assorted d_backing_inode() annotations
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:59 -04:00
Richard Weinberger 9ff08979e1 UBI: Add initial support for scatter gather
Adds a new set of functions to deal with scatter gather.
ubi_eba_read_leb_sg() will read from a LEB into a scatter gather list.
The new data structure struct ubi_sgl will be used within UBI to
hold the scatter gather list itself and metadata to have a cursor
within the list.

Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
2015-01-28 16:04:26 +01:00
Richard Weinberger fafdd2bf26 UBI: Implement UBI_METAONLY
UBI_METAONLY is a new open mode for UBI volumes, it indicates
that only meta data is being changed.
Meta data in terms of UBI volumes means data which is stored in the
UBI volume table but not on the volume itself.
While it does not interfere with UBI_READONLY and UBI_READWRITE
it is not allowed to use UBI_METAONLY together with UBI_EXCLUSIVE.

Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Cc: Andrew Murray <amurray@embedded-bits.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Guido Martínez <guido@vanguardiasur.com.ar>
Reviewed-by: Guido Martínez <guido@vanguardiasur.com.ar>
Tested-by: Christoph Fritz <chf.fritz@googlemail.com>
Tested-by: Andrew Murray <amurray@embedded-bits.co.uk>
2015-01-28 15:57:04 +01:00
Tanya Brokhman 3260870331 UBI: Extend UBI layer debug/messaging capabilities
If there is more then one UBI device mounted, there is no way to
distinguish between messages from different UBI devices.
Add device number to all ubi layer message types.

The R/O block driver messages were replaced by pr_* since
ubi_device structure is not used by it.

Amended a bit by Artem.

Signed-off-by: Tanya Brokhman <tlinder@codeaurora.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-11-07 12:08:51 +02:00
Joel Reardon 62f384552b UBI: modify ubi_wl_flush function to clear work queue for a lnum
This patch modifies ubi_wl_flush to force the erasure of
particular volume id / logical eraseblock number pairs. Previous functionality
is preserved when passing UBI_ALL for both values. The locations where ubi_wl_flush
were called are appropriately changed: ubi_leb_erase only flushes for the
erased LEB, and ubi_create_volume forces only flushing for its volume id.
External code can call this new feature via the new function ubi_flush() added
to kapi.c, which simply passes through to ubi_wl_flush().

This was tested by disabling the call to do_work in ubi thread, which results
in the work queue remaining unless explicitly called to remove. UBIFS was
changed to call ubifs_leb_change 50 times for four different LEBs. Then the
new function was called to clear the queue: passing wrong volume ids / lnum,
correct ones, and finally UBI_ALL for both to ensure it was finally all
cleard. The work queue was dumped each time and the selective removal
of the particular LEB numbers was observed. Extra checks were enabled and
ubifs's integck was also run. Finally, the drive was repeatedly filled and
emptied to ensure that the queue was cleared normally.

Artem: amended the patch.

Signed-off-by: Joel Reardon <reardonj@inf.ethz.ch>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-21 11:34:41 +03:00
Artem Bityutskiy e2986827d5 UBI: get rid of dbg_err
This patch removes the 'dbg_err()' macro and we now use 'ubi_err' instead.
The idea of 'dbg_err()' was to compile out some error message to make the
binary a bit smaller - but I think it was a bad idea.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-20 20:26:00 +03:00
Richard Weinberger b36a261e8c UBI: Kill data type hint
We do not need this feature and to our shame it even was not working
and there was a bug found very recently.
	-- Artem Bityutskiy

Without the data type hint UBI2 (fastmap) will be easier to implement.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-20 20:25:59 +03:00
Artem Bityutskiy 327cf2922b mtd: do not use mtd->sync directly
This patch teaches 'mtd_sync()' to do nothing when the MTD driver does
not have the '->sync()' method, which allows us to remove all direct
'mtd->sync' accesses.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:26:21 +00:00
Artem Bityutskiy 85f2f2a809 mtd: introduce mtd_sync interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:35 +00:00
Brian Norris d57f40544a mtd: utilize `mtd_is_*()' functions
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
2011-09-21 09:19:06 +03:00
Artem Bityutskiy f43ec882b8 UBI: provide LEB offset information
Provide the LEB offset information in the UBI device information data
structure. This piece of information is required by UBIFS to find out
what are the LEB offsets which are aligned to the max. write size.

If LEB offset not aligned to max. write size, then UBIFS has to take
this into account to write more optimally.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-03-08 10:12:48 +02:00
Artem Bityutskiy 30b542ef45 UBI: incorporate maximum write size
Incorporate MTD write buffer size into UBI device information
because UBIFS needs this field. UBI does not use it ATM, just
provides to upper layers in 'struct ubi_device_info'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-03-08 10:12:48 +02:00
Shinya Kuribayashi 3f5026222e UBI: fix s/then/than/ typos
Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi.px@renesas.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2010-05-07 08:33:10 +03:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Artem Bityutskiy b531b55a7b UBI: add more checks to chdev open
When opening UBI volumes by their character device names, make
sure we are opening character devices, not block devices or any
other inode type.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2010-01-12 13:19:15 +02:00
Corentin Chary b571028418 UBI: Add ubi_open_volume_path
Add an 'ubi_open_volume_path(path, mode)' function which works like
'open_bdev_exclusive(path, mode, ...)' where path is the special file
representing the UBI volume, typically /dev/ubi0_0.

This is needed to teach UBIFS being able to mount UBI character devices.

[Comments and the patch were amended a bit by Artem]

Signed-off-by: Corentin Chary <corentincj@iksaif.net>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-11-24 08:18:54 +02:00
Dmitry Pervushin 0e0ee1cc33 UBI: add notification API
UBI volume notifications are intended to create the API to get clients
notified about volume creation/deletion, renaming and re-sizing. A
client can subscribe to these notifications using 'ubi_volume_register()'
and cancel the subscription using 'ubi_volume_unregister()'. When UBI
volumes change, a blocking notifier is called. Clients also can request
"added" events on all volumes that existed before client subscribed
to the notifications.

If we use notifications instead of calling functions like 'ubi_gluebi_xxx()',
we can make the MTD emulation layer to be more flexible: build it as a
separate module and load/unload it on demand.

[Artem: many cleanups, rework locking, add "updated" event, provide
 device/volume info in notifiers]

Signed-off-by: Dmitry Pervushin <dpervushin@embeddedalley.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-06-02 13:53:35 +03:00
Artem Bityutskiy e1cf7e6dd4 UBI: improve debugging messages
Various minor improvements to the debugging messages which
I found useful while hunting problems.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-05-18 12:28:25 +03:00
Coly Li 73ac36ea14 fix similar typos to successfull
When I review ocfs2 code, find there are 2 typos to "successfull".  After
doing grep "successfull " in kernel tree, 22 typos found totally -- great
minds always think alike :)

This patch fixes all the similar typos. Thanks for Randy's ack and comments.

Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:15 -08:00
Artem Bityutskiy c8566350a3 UBI: fix and re-work debugging stuff
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:34:45 +03:00
Artem Bityutskiy a5bf619041 UBI: add ubi_sync() interface
To flush MTD device caches.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:56 +03:00
Kyungmin Park cadb40ccc1 UBI: avoid unnecessary division operations
UBI already checks that @min io size is the power of 2 at io_init.
It is save to use bit operations then.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:54 +03:00
Artem Bityutskiy ae616e1be1 UBI: fix warnings
drivers/mtd/ubi/cdev.c: In function ‘vol_cdev_read’:
drivers/mtd/ubi/cdev.c:187: warning: unused variable ‘vol_id’
CC [M]  drivers/mtd/ubi/kapi.o
drivers/mtd/ubi/kapi.c: In function ‘ubi_leb_erase’:
drivers/mtd/ubi/kapi.c:483: warning: unused variable ‘vol_id’
drivers/mtd/ubi/kapi.c: In function ‘ubi_leb_unmap’:
drivers/mtd/ubi/kapi.c:544: warning: unused variable ‘vol_id’
drivers/mtd/ubi/kapi.c: In function ‘ubi_leb_map’:
drivers/mtd/ubi/kapi.c:582: warning: unused variable ‘vol_id’

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-01-25 16:41:24 +02:00
Artem Bityutskiy 783b273afa UBI: use separate mutex for volumes checking
Introduce a separate mutex which serializes volumes checking,
because we cammot really use volumes_mutex - it cases reverse
locking problems with mtd_tbl_mutex when gluebi is used -
thanks to lockdep.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-12-26 19:15:17 +02:00
Artem Bityutskiy e73f4459d9 UBI: add UBI devices reference counting
This is one more step on the way to "removable" UBI devices. It
adds reference counting for UBI devices. Every time a volume on
this device is opened - the device's refcount is increased. It
is also increased if someone is reading any sysfs file of this
UBI device or of one of its volumes.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-12-26 19:15:17 +02:00
Artem Bityutskiy d05c77a816 UBI: introduce volume refcounting
Add ref_count field to UBI volumes and remove weired "vol->removed"
field. This way things are better understandable and we do not have
to do whold show_attr operation under spinlock.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-12-26 19:15:16 +02:00
Artem Bityutskiy 35ad5fb76c UBI: fix and cleanup volume opening functions
This patch fixes error codes of the functions - if the device number
is out of range, -EINVAL should be returned. It also removes unneeded
try_module_get call from the open by name function.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-12-26 19:15:15 +02:00
Artem Bityutskiy 450f872a8e UBI: get device when opening volume
When a volume is opened, get its kref via get_device() call.
And put the reference when closing the volume. With this, we
may have a bit saner volume delete.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-12-26 19:15:15 +02:00
Artem Bityutskiy cae0a77125 UBI: tweak volumes locking
Transform vtbl_mutex to volumes_mutex - this just makes code
easier to understand.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-12-26 19:15:15 +02:00
Artem Bityutskiy 89b96b6929 UBI: improve internal interfaces
Pass volume description object to the EBA function which makes
more sense, and EBA function do not have to find the volume
description object by volume ID.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-12-26 19:15:15 +02:00
Artem Bityutskiy 49dfc29928 UBI: remove redundant field
Remove redundant ubi->major field - we have it in ubi->cdev.dev
already.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-12-26 19:15:14 +02:00
Artem Bityutskiy 393852ecfe UBI: add ubi_leb_map interface
The idea of this interface belongs to Adrian Hunter. The
interface is extremely useful when one has to have a guarantee
that an LEB will contain all 0xFFs even in case of an unclean
reboot. UBI does have an 'ubi_leb_erase()' call which may do
this, but it is stupid and ineffecient, because it flushes whole
queue. I should be re-worked to just be a pair of unmap,
map calls.

The user of the interfaci is UBIFS at the moment.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-12-26 19:15:14 +02:00
Jesper Juhl 0169b49d52 UBI: don't use array index before testing if it is negative
I can't find anything guaranteeing that 'ubi_num' cannot be <0 in
drivers/mtd/ubi/kapi.c::ubi_open_volume(), and in fact the code
even tests for that and errors out if so. Unfortunately the test
for "ubi_num < 0" happens after we've already used 'ubi_num' as
an array index - bad thing to do if it is negative.
This patch moves the test earlier in the function and then moves
the indexing using that variable after the check. A bit safer :-)

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-10-14 13:10:20 +03:00
Artem Bityutskiy 503990ebb2 UBI: remove unneeded error checks
Pointed to by viro.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-07-18 16:58:53 +03:00
Fernando Luis Vázquez Cao 2db61c95c0 UBI: cleanup usage of try_module_get
The use of try_module_get(THIS_MODULE) in ubi_get_device_info does not
offer real protection against unexpected driver unloads, since we could
be preempted before try_modules_get gets executed. It is the caller who
should manipulate the refcounts. Besides, ubi_get_device_info is an
exported symbol which guarantees protection when accessed through
symbol_get.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-07-18 16:58:45 +03:00
Artem Bityutskiy 4ab60a0d7c UBI: do not let to read too much
In case of static volumes it is prohibited to read more data
then available.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2007-07-18 16:52:42 +03:00
Artem B. Bityutskiy 801c135ce7 UBI: Unsorted Block Images
UBI (Latin: "where?") manages multiple logical volumes on a single
flash device, specifically supporting NAND flash devices. UBI provides
a flexible partitioning concept which still allows for wear-levelling
across the whole flash device.

In a sense, UBI may be compared to the Logical Volume Manager
(LVM). Whereas LVM maps logical sector numbers to physical HDD sector
numbers, UBI maps logical eraseblocks to physical eraseblocks.

More information may be found at
http://www.linux-mtd.infradead.org/doc/ubi.html

Partitioning/Re-partitioning

  An UBI volume occupies a certain number of erase blocks. This is
  limited by a configured maximum volume size, which could also be
  viewed as the partition size. Each individual UBI volume's size can
  be changed independently of the other UBI volumes, provided that the
  sum of all volume sizes doesn't exceed a certain limit.

  UBI supports dynamic volumes and static volumes. Static volumes are
  read-only and their contents are protected by CRC check sums.

Bad eraseblocks handling

  UBI transparently handles bad eraseblocks. When a physical
  eraseblock becomes bad, it is substituted by a good physical
  eraseblock, and the user does not even notice this.

Scrubbing

  On a NAND flash bit flips can occur on any write operation,
  sometimes also on read. If bit flips persist on the device, at first
  they can still be corrected by ECC, but once they accumulate,
  correction will become impossible. Thus it is best to actively scrub
  the affected eraseblock, by first copying it to a free eraseblock
  and then erasing the original. The UBI layer performs this type of
  scrubbing under the covers, transparently to the UBI volume users.

Erase Counts

  UBI maintains an erase count header per eraseblock. This frees
  higher-level layers (like file systems) from doing this and allows
  for centralized erase count management instead. The erase counts are
  used by the wear-levelling algorithm in the UBI layer. The algorithm
  itself is exchangeable.

Booting from NAND

  For booting directly from NAND flash the hardware must at least be
  capable of fetching and executing a small portion of the NAND
  flash. Some NAND flash controllers have this kind of support. They
  usually limit the window to a few kilobytes in erase block 0. This
  "initial program loader" (IPL) must then contain sufficient logic to
  load and execute the next boot phase.

  Due to bad eraseblocks, which may be randomly scattered over the
  flash device, it is problematic to store the "secondary program
  loader" (SPL) statically. Also, due to bit-flips it may become
  corrupted over time. UBI allows to solve this problem gracefully by
  storing the SPL in a small static UBI volume.

UBI volumes vs. static partitions

  UBI volumes are still very similar to static MTD partitions:

    * both consist of eraseblocks (logical eraseblocks in case of UBI
      volumes, and physical eraseblocks in case of static partitions;
    * both support three basic operations - read, write, erase.

  But UBI volumes have the following advantages over traditional
  static MTD partitions:

    * there are no eraseblock wear-leveling constraints in case of UBI
      volumes, so the user should not care about this;
    * there are no bit-flips and bad eraseblocks in case of UBI volumes.

  So, UBI volumes may be considered as flash devices with relaxed
  restrictions.

Where can it be found?

  Documentation, kernel code and applications can be found in the MTD
  gits.

What are the applications for?

  The applications help to create binary flash images for two purposes: pfi
  files (partial flash images) for in-system update of UBI volumes, and plain
  binary images, with or without OOB data in case of NAND, for a manufacturing
  step. Furthermore some tools are/and will be created that allow flash content
  analysis after a system has crashed..

Who did UBI?

  The original ideas, where UBI is based on, were developed by Andreas
  Arnez, Frank Haverkamp and Thomas Gleixner. Josh W. Boyer and some others
  were involved too. The implementation of the kernel layer was done by Artem
  B. Bityutskiy. The user-space applications and tools were written by Oliver
  Lohmann with contributions from Frank Haverkamp, Andreas Arnez, and Artem.
  Joern Engel contributed a patch which modifies JFFS2 so that it can be run on
  a UBI volume. Thomas Gleixner did modifications to the NAND layer. Alexander
  Schmidt made some testing work as well as core functionality improvements.

Signed-off-by: Artem B. Bityutskiy <dedekind@linutronix.de>
Signed-off-by: Frank Haverkamp <haver@vnet.ibm.com>
2007-04-27 14:23:33 +03:00