Commit graph

1957 commits

Author SHA1 Message Date
Linus Torvalds c5850150d0 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: stop using the page cache to back the buffer cache
  xfs: register the inode cache shrinker before quotachecks
  xfs: xfs_trans_read_buf() should return an error on failure
  xfs: introduce inode cluster buffer trylocks for xfs_iflush
  vmap: flush vmap aliases when mapping fails
  xfs: preallocation transactions do not need to be synchronous

Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_buf.c due to plug removal.
2011-03-28 15:51:02 -07:00
Dave Chinner 0e6e847ffe xfs: stop using the page cache to back the buffer cache
Now that the buffer cache has it's own LRU, we do not need to use
the page cache to provide persistent caching and reclaim
infrastructure. Convert the buffer cache to use alloc_pages()
instead of the page cache. This will remove all the overhead of page
cache management from setup and teardown of the buffers, as well as
needing to mark pages accessed as we find buffers in the buffer
cache.

By avoiding the page cache, we also remove the need to keep state in
the page_private(page) field for persistant storage across buffer
free/buffer rebuild and so all that code can be removed. This also
fixes the long-standing problem of not having enough bits in the
page_private field to track all the state needed for a 512
sector/64k page setup.

It also removes the need for page locking during reads as the pages
are unique to the buffer and nobody else will be attempting to
access them.

Finally, it removes the buftarg address space lock as a point of
global contention on workloads that allocate and free buffers
quickly such as when creating or removing large numbers of inodes in
parallel. This remove the 16TB limit on filesystem size on 32 bit
machines as the page index (32 bit) is no longer used for lookups
of metadata buffers - the buffer cache is now solely indexed by disk
address which is stored in a 64 bit field in the buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26 09:16:45 +11:00
Dave Chinner 704b2907c2 xfs: register the inode cache shrinker before quotachecks
During mount, we can do a quotacheck that involves a bulkstat pass
on all inodes. If there are more inodes in the filesystem than can
be held in memory, we require the inode cache shrinker to run to
ensure that we don't run out of memory.

Unfortunately, the inode cache shrinker is not registered until we
get to the end of the superblock setup process, which is after a
quotacheck is run if it is needed. Hence we need to register the
inode cache shrinker earlier in the mount process so that we don't
OOM during mount. This requires that we also initialise the syncd
work before we register the shrinker, so we nee dto juggle that
around as well.

While there, make sure that we have set up the block sizes in the
VFS superblock correctly before the quotacheck is run so that any
inodes that are cached as a result of the quotacheck have their
block size fields set up correctly.

Cc: stable@kernel.org
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26 09:14:57 +11:00
Dave Chinner 7401aafd50 xfs: xfs_trans_read_buf() should return an error on failure
When inside a transaction and we fail to read a buffer,
xfs_trans_read_buf returns a null buffer pointer and no error.
xfs_do_da_buf() checks the error return, but not the buffer, and as
a result this read failure condition causes a panic when it attempts
to dereference the non-existant buffer.

Make xfs_trans_read_buf() return the same error for this situation
regardless of whether it is in a transaction or not. This means
every caller does not need to check both the error return and the
buffer before proceeding to use the buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26 09:14:44 +11:00
Dave Chinner 1bfd8d0419 xfs: introduce inode cluster buffer trylocks for xfs_iflush
There is an ABBA deadlock between synchronous inode flushing in
xfs_reclaim_inode and xfs_icluster_free. xfs_icluster_free locks the
buffer, then takes inode ilocks, whilst synchronous reclaim takes
the ilock followed by the buffer lock in xfs_iflush().

To avoid this deadlock, separate the inode cluster buffer locking
semantics from the synchronous inode flush semantics, allowing
callers to attempt to lock the buffer but still issue synchronous IO
if it can get the buffer. This requires xfs_iflush() calls that
currently use non-blocking semantics to pass SYNC_TRYLOCK rather
than 0 as the flags parameter.

This allows xfs_reclaim_inode to avoid the deadlock on the buffer
lock and detect the failure so that it can drop the inode ilock and
restart the reclaim attempt on the inode. This allows
xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
release it and hence defuse the deadlock situation. It also has the
pleasant side effect of avoiding IO in xfs_reclaim_inode when it
tries to next reclaim the inode as it is now marked stale.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26 09:13:55 +11:00
Dave Chinner a19fb38096 vmap: flush vmap aliases when mapping fails
On 32 bit systems, vmalloc space is limited and XFS can chew through
it quickly as the vmalloc space is lazily freed. This can result in
failure to map buffers, even when there is apparently large amounts
of vmalloc space available. Hence, if we fail to map a buffer, purge
the aliases that have not yet been freed to hopefuly free up enough
vmalloc space to allow a retry to succeed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26 09:13:42 +11:00
Dave Chinner 8287889742 xfs: preallocation transactions do not need to be synchronous
Preallocation and hole punch transactions are currently synchronous
and this is causing performance problems in some cases. The
transactions don't need to be synchronous as we don't need to
guarantee the preallocation is persistent on disk until a
fdatasync, fsync, sync operation occurs. If the file is opened
O_SYNC or O_DATASYNC, only then should the transaction be issued
synchronously.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-26 09:13:08 +11:00
Linus Torvalds 6c51038900 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
  Documentation/iostats.txt: bit-size reference etc.
  cfq-iosched: removing unnecessary think time checking
  cfq-iosched: Don't clear queue stats when preempt.
  blk-throttle: Reset group slice when limits are changed
  blk-cgroup: Only give unaccounted_time under debug
  cfq-iosched: Don't set active queue in preempt
  block: fix non-atomic access to genhd inflight structures
  block: attempt to merge with existing requests on plug flush
  block: NULL dereference on error path in __blkdev_get()
  cfq-iosched: Don't update group weights when on service tree
  fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
  block: Require subsystems to explicitly allocate bio_set integrity mempool
  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  fs: make fsync_buffers_list() plug
  mm: make generic_writepages() use plugging
  blk-cgroup: Add unaccounted time to timeslice_used.
  block: fixup plugging stubs for !CONFIG_BLOCK
  block: remove obsolete comments for blkdev_issue_zeroout.
  blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
  ...

Fix up conflicts in fs/{aio.c,super.c}
2011-03-24 10:16:26 -07:00
Linus Torvalds 3155fe6df5 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (23 commits)
  xfs: don't name variables "panic"
  xfs: factor agf counter updates into a helper
  xfs: clean up the xfs_alloc_compute_aligned calling convention
  xfs: kill support/debug.[ch]
  xfs: Convert remaining cmn_err() callers to new API
  xfs: convert the quota debug prints to new API
  xfs: rename xfs_cmn_err_fsblock_zero()
  xfs: convert xfs_fs_cmn_err to new error logging API
  xfs: kill xfs_fs_mount_cmn_err() macro
  xfs: kill xfs_fs_repair_cmn_err() macro
  xfs: convert xfs_cmn_err to xfs_alert_tag
  xfs: Convert xlog_warn to new logging interface
  xfs: Convert linux-2.6/ files to new logging interface
  xfs: introduce new logging API.
  xfs: zero proper structure size for geometry calls
  xfs: enable delaylog by default
  xfs: more sensible inode refcounting for ialloc
  xfs: stop using xfs_trans_iget in the RT allocator
  xfs: check if device support discard in xfs_ioc_trim()
  xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1
  ...
2011-03-21 14:24:56 -07:00
Linus Torvalds a44f99c7ef Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (25 commits)
  video: change to new flag variable
  scsi: change to new flag variable
  rtc: change to new flag variable
  rapidio: change to new flag variable
  pps: change to new flag variable
  net: change to new flag variable
  misc: change to new flag variable
  message: change to new flag variable
  memstick: change to new flag variable
  isdn: change to new flag variable
  ieee802154: change to new flag variable
  ide: change to new flag variable
  hwmon: change to new flag variable
  dma: change to new flag variable
  char: change to new flag variable
  fs: change to new flag variable
  xtensa: change to new flag variable
  um: change to new flag variables
  s390: change to new flag variable
  mips: change to new flag variable
  ...

Fix up trivial conflict in drivers/hwmon/Makefile
2011-03-20 18:14:55 -07:00
matt mooney 0ccd234ca0 fs: change to new flag variable
Replace EXTRA_CFLAGS with ccflags-y. And change ntfs-objs to ntfs-y
for cleaner conditional inclusion.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-03-17 14:02:57 +01:00
Linus Torvalds 0f6e0e8448 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
  AppArmor: kill unused macros in lsm.c
  AppArmor: cleanup generated files correctly
  KEYS: Add an iovec version of KEYCTL_INSTANTIATE
  KEYS: Add a new keyctl op to reject a key with a specified error code
  KEYS: Add a key type op to permit the key description to be vetted
  KEYS: Add an RCU payload dereference macro
  AppArmor: Cleanup make file to remove cruft and make it easier to read
  SELinux: implement the new sb_remount LSM hook
  LSM: Pass -o remount options to the LSM
  SELinux: Compute SID for the newly created socket
  SELinux: Socket retains creator role and MLS attribute
  SELinux: Auto-generate security_is_socket_class
  TOMOYO: Fix memory leak upon file open.
  Revert "selinux: simplify ioctl checking"
  selinux: drop unused packet flow permissions
  selinux: Fix packet forwarding checks on postrouting
  selinux: Fix wrong checks for selinux_policycap_netpeer
  selinux: Fix check for xfrm selinux context algorithm
  ima: remove unnecessary call to ima_must_measure
  IMA: remove IMA imbalance checking
  ...
2011-03-16 09:15:43 -07:00
Linus Torvalds bd2895eead Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix build failure introduced by s/freezeable/freezable/
  workqueue: add system_freezeable_wq
  rds/ib: use system_wq instead of rds_ib_fmr_wq
  net/9p: replace p9_poll_task with a work
  net/9p: use system_wq instead of p9_mux_wq
  xfs: convert to alloc_workqueue()
  reiserfs: make commit_wq use the default concurrency level
  ocfs2: use system_wq instead of ocfs2_quota_wq
  ext4: convert to alloc_workqueue()
  scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path
  scsi/be2iscsi,qla2xxx: convert to alloc_workqueue()
  misc/iwmc3200top: use system_wq instead of dedicated workqueues
  i2o: use alloc_workqueue() instead of create_workqueue()
  acpi: kacpi*_wq don't need WQ_MEM_RECLAIM
  fs/aio: aio_wq isn't used in memory reclaim path
  input/tps6507x-ts: use system_wq instead of dedicated workqueue
  cpufreq: use system_wq instead of dedicated workqueues
  wireless/ipw2x00: use system_wq instead of dedicated workqueues
  arm/omap: use system_wq in mailbox
  workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
2011-03-16 08:20:19 -07:00
Aneesh Kumar K.V 5fe0c23788 exportfs: Return the minimum required handle size
The exportfs encode handle function should return the minimum required
handle size. This helps user to find out the handle size by passing 0
handle size in the first step and then redoing to the call again with
the returned handle size value.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:28 -04:00
Alex Elder 0c9ba97318 xfs: don't name variables "panic"
The new xfs_alert_tag() used a variable named "panic",
and that is to be avoided.  Rename it.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-03-11 16:34:51 -06:00
Jens Axboe 4c63f5646e Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core
Conflicts:
	block/blk-core.c
	block/blk-flush.c
	drivers/md/raid1.c
	drivers/md/raid10.c
	drivers/md/raid5.c
	fs/nilfs2/btnode.c
	fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:58:35 +01:00
Jens Axboe 721a9602e6 block: kill off REQ_UNPLUG
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:27 +01:00
Jens Axboe 7eaceaccab block: remove per-queue plugging
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:07 +01:00
Christoph Hellwig ecb6928fcf xfs: factor agf counter updates into a helper
Updating the AGF and transactions counters is duplicated between allocating
and freeing extents.  Factor the code into a common helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-03-09 08:23:47 -06:00
Christoph Hellwig 86fa8af69d xfs: clean up the xfs_alloc_compute_aligned calling convention
Pass a xfs_alloc_arg structure to xfs_alloc_compute_aligned and derive
the alignment and minlen paramters from it.  This cleans up the existing
callers, and we'll need even more information from the xfs_alloc_arg
in subsequent patches.  Based on a patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-03-09 08:23:33 -06:00
James Morris fe3fa43039 Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next 2011-03-08 11:38:10 +11:00
Dave Chinner 9130090b5f xfs: kill support/debug.[ch]
The remaining functionality in debug.[ch] is effectively just assert
handling, conditional debug definitions and hex dumping. The hex
dumping and assert function can be moved into the new printk module,
while the rest can be moved into top-level header files. This allows
fs/xfs/support/debug.[ch] to be completely removed from the
codebase.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:09:35 +11:00
Dave Chinner 0b932cccbd xfs: Convert remaining cmn_err() callers to new API
Once converted, kill the remainder of the cmn_err() interface.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:08:35 +11:00
Dave Chinner 8221112b43 xfs: convert the quota debug prints to new API
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:07:35 +11:00
Dave Chinner 6d4a8ecb34 xfs: rename xfs_cmn_err_fsblock_zero()
The "cmn_err" part of the function name is no longer relevant. Rename
the function to xfs_alert_fsblock_zero() to match the new logging
API.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:06:35 +11:00
Dave Chinner 5348778699 xfs: convert xfs_fs_cmn_err to new error logging API
Continue to clean up the error logging code by converting all the
callers of xfs_fs_cmn_err() to the new API. Once done, remove the
unused old API function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:05:35 +11:00
Dave Chinner af34e09da4 xfs: kill xfs_fs_mount_cmn_err() macro
The xfs_fs_mount_cmn_err() hides a simple check as to whether the
mount path should output an error or not. Remove the macro and open
code the check.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:04:35 +11:00
Dave Chinner 65333b4c3d xfs: kill xfs_fs_repair_cmn_err() macro
In certain cases of inode corruption, the xfs_fs_repair_cmn_err()
macro is used to output an extra message in the corruption report.
That extra message is "unmount and run xfs_repair", which really
applies to any corruption report. Each case that this macro is
called (except one) a following call to xfs_corruption_error() is
made to optionally dump more information about the error.

Hence, move the output of "run xfs_repair" to xfs_corruption_error()
so that it is output on all corruption reports.  Also, convert the
callers of the repair macro that don't call xfs_corruption_error()
to call it, hence provide consiѕtent error reporting for all cases
where xfs_fs_repair_cmn_err() used to be called.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:03:35 +11:00
Dave Chinner 6a19d9393a xfs: convert xfs_cmn_err to xfs_alert_tag
Continue the conversion of the old cmn_err interface be converting
all the conditional panic tag errors to xfs_alert_tag() and then
removing xfs_cmn_err().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:02:35 +11:00
Dave Chinner a0fa2b679e xfs: Convert xlog_warn to new logging interface
Convert the xfs log operations to use the new error logging
interfaces. This removes the xlog_{warn,panic} wrappers and makes
almost all errors emit the device they belong to instead of just
refering to "XFS".

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:01:35 +11:00
Dave Chinner 4f10700a2e xfs: Convert linux-2.6/ files to new logging interface
Convert the files in fs/xfs/linux-2.6/ to use the new xfs_<level>
logging format that replaces the old Irix inherited cmn_err()
interfaces. While there, also convert naked printk calls to use the
relevant xfs logging function to standardise output format.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:00:35 +11:00
Alex Elder af24ee9ea8 xfs: zero proper structure size for geometry calls
Commit 493f3358cb added this call to
xfs_fs_geometry() in order to avoid passing kernel stack data back
to user space:

+       memset(geo, 0, sizeof(*geo));

Unfortunately, one of the callers of that function passes the
address of a smaller data type, cast to fit the type that
xfs_fs_geometry() requires.  As a result, this can happen:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: f87aca93

Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1
Call Trace:

[<c12991ac>] ? panic+0x50/0x150
[<c102ed71>] ? __stack_chk_fail+0x10/0x18
[<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]

Fix this by fixing that one caller to pass the right type and then
copy out the subset it is interested in.

Note: This patch is an alternative to one originally proposed by
Eric Sandeen.

Reported-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
2011-03-01 21:21:13 -06:00
Dave Chinner 10e38391c0 xfs: introduce new logging API.
Most of the logging infrastructure in XFS is unneccessary and
designed around the infrastructure supplied by Irix rather than
Linux. To rationalise the logging interfaces, start by introducing
simple printk wrappers similar to the dev_printk() infrastructure.
Later patches will convert code to use this new interface.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-02 14:20:59 +11:00
Alex Elder eeb2036b8a xfs: zero proper structure size for geometry calls
Commit 493f3358cb added this call to
xfs_fs_geometry() in order to avoid passing kernel stack data back
to user space:

+       memset(geo, 0, sizeof(*geo));

Unfortunately, one of the callers of that function passes the
address of a smaller data type, cast to fit the type that
xfs_fs_geometry() requires.  As a result, this can happen:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: f87aca93

Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1
Call Trace:

[<c12991ac>] ? panic+0x50/0x150
[<c102ed71>] ? __stack_chk_fail+0x10/0x18
[<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]

Fix this by fixing that one caller to pass the right type and then
copy out the subset it is interested in.

Note: This patch is an alternative to one originally proposed by
Eric Sandeen.

Reported-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
2011-03-01 21:19:59 -06:00
Christoph Hellwig 20ad9ea9be xfs: enable delaylog by default
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-22 20:33:25 -06:00
Christoph Hellwig ec3ba85f40 xfs: more sensible inode refcounting for ialloc
Currently we return iodes from xfs_ialloc with just a single reference held.
But we need two references, as one is dropped during transaction commit and
the second needs to be transfered to the VFS.  Change xfs_ialloc to use
xfs_iget plus xfs_trans_ijoin_ref to grab two references to the inode,
and remove the now superflous IHOLD calls from all callers.  This also
greatly simplifies the error handling in xfs_create and also allow to remove
xfs_trans_iget as no other callers are left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-22 20:32:28 -06:00
Christoph Hellwig 1050c71e29 xfs: stop using xfs_trans_iget in the RT allocator
During mount we establish references to the RT inodes, which we keep for
the lifetime of the filesystem.  Instead of using xfs_trans_iget to grab
additional references when adding RT inodes to transactions use the
combination of xfs_ilock and xfs_trans_ijoin_ref, which archives the same
end result with less overhead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-22 20:30:21 -06:00
Lukas Czerner be715140b5 xfs: check if device support discard in xfs_ioc_trim()
Right now we, are relying on the fact that when we attempt to
actually do the discard, blkdev_issue_discar() returns -EOPNOTSUPP
and the user is informed that the device does not support discard.

However, in the case where the we do not hit any suitable free
extent to trim in FITRIM code, it will finish without any error.
This is very confusing, because it seems that FITRIM was successful
even though the device does not actually supports discard.

Solution: Check for the discard support before attempt to search for
free extents.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-22 15:08:44 -06:00
Dan Rosenberg 3a3675b7f2 xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1
The FSGEOMETRY_V1 ioctl (and its compat equivalent) calls out to
xfs_fs_geometry() with a version number of 3.  This code path does not
fill in the logsunit member of the passed xfs_fsop_geom_t, leading to
the leaking of four bytes of uninitialized stack data to potentially
unprivileged callers.

v2 switches to memset() to avoid future issues if structure members
change, on suggestion of Dave Chinner.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Reviewed-by: Eugene Teo <eugeneteo@kernel.org>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-22 15:06:47 -06:00
Lukas Czerner 5d15765594 xfs: check if device support discard in xfs_ioc_trim()
Right now we, are relying on the fact that when we attempt to
actually do the discard, blkdev_issue_discar() returns -EOPNOTSUPP
and the user is informed that the device does not support discard.

However, in the case where the we do not hit any suitable free
extent to trim in FITRIM code, it will finish without any error.
This is very confusing, because it seems that FITRIM was successful
even though the device does not actually supports discard.

Solution: Check for the discard support before attempt to search for
free extents.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-21 20:39:00 -06:00
Dan Rosenberg c4d0c3b097 xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1
The FSGEOMETRY_V1 ioctl (and its compat equivalent) calls out to
xfs_fs_geometry() with a version number of 3.  This code path does not
fill in the logsunit member of the passed xfs_fsop_geom_t, leading to
the leaking of four bytes of uninitialized stack data to potentially
unprivileged callers.

v2 switches to memset() to avoid future issues if structure members
change, on suggestion of Dave Chinner.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Reviewed-by: Eugene Teo <eugeneteo@kernel.org>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-21 19:55:47 -06:00
Tejun Heo 43d133c18b Merge branch 'master' into for-2.6.39 2011-02-21 09:43:56 +01:00
Christoph Hellwig 9681153b46 xfs: add lockdep annotations for the rt inodes
The rt bitmap and summary inodes do not participate in the normal inode
locking protocol.  Instead the rt bitmap inode can be locked in any
transaction involving rt allocations, and the both of the rt inodes can
be locked at the same time.  Add specific lockdep subclasses for the rt
inodes to prevent lockdep from blowing up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-07 13:29:18 -06:00
Christoph Hellwig 0d8b30ad19 xfs: fix xfs_get_extsz_hint for a zero extent size hint
We can easily set the extsize flag without setting an extent size
hint, or one that evaluates to zero.  Historically the di_extsize
field was only used when it was non-zero, but the commit

	"Cleanup inode extent size hint extraction"

broke this.  Restore the old behaviour, thus fixing xfsqa 090 with
a debug kernel.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-07 13:29:14 -06:00
Christoph Hellwig 04e99455ea xfs: only lock the rt bitmap inode once per allocation
Currently both xfs_rtpick_extent and xfs_rtallocate_extent call
xfs_trans_iget to grab and lock the rt bitmap inode, which results in a
deadlock since the removal of the lock recursion counters in commit

	"xfs: simplify inode to transaction joining"

Fix this by acquiring and locking the inode in xfs_bmap_rtalloc before
calling into xfs_rtpick_extent and xfs_rtallocate_extent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-07 13:29:06 -06:00
Eric Paris 2a7dba391e fs/vfs/security: pass last path component to LSM on inode creation
SELinux would like to implement a new labeling behavior of newly created
inodes.  We currently label new inodes based on the parent and the creating
process.  This new behavior would also take into account the name of the
new object when deciding the new label.  This is not the (supposed) full path,
just the last component of the path.

This is very useful because creating /etc/shadow is different than creating
/etc/passwd but the kernel hooks are unable to differentiate these
operations.  We currently require that userspace realize it is doing some
difficult operation like that and than userspace jumps through SELinux hoops
to get things set up correctly.  This patch does not implement new
behavior, that is obviously contained in a seperate SELinux patch, but it
does pass the needed name down to the correct LSM hook.  If no such name
exists it is fine to pass NULL.

Signed-off-by: Eric Paris <eparis@redhat.com>
2011-02-01 11:12:29 -05:00
Tejun Heo 83e759043a xfs: convert to alloc_workqueue()
Convert from create[_singlethread]_workqueue() to alloc_workqueue().

* xfsdatad_workqueue and xfsconvertd_workqueue are identity converted.
  Using higher concurrency limit might be useful but given the
  complexity of workqueue usage in xfs, proceeding cautiously seems
  better.

* xfs_mru_reap_wq is converted to non-ordered workqueue with max
  concurrency of 1 as the work items don't require any specific
  ordering and already have proper synchronization.  It seems it was
  singlethreaded to save worker threads, which is no longer a concern.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alex Elder <aelder@sgi.com>
Cc: xfs-masters@oss.sgi.com
Cc: Christoph Hellwig <hch@infradead.org>
2011-02-01 11:42:43 +01:00
bpm@sgi.com 24446fc66f xfs: xfs_bmap_add_extent_delay_real should init br_startblock
When filling in the middle of a previous delayed allocation in
xfs_bmap_add_extent_delay_real, set br_startblock of the new delay
extent to the right to nullstartblock instead of 0 before inserting
the extent into the ifork (xfs_iext_insert), rather than setting
br_startblock afterward.

Adding the extent into the ifork with br_startblock=0 can lead to
the extent being copied into the btree by xfs_bmap_extent_to_btree
if we happen to convert from extents format to btree format before
updating br_startblock with the correct value.  The unexpected
addition of this delay extent to the btree can cause subsequent
XFS_WANT_CORRUPTED_GOTO filesystem shutdown in several
xfs_bmap_add_extent_delay_real cases where we are converting a delay
extent to real and unexpectedly find an extent already inserted.
For example:

911         case BMAP_LEFT_FILLING:
912                 /*
913                  * Filling in the first part of a previous delayed allocation.
914                  * The left neighbor is not contiguous.
915                  */
916                 trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_);
917                 xfs_bmbt_set_startoff(ep, new_endoff);
918                 temp = PREV.br_blockcount - new->br_blockcount;
919                 xfs_bmbt_set_blockcount(ep, temp);
920                 xfs_iext_insert(ip, idx, 1, new, state);
921                 ip->i_df.if_lastex = idx;
922                 ip->i_d.di_nextents++;
923                 if (cur == NULL)
924                         rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
925                 else {
926                         rval = XFS_ILOG_CORE;
927                         if ((error = xfs_bmbt_lookup_eq(cur, new->br_startoff,
928                                         new->br_startblock, new->br_blockcount,
929                                         &i)))
930                                 goto done;
931                         XFS_WANT_CORRUPTED_GOTO(i == 0, done);

With the bogus extent in the btree we shutdown the filesystem at
931.  The conversion from extents to btree format happens when the
number of extents in the inode increases above ip->i_df.if_ext_max.
xfs_bmap_extent_to_btree copies extents from the ifork into the
btree, ignoring all delalloc extents which are denoted by
br_startblock having some value of nullstartblock.

SGI-PV: 1013221

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-01-28 09:13:29 -06:00
Dave Chinner 0fbca4d1c3 xfs: fix dquot shaker deadlock
Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
to unlock the dquot freelist when the number of loop restarts is
exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
reclaim.

Rework the loop control logic into an unwind stack that all the
different cases jump into. This means there is only one set of code
that processes the loop exit criteria, and simplifies the unlocking
of all the items from different points in the loop. It also fixes a
double increment of the restart counter from the qi_dqlist_lock
case.

Reported-by: Malcolm Scott <lkml@malc.org.uk>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-01-28 09:05:36 -06:00
Dave Chinner c6f990d1ff xfs: handle CIl transaction commit failures correctly
Failure to commit a transaction into the CIL is not handled
correctly. This currently can only happen when racing with a
shutdown and requires an explicit shutdown check, so it rare and can
be avoided. Remove the shutdown check and make the CIL commit a void
function to indicate it will always succeed, thereby removing the
incorrectly handled failure case.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-01-28 09:05:36 -06:00