Commit graph

601776 commits

Author SHA1 Message Date
Christoph Hellwig 3176c3e0ef xfs: kill ioflags
Now that we have the direct I/O kiocb flag there is no real need to sample
the value inside of XFS, and the invis flag was always just partially used
and isn't worth keeping this infrastructure around for.   This also splits
the read tracepoint into buffered vs direct as we've done for writes a long
time ago.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 11:31:42 +10:00
Christoph Hellwig 8f3e2058e1 xfs: don't pass ioflags around in the ioctl path
Instead check the file pointer for the invisble I/O flag directly, and
use the chance to drop redundant arguments from the xfs_ioc_space
prototype.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 11:29:35 +10:00
Brian Foster 9c7504aa72 xfs: track and serialize in-flight async buffers against unmount
Newly allocated XFS metadata buffers are added to the LRU once the hold
count is released, which typically occurs after I/O completion. There is
no other mechanism at current that tracks the existence or I/O state of
a new buffer. Further, readahead I/O tends to be submitted
asynchronously by nature, which means the I/O can remain in flight and
actually complete long after the calling context is gone. This means
that file descriptors or any other holds on the filesystem can be
released, allowing the filesystem to be unmounted while I/O is still in
flight. When I/O completion occurs, core data structures may have been
freed, causing completion to run into invalid memory accesses and likely
to panic.

This problem is reproduced on XFS via directory readahead. A filesystem
is mounted, a directory is opened/closed and the filesystem immediately
unmounted. The open/close cycle triggers a directory readahead that if
delayed long enough, runs buffer I/O completion after the unmount has
completed.

To address this problem, add a mechanism to track all in-flight,
asynchronous buffers using per-cpu counters in the buftarg. The buffer
is accounted on the first I/O submission after the current reference is
acquired and unaccounted once the buffer is returned to the LRU or
freed. Update xfs_wait_buftarg() to wait on all in-flight I/O before
walking the LRU list. Once in-flight I/O has completed and the workqueue
has drained, all new buffers should have been released onto the LRU.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 11:15:28 +10:00
Brian Foster c891c30a4d xfs: exclude never-released buffers from buftarg I/O accounting
The upcoming buftarg I/O accounting mechanism maintains a count of
all buffers that have undergone I/O in the current hold-release
cycle.  Certain buffers associated with core infrastructure (e.g.,
the xfs_mount superblock buffer, log buffers) are never released,
however. This means that accounting I/O submission on such buffers
elevates the buftarg count indefinitely and could lead to lockup on
unmount.

Define a new buffer flag to explicitly exclude buffers from buftarg
I/O accounting. Set the flag on the superblock and associated log
buffers.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 11:13:43 +10:00
Eric Sandeen 5539d36752 xfs: don't reset b_retries to 0 on every failure
With the code as it stands today, b_retries never increments because
it gets reset to 0 in the error callback.

Remove that, and fix a similar problem where the first retry time
was constantly being overwritten, which defeated the timeout tunable
as well.  We now only set first retry time if a non-zero timeout is
set, to match the behavior of only incrementing retries if a retry
value is set.

This way max retries & timeouts consistently take effect after a
tunable is set, rather than acting retroactively on a buffer which
has failed at some point in the past and has accumulated state from
those prior failures.

Thanks to dchinner for talking through this with me.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 10:54:09 +10:00
Eric Sandeen 0b4db5dff3 xfs: remove extraneous buffer flag changes
Fix up a couple places where extra flag manipulation occurs.

In the first case we clear XBF_ASYNC and then immediately reset it -
so don't bother clearing in the first place.

In the 2nd case we are at a point in the function where the buffer
must already be async, so there is no need to reset it.

Add consistent spacing around the " | " while we're at it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 10:53:22 +10:00
Eric Sandeen e97f6c545f xfs: fix xfs_error_get_cfg for negative errnos
xfs_error_get_cfg() is called with bp->b_error as an arg, which is
negative, so the switch statement won't ever find any matches.

This results in only the default error handler having any effect, as
EIO/ENOSPC/ENODEV get ignored due to the wrong sign.

It seems simplest to always flip the error sign to positive, so that
we can handle either negative errors in bp->b_error, or possibly a
positive errno via something like xfs_error_get_cfg(EIO) - this
future-proofs the function.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 10:48:51 +10:00
Hou Tao ad70328a50 xfs: remove the magic numbers in xfs_btree_block-related len macros
replace the magic numbers by offsetof(...) and sizeof(...), and add two
extra checks on xfs_check_ondisk_structs()

[dchinner: renamed header structures to be more descriptive]

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 10:43:11 +10:00
Kaho Ng fbfb24bf10 xfs: indentation fix in xfs_btree_get_iroot()
The indentation in this function is different from the other functions.
Those spacebars are converted to tabs to improve readability.

Signed-off-by: Kaho Ng <ngkaho1234@gmail.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 10:37:50 +10:00
Dan Carpenter fbc21f33cd xfs: don't allow negative error tags
Errors go from zero which means no error to XFS_ERRTAG_MAX (22).  My
static checker complains that xfs_errortag_add() puts an upper bound on
this but not a lower bound.  Let's fix it by making it unsigned.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 10:37:13 +10:00
Jann Horn 7f1b62457b xfs: fix type confusion in xfs_ioc_swapext
When calling fdget() in xfs_ioc_swapext(), we need to verify that
the file descriptors passed into the ioctl point to XFS inodes
before we start operations on them. If we don't do this, we could be
referencing arbitrary kernel memory as an XFS inode. THis could lead
to memory corruption and/or performing locking operations on
attacker-chosen structures in kernel memory.

[dchinner: rewrite commit message ]
[dchinner: add comment explaining new check ]

Signed-off-by: Jann Horn <jann@thejh.net>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 10:30:30 +10:00
Dave Chinner f477cedc4e Merge branch 'xfs-4.8-misc-fixes-2' into for-next 2016-06-21 11:55:13 +10:00
Darrick J. Wong 19b54ee66c xfs: refactor btree maxlevels computation
Create a common function to calculate the maximum height of a per-AG
btree.  This will eventually be used by the rmapbt and refcountbt
code to calculate appropriate maxlevels values for each.  This is
important because the verifiers and the transaction block
reservations depend on accurate estimates of how many blocks are
needed to satisfy a btree split.

We were mistakenly using the max bnobt height for all the btrees,
which creates a dangerous situation since the larger records and
keys in an rmapbt make it very possible that the rmapbt will be
taller than the bnobt and so we can run out of transaction block
reservation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 11:53:28 +10:00
Darrick J. Wong e66a4c678e xfs: convert list of extents to free into a regular list
In struct xfs_bmap_free, convert the open-coded free extent list to
a regular list, then use list_sort to sort it prior to processing.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 11:53:28 +10:00
Dave Chinner 4d89e20bf1 xfs: separate freelist fixing into a separate helper
Break up xfs_free_extent() into a helper that fixes the freelist.
This helper will be used subsequently to ensure the freelist during
deferred rmap processing.

[darrick: refactor to put this at the head of the patchset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 11:53:28 +10:00
Darrick J. Wong 59bad075bd xfs: rearrange xfs_bmap_add_free parameters
This is already in xfsprogs' libxfs, so port it to the kernel.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 11:53:28 +10:00
Darrick J. Wong 128f24d5d9 xfs: check for a valid error_tag in errortag_add
Currently we don't check the error_tag when someone's trying to set up
error injection testing.  If userspace passes in a value we don't know
about, send back an error.  This will help xfstests to _notrun a test
that uses error injection to test things like log replay.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 11:53:28 +10:00
Darrick J. Wong 479c641273 xfs: enable buffer deadlock postmortem diagnosis via ftrace
Create a second buf_trylock tracepoint so that we can distinguish
between a successful and a failed trylock.  With this piece, we can
use a script to look at the ftrace output to detect buffer deadlocks.

[dchinner: update to if/else as per hch's suggestion]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 11:53:28 +10:00
Darrick J. Wong 3f94c441e2 xfs: check offsets of variable length structures
Some of the directory/attr structures contain variable-length objects,
so the enclosing structure doesn't have a meaningful fixed size at
compile time.  We can check the offsets of the members before the
variable-length member, so do those.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 11:53:28 +10:00
Brian Foster 408fd48461 xfs: refactor xfs_reserve_blocks() to handle ENOSPC correctly
xfs_reserve_blocks() is responsible to update the XFS reserved block
pool count at mount time or based on user request. When the caller
requests to increase the reserve pool, blocks must be allocated from
the global counters such that they are no longer available for
general purpose use. If the requested reserve pool size is too
large, XFS reserves what blocks are available. The implementation
requires looking at the percpu counters and making an educated guess
as to how many blocks to try and allocate from xfs_mod_fdblocks(),
which can return -ENOSPC if the guess was not accurate due to
counters being modified in parallel.

xfs_reserve_blocks() retries the guess in this scenario until the
allocation succeeds or it is determined that there is no space
available in the fs. While not easily reproducible in the current
form, the retry code doesn't actually work correctly if
xfs_mod_fdblocks() actually fails. The problem is that the percpu
calculations use the m_resblks counter to determine how many blocks
to allocate, but unconditionally update m_resblks before the block
allocation has actually succeeded.  Therefore, if xfs_mod_fdblocks()
fails, the code jumps to the retry label and uses the already
updated m_resblks value to determine how many blocks to try and
allocate. If the percpu counters previously suggested that the
entire request was available, fdblocks_delta could end up set to 0.
In that case, m_resblks is updated to the requested value, yet no
blocks have been reserved at all.

Refactor xfs_reserve_blocks() to use an explicit loop and make the
code easier to follow. Since we have to drop the spinlock across the
xfs_mod_fdblocks() call, use a delta value for m_resblks as well and
only apply the delta once allocation succeeds.

[dchinner: convert to do {} while() loop]

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 11:53:28 +10:00
Brian Foster fa5a4f57dd xfs: cancel eofblocks background trimming on remount read-only
The filesystem quiesce sequence performs the operations necessary to
drain all background work, push pending transactions through the log
infrastructure and wait on I/O resulting from the final AIL push. We
have had reports of remount,ro hangs in xfs_log_quiesce() ->
xfs_wait_buftarg(), however, and some instrumentation code to detect
transaction commits at this point in the quiesce sequence has inculpated
the eofblocks background scanner as a cause.

While higher level remount code generally prevents user modifications by
the time the filesystem has made it to xfs_log_quiesce(), the background
scanner may still be alive and can perform pending work at any time. If
this occurs between the xfs_log_force() and xfs_wait_buftarg() calls
within xfs_log_quiesce(), this can lead to an indefinite lockup in
xfs_wait_buftarg().

To prevent this problem, cancel the background eofblocks scan worker
during the remount read-only quiesce sequence. This suspends background
trimming when a filesystem is remounted read-only. This is only done in
the remount path because the freeze codepath has already locked out new
transactions by the time the filesystem attempts to quiesce (and thus
waiting on an active work item could deadlock). Kick the eofblocks
worker to pick up where it left off once an fs is remounted back to
read-write.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 11:53:28 +10:00
Dave Chinner 9b7fad2076 Merge branch 'xfs-4.8-iomap-write' into for-next 2016-06-21 10:10:38 +10:00
Dave Chinner 07931b7be7 Merge branch 'fs-4.8-iomap-infrastructure' into for-next 2016-06-21 10:10:19 +10:00
Christoph Hellwig 3c2bdc912a xfs: kill xfs_zero_remaining_bytes
Instead punch the whole first, and the use the our zeroing helper
to punch out the edge blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 10:02:23 +10:00
Christoph Hellwig bdb0d04fa6 xfs: split xfs_free_file_space in manageable pieces
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 10:00:55 +10:00
Christoph Hellwig 570b6211b8 xfs: use xfs_zero_range in xfs_zero_eof
We now skip holes in it, so no need to have the caller do it as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:57:26 +10:00
Christoph Hellwig 7bb41db3ea xfs: handle 64-bit length in xfs_iozero
We'll want to use this code for large offsets now that we're
skipping holes and unwritten extents efficiently.  Also rename it to
xfs_zero_range to be a bit more descriptive, and tell the caller if
we actually did any zeroing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:56:26 +10:00
Christoph Hellwig 459f0fbc2a xfs: use iomap infrastructure for DAX zeroing
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:55:18 +10:00
Christoph Hellwig d2bb140e99 xfs: use iomap fiemap implementation
Note that this removes support for the untested FIEMAP_FLAG_XATTR.  It
could be added relatively easily with iomap ops for the attr fork, but
without test coverage I don't feel safe doing this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:54:53 +10:00
Christoph Hellwig 6e8a27a816 xfs: remove buffered write support from __xfs_get_blocks
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:53:45 +10:00
Christoph Hellwig 68a9f5e700 xfs: implement iomap based buffered write path
Convert XFS to use the new iomap based multipage write path. This involves
implementing the ->iomap_begin and ->iomap_end methods, and switching the
buffered file write, page_mkwrite and xfs_iozero paths to the new iomap
helpers.

With this change __xfs_get_blocks will never be used for buffered writes,
and the code handling them can be removed.

Based on earlier code from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:53:44 +10:00
Christoph Hellwig f0c6bcba74 xfs: reorder zeroing and flushing sequence in truncate
Currently zeroing out blocks and waiting for writeout is a bit of a mess in
truncate.  This patch gives it a clear order in preparation for the iomap
path:

 (1) we first wait for any direct I/O to complete to prevent any races
     for it
 (2) we then perform the actual zeroing, and only use the truncate_page
     helpers for truncating down.  The truncate up case already is
     handled by the separate call to xfs_zero_eof.
 (3) only then we write back dirty data, as zeroing block may cause
     dirty pages when using either xfs_zero_eof or the new iomap
     infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:52:47 +10:00
Christoph Hellwig 3b3dce0527 xfs: make xfs_bmbt_to_iomap available outside of xfs_pnfs.c
And ensure it works for RT subvolume files an set the block device,
both of which will be needed to be able to use the function in the
buffered write path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:52:47 +10:00
Christoph Hellwig 8be9f564d2 fs: iomap based fiemap implementation
Add a simple fiemap implementation based on iomap_ops, partially based
on a previous implementation from Bob Peterson <rpeterso@redhat.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:38:45 +10:00
Christoph Hellwig 9a286f0e52 fs: support DAX based iomap zeroing
This avoid needing a separate inefficient get_block based DAX zero_range
implementation in file systems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:31:39 +10:00
Christoph Hellwig ae259a9c85 fs: introduce iomap infrastructure
Add infrastructure for multipage buffered writes.  This is implemented
using an main iterator that applies an actor function to a range that
can be written.

This infrastucture is used to implement a buffered write helper, one
to zero file ranges and one to implement the ->page_mkwrite VM
operations.  All of them borrow a fair amount of code from fs/buffers.
for now by using an internal version of __block_write_begin that
gets passed an iomap and builds the corresponding buffer head.

The file system is gets a set of paired ->iomap_begin and ->iomap_end
calls which allow it to map/reserve a range and get a notification
once the write code is finished with it.

Based on earlier code from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:23:11 +10:00
Christoph Hellwig 199a31c6d9 fs: move struct iomap from exportfs.h to a separate header
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:22:39 +10:00
Dave Chinner 26f1fe858f xfs: reduce lock hold times in buffer writeback
When we have a lot of metadata to flush from the AIL, the buffer
list can get very long. The current submission code tries to batch
submission to optimise IO order of the metadata (i.e. ascending
block order) to maximise block layer merging or IO to adjacent
metadata blocks.

Unfortunately, the method used can result in long lock times
occurring as buffers locked early on in the buffer list might not be
dispatched until the end of the IO licst processing. This is because
sorting does not occur util after the buffer list has been processed
and the buffers that are going to be submitted are locked. Hence
when the buffer list is several thousand buffers long, the lock hold
times before IO dispatch can be significant.

To fix this, sort the buffer list before we start trying to lock and
submit buffers. This means we can now submit buffers immediately
after they are locked, allowing merging to occur immediately on the
plug and dispatch to occur as quickly as possible. This means there
is minimal delay between locking the buffer and IO submission
occuring, hence reducing the worst case lock hold times seen during
delayed write buffer IO submission signficantly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:15 +10:00
Christoph Hellwig 4478fb1f2d xfs: define XFS_IOC_FREEZE even if FIFREEZE is defined
And the same for XFS_IOC_THAW.  Just because we now have a common
version of the ioctl we still need to provide the old name for it
for anyone using those.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:15 +10:00
Eric Sandeen 0d5a75e9e2 xfs: make several functions static
Al Viro noticed that xfs_lock_inodes should be static, and
that led to ... a few more.

These are just the easy ones, others require moving functions
higher in source files, so that's not done here to keep
this review simple.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:15 +10:00
Brian Foster 0c871f9a10 xfs: remove spurious shutdown type check from xfs_bmap_finish()
The static checker reports that after commit 8d99fe92fe ("xfs: fix
efi/efd error handling to avoid fs shutdown hangs"), the code has been
reworked such that error == -EFSCORRUPTED is not possible in this
codepath.

Remove the spurious error check and just use SHUTDOWN_META_IO_ERROR
unconditionally.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:15 +10:00
Brian Foster a3916e528b xfs: fix broken multi-fsb buffer logging
Multi-block buffers are logged based on buffer offset in
xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
the buffer and marks the associated range to be logged in the
xfs_buf_log_format bitmap for that mapping. This code is broken,
however, in that it marks the actual buffer offsets of the associated
range in each bitmap rather than shifting to the byte range for that
particular mapping.

For example, on a 4k fsb fs, buffer offset 4096 refers to the first byte
of the second mapping in the buffer. This means byte 0 of the second log
format bitmap should be tagged as dirty. Instead, the current code marks
byte offset 4096 of the second log format bitmap, which is invalid and
potentially out of range of the mapping.

As a result of this, the log item format code invoked at transaction
commit time is not be able to correctly identify what parts of the
buffer to copy into log vectors. This can lead to NULL log vector
pointer dereferences in CIL push context if the item format code was not
able to locate any dirty ranges at all. This crash has been reproduced
on a 4k FSB filesystem using 16k directory blocks where an unlink
operation happened not to log anything in the first block of the
mapping. The logged offsets were all over 4k, marked as such in the
subsequent log format mappings, and thus left the transaction with an
xfs_log_item that is marked DIRTY but without any logged regions.

Further, even when the logged regions are marked correctly in the buffer
log format bitmaps, the format code doesn't copy the correct ranges of
the buffer into the log. This means that any logged region beyond the
first block of a multi-block buffer is subject to corruption after a
crash and log recovery sequence. This is due to a failure to convert the
mapping bm_len field from basic blocks to bytes in the buffer offset
tracking code in xfs_buf_item_format().

Update xfs_buf_item_log() to convert buffer offsets to segment relative
offsets when logging multi-block buffers. This ensures that the modified
regions of a buffer are logged correctly and avoids the aforementioned
crash. Also update xfs_buf_item_format() to correctly track the source
offset into the buffer for the log vector formatting code. This ensures
that the correct data is copied into the log.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:12 +10:00
Linus Torvalds 1a695a905c Linux 4.7-rc1 2016-05-29 09:29:24 -07:00
George Spelvin e0ab7af9bd hash_string: Fix zero-length case for !DCACHE_WORD_ACCESS
The self-test was updated to cover zero-length strings; the function
needs to be updated, too.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Fixes: fcfd2fbf22 ("fs/namei.c: Add hashlen_string() function")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-29 07:33:47 -07:00
George Spelvin f2a031b66e Rename other copy of hash_string to hashlen_string
The original name was simply hash_string(), but that conflicted with a
function with that name in drivers/base/power/trace.c, and I decided
that calling it "hashlen_" was better anyway.

But you have to do it in two places.

[ This caused build errors for architectures that don't define
  CONFIG_DCACHE_WORD_ACCESS   - Linus ]

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: fcfd2fbf22 ("fs/namei.c: Add hashlen_string() function")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 22:34:33 -07:00
Mikulas Patocka 037369b872 hpfs: implement the show_options method
The HPFS filesystem used generic_show_options to produce string that is
displayed in /proc/mounts.  However, there is a problem that the options
may disappear after remount.  If we mount the filesystem with option1
and then remount it with option2, /proc/mounts should show both option1
and option2, however it only shows option2 because the whole option
string is replaced with replace_mount_options in hpfs_remount_fs.

To fix this bug, implement the hpfs_show_options function that prints
options that are currently selected.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 16:50:24 -07:00
Mikulas Patocka 01d6e08711 affs: fix remount failure when there are no options changed
Commit c8f33d0bec ("affs: kstrdup() memory handling") checks if the
kstrdup function returns NULL due to out-of-memory condition.

However, if we are remounting a filesystem with no change to
filesystem-specific options, the parameter data is NULL.  In this case,
kstrdup returns NULL (because it was passed NULL parameter), although no
out of memory condition exists.  The mount syscall then fails with
ENOMEM.

This patch fixes the bug.  We fail with ENOMEM only if data is non-NULL.

The patch also changes the call to replace_mount_options - if we didn't
pass any filesystem-specific options, we don't call
replace_mount_options (thus we don't erase existing reported options).

Fixes: c8f33d0bec ("affs: kstrdup() memory handling")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# v4.1+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 16:50:24 -07:00
Mikulas Patocka 44d51706b4 hpfs: fix remount failure when there are no options changed
Commit ce657611ba ("hpfs: kstrdup() out of memory handling") checks if
the kstrdup function returns NULL due to out-of-memory condition.

However, if we are remounting a filesystem with no change to
filesystem-specific options, the parameter data is NULL.  In this case,
kstrdup returns NULL (because it was passed NULL parameter), although no
out of memory condition exists.  The mount syscall then fails with
ENOMEM.

This patch fixes the bug.  We fail with ENOMEM only if data is non-NULL.

The patch also changes the call to replace_mount_options - if we didn't
pass any filesystem-specific options, we don't call
replace_mount_options (thus we don't erase existing reported options).

Fixes: ce657611ba ("hpfs: kstrdup() out of memory handling")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 16:50:24 -07:00
Linus Torvalds 4029632c34 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull more MIPS updates from Ralf Baechle:
 "This is the secondnd batch of MIPS patches for 4.7. Summary:

  CPS:
   - Copy EVA configuration when starting secondary VPs.

  EIC:
   - Clear Status IPL.

  Lasat:
   - Fix a few off by one bugs.

  lib:
   - Mark intrinsics notrace.  Not only are the intrinsics
     uninteresting, it would cause infinite recursion.

  MAINTAINERS:
   - Add file patterns for MIPS BRCM device tree bindings.
   - Add file patterns for mips device tree bindings.

  MT7628:
   - Fix MT7628 pinmux typos.
   - wled_an pinmux gpio.
   - EPHY LEDs pinmux support.

  Pistachio:
   - Enable KASLR

  VDSO:
   - Build microMIPS VDSO for microMIPS kernels.
   - Fix aliasing warning by building with `-fno-strict-aliasing' for
     debugging but also tracing them might result in recursion.

  Misc:
   - Add missing FROZEN hotplug notifier transitions.
   - Fix clk binding example for varioius PIC32 devices.
   - Fix cpu interrupt controller node-names in the DT files.
   - Fix XPA CPU feature separation.
   - Fix write_gc0_* macros when writing zero.
   - Add inline asm encoding helpers.
   - Add missing VZ accessor microMIPS encodings.
   - Fix little endian microMIPS MSA encodings.
   - Add 64-bit HTW fields and fix its configuration.
   - Fix sigreturn via VDSO on microMIPS kernel.
   - Lots of typo fixes.
   - Add definitions of SegCtl registers and use them"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (49 commits)
  MIPS: Add missing FROZEN hotplug notifier transitions
  MIPS: Build microMIPS VDSO for microMIPS kernels
  MIPS: Fix sigreturn via VDSO on microMIPS kernel
  MIPS: devicetree: fix cpu interrupt controller node-names
  MIPS: VDSO: Build with `-fno-strict-aliasing'
  MIPS: Pistachio: Enable KASLR
  MIPS: lib: Mark intrinsics notrace
  MIPS: Fix 64-bit HTW configuration
  MIPS: Add 64-bit HTW fields
  MAINTAINERS: Add file patterns for mips device tree bindings
  MAINTAINERS: Add file patterns for mips brcm device tree bindings
  MIPS: Simplify DSP instruction encoding macros
  MIPS: Add missing tlbinvf/XPA microMIPS encodings
  MIPS: Fix little endian microMIPS MSA encodings
  MIPS: Add missing VZ accessor microMIPS encodings
  MIPS: Add inline asm encoding helpers
  MIPS: Spelling fix lets -> let's
  MIPS: VR41xx: Fix typo
  MIPS: oprofile: Fix typo
  MIPS: math-emu: Fix typo
  ...
2016-05-28 16:41:39 -07:00
Guenter Roeck d66492bce1 fs: fix binfmt_aout.c build error
Various builds (such as i386:allmodconfig) fail with

  fs/binfmt_aout.c:133:2: error: expected identifier or '(' before 'return'
  fs/binfmt_aout.c:134:1: error: expected identifier or '(' before '}' token

[ Oops. My bad, I had stupidly thought that "allmodconfig" covered this
  on x86-64 too, but it obviously doesn't.  Egg on my face.  - Linus ]

Fixes: 5d22fc25d4 ("mm: remove more IS_ERR_VALUE abuses")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 16:34:59 -07:00