redonkable/remarkable-linux

Author	SHA1	Message	Date
Christoph Hellwig	4346cdd464	xfs: cleanup xfs_find_handle Remove the superflous igrab by keeping a reference on the path/file all the time and clean up various bits of surrounding code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-02-08 21:51:14 +01:00
Josef 'Jeff' Sipek	ef8f7fc549	xfs: cleanup error handling in xfs_swap_extents Use multiple lables for proper error unwinding and get rid of some now superflous variables. Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-02-04 09:37:43 +01:00
Christoph Hellwig	d4bb6d0698	xfs: merge xfs_inode_flush into xfs_fs_write_inode Splitting the task for a VFS-induced inode flush into two functions doesn't make any sense, so merge the two functions dealing with it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-02-04 09:36:19 +01:00
Christoph Hellwig	e1486dea0b	xfs: factor out attr fork reset handling We currently duplicate code to reset the attribute fork after the last attribute has been deleted. Factor this out into a small helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-02-04 09:36:00 +01:00
Christoph Hellwig	c52e9fd8a9	xfs: remove unused XFS_MOUNT_ILOCK/XFS_MOUNT_IUNLOCK These aren't only unused but also reference a lock that doesn't exist anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-02-04 09:34:34 +01:00
Christoph Hellwig	cb3f35bb3b	xfs: tiny cleanup for xfs_link The source and target inodes are guaranteed to never be the same by the VFS, so no need to check for that (and we would get into bad trouble later anyway if that were the case). Also clean up the error handling to use two gotos instead of nested conditions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-02-04 09:34:20 +01:00
Christoph Hellwig	b93b6e434c	xfs: make sure to free the real-time inodes in the mount error path When mount fails after allocating the real-time inodes we currently leak them. Add a new helper to free the real-time inodes which can be used by both the mount and unmount path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-02-04 09:33:58 +01:00
Christoph Hellwig	f9057e3da7	xfs: cleanup error handling in xfs_mountfs: Clean up the error handling in xfs_mountfs. Use readable goto label names, simplify the uuid handling and other error conditions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-02-04 09:31:52 +01:00
Felix Blyakher	43f3f057c5	[XFS] Warn on transaction in flight on read-only remount Till VFS can correctly support read-only remount without racing, use WARN_ON instead of BUG_ON on detecting transaction in flight after quiescing filesystem. Signed-off-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2009-02-03 11:04:54 -06:00
Dave Chinner	6139a23609	xfs: Check buffer lengths in log recovery Before trying to obtain, read or write a buffer, check that the buffer length is actually valid. If it is not valid, then something read in the recovery process has been corrupted and we should abort recovery. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Tested-by: Eric Sesterhenn <snakebyte@gmx.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-02-03 11:01:32 -06:00
Dave Chinner	3228149ceb	xfs: Check buffer lengths in log recovery Before trying to obtain, read or write a buffer, check that the buffer length is actually valid. If it is not valid, then something read in the recovery process has been corrupted and we should abort recovery. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Tested-by: Eric Sesterhenn <snakebyte@gmx.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-02-03 10:19:33 -06:00
Eric Sandeen	f0e0059b9c	don't reallocate sxp variable passed into xfs_swapext fixes kernel.org bugzilla 12538, xfs_fsr fails on 2.6.29-rc kernels Regression caused by `743bb4650d` This was an embarrasing mistake, reallocating the sxp pointer passed in from the main ioctl switch. Signed-off-by: Eric Sandeen <sandeen@sandeen.net Reported-by: Paul Martin <pm@debian.org> Tested-by: Paul Martin <pm@debian.org> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-01-27 14:51:39 -06:00
Eric Sandeen	ac12b4e25e	don't reallocate sxp variable passed into xfs_swapext fixes kernel.org bugzilla 12538, xfs_fsr fails on 2.6.29-rc kernels Regression caused by `743bb4650d` This was an embarrasing mistake, reallocating the sxp pointer passed in from the main ioctl switch. Signed-off-by: Eric Sandeen <sandeen@sandeen.net Reported-by: Paul Martin <pm@debian.org> Tested-by: Paul Martin <pm@debian.org> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-01-27 13:59:43 -06:00
Felix Blyakher	5e1065726e	[XFS] Warn on transaction in flight on read-only remount Till VFS can correctly support read-only remount without racing, use WARN_ON instead of BUG_ON on detecting transaction in flight after quiescing filesystem. Signed-off-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2009-01-27 13:37:24 -06:00
Dave Chinner	74e2d06521	Long btree pointers are still 64 bit on disk [XFS] Long btree pointers are still 64 bit on disk On 32 bit machines with CONFIG_LBD=n, XFS reduces the in memory size of xfs_fsblock_t to 32 bits so that it will fit within 32 bit addressing. However, the disk format for long btree pointers are still 64 bits in size. The recent btree rewrite failed to take this into account when initialising new btree blocks, setting sibling pointers to NULL and checking if they are NULL. Hence checking whether a 64 bit NULL was the same as a 32 bit NULL was failingi resulting in NULL sibling pointers failing to be detected correctly. This showed up as WANT_CORRUPTED_GOTO shutdowns in xfs_btree_delrec. Fix this by making all the comparisons and setting of long pointer btree NULL blocks to the disk format, not the in memory format. i.e. use NULLDFSBNO. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Reported-by: Jacek Luczak <difrost.kernel@gmail.com> Reported-by: Danny ter Haar <dth@dth.net> Tested-by: Jacek Luczak <difrost.kernel@gmail.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-01-22 01:23:11 -06:00
Felix Blyakher	957274d7ce	Merge branch 'master' of git+ssh://oss.sgi.com/oss/git/xfs/xfs	2009-01-21 22:39:29 -06:00
Eric Sandeen	5253a11a81	[XFS] remove always-true #ifndef HAVE_FORMAT32 tests There are several tests for #ifndef HAVE_FORMAT32, but this is never defined anywhere so it is always the default behavior; just remove the ifndef goop. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-22 14:07:31 +11:00
Dave Chinner	33ad965dde	Long btree pointers are still 64 bit on disk [XFS] Long btree pointers are still 64 bit on disk On 32 bit machines with CONFIG_LBD=n, XFS reduces the in memory size of xfs_fsblock_t to 32 bits so that it will fit within 32 bit addressing. However, the disk format for long btree pointers are still 64 bits in size. The recent btree rewrite failed to take this into account when initialising new btree blocks, setting sibling pointers to NULL and checking if they are NULL. Hence checking whether a 64 bit NULL was the same as a 32 bit NULL was failingi resulting in NULL sibling pointers failing to be detected correctly. This showed up as WANT_CORRUPTED_GOTO shutdowns in xfs_btree_delrec. Fix this by making all the comparisons and setting of long pointer btree NULL blocks to the disk format, not the in memory format. i.e. use NULLDFSBNO. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Reported-by: Jacek Luczak <difrost.kernel@gmail.com> Reported-by: Danny ter Haar <dth@dth.net> Tested-by: Jacek Luczak <difrost.kernel@gmail.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-01-21 18:33:46 -06:00
Eric Sandeen	b6e3222732	[XFS] Remove the rest of the macro-to-function indirections. Remove the last of the macros-defined-to-static-functions. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-19 14:45:55 +11:00
Christoph Hellwig	b828d8c338	xfs: sanity check attr fork size Recently we have quite a few kerneloops reports about dereferencing a NULL if_data in the attribute fork. From looking over the code this can only happen if we pass a 0 size argument to xfs_iformat_local. This implies some sort of corruption and in fact the only mailinglist report about this from earlier this year was after a powerfail presumably on a system with write cache and without barriers. Add a quick sanity check for the attr fork size in xfs_iformat to catch these early and without an oops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 14:45:11 +11:00
Christoph Hellwig	49739140e5	xfs: fix bad_features2 fixups for the root filesystem Currently the bad_features2 fixup and the alignment updates in the superblock are skipped if we mount a filesystem read-only. But for the root filesystem the typical case is to mount read-only first and only later remount writeable so we'll never perform this update at all. It's not a big problem but means the logs of people needing the fixup get spammed at every boot because they never happen on disk. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 14:45:04 +11:00
Christoph Hellwig	5aa2dc0a06	xfs: add a lock class for group/project dquots We can have both a user and a group/project dquot locked at the same time, as long as the user dquot is locked first. Tell lockdep about that fact by making the group/project dquots a different lock class. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 14:44:59 +11:00
Christoph Hellwig	4f2d4ac6e5	xfs: lockdep annotations for xfs_dqlock2 xfs_dqlock2 locks two xfs_dquots, which is fine as it always locks the dquot with the lower id first. Use mutex_lock_nested to tell lockdep about this fact. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 14:44:52 +11:00
Christoph Hellwig	080dda7f5e	xfs: add a separate lock class for the per-mount list of dquots We can have both a a quota hash chain and the per-mount list locked at the same time. But given that both use the same struct dqhash as list head we have to tell lockdep that they are different lock classes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 14:44:44 +11:00
Christoph Hellwig	62e194ecda	xfs: use mnt_want_write in compat_attrmulti ioctl The compat version of the attrmulti ioctl needs to ask for and then later release write access to the mount just like the native version, otherwise we could potentially write to read-only mounts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 14:44:30 +11:00
Christoph Hellwig	ab596ad897	xfs: fix dentry aliasing issues in open_by_handle Open by handle just grabs an inode by handle and then creates itself a dentry for it. While this works for regular files it is horribly broken for directories, where the VFS locking relies on the fact that there is only just one single dentry for a given inode, and that these are always connected to the root of the filesystem so that it's locking algorithms work (see Documentations/filesystems/Locking) Remove all the existing open by handle code and replace it with a small wrapper around the exportfs code which deals with all these issues. At the same time we also make the checks for a valid handle strict enough to reject all not perfectly well formed handles - given that we never hand out others that's okay and simplifies the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 14:43:18 +11:00
Christoph Hellwig	2809f76afc	xfs: sanity check attr fork size Recently we have quite a few kerneloops reports about dereferencing a NULL if_data in the attribute fork. From looking over the code this can only happen if we pass a 0 size argument to xfs_iformat_local. This implies some sort of corruption and in fact the only mailinglist report about this from earlier this year was after a powerfail presumably on a system with write cache and without barriers. Add a quick sanity check for the attr fork size in xfs_iformat to catch these early and without an oops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 02:04:16 +01:00
Christoph Hellwig	7884bc8617	xfs: fix bad_features2 fixups for the root filesystem Currently the bad_features2 fixup and the alignment updates in the superblock are skipped if we mount a filesystem read-only. But for the root filesystem the typical case is to mount read-only first and only later remount writeable so we'll never perform this update at all. It's not a big problem but means the logs of people needing the fixup get spammed at every boot because they never happen on disk. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 02:04:07 +01:00
Christoph Hellwig	98b8c7a0c4	xfs: add a lock class for group/project dquots We can have both a user and a group/project dquot locked at the same time, as long as the user dquot is locked first. Tell lockdep about that fact by making the group/project dquots a different lock class. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 02:03:25 +01:00
Christoph Hellwig	5bb87a33b2	xfs: lockdep annotations for xfs_dqlock2 xfs_dqlock2 locks two xfs_dquots, which is fine as it always locks the dquot with the lower id first. Use mutex_lock_nested to tell lockdep about this fact. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 02:03:19 +01:00
Christoph Hellwig	a4edd1da20	xfs: add a separate lock class for the per-mount list of dquots We can have both a a quota hash chain and the per-mount list locked at the same time. But given that both use the same struct dqhash as list head we have to tell lockdep that they are different lock classes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 02:03:11 +01:00
Christoph Hellwig	178eae342b	xfs: use mnt_want_write in compat_attrmulti ioctl The compat version of the attrmulti ioctl needs to ask for and then later release write access to the mount just like the native version, otherwise we could potentially write to read-only mounts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 02:03:03 +01:00
Christoph Hellwig	d296d30a99	xfs: fix dentry aliasing issues in open_by_handle Open by handle just grabs an inode by handle and then creates itself a dentry for it. While this works for regular files it is horribly broken for directories, where the VFS locking relies on the fact that there is only just one single dentry for a given inode, and that these are always connected to the root of the filesystem so that it's locking algorithms work (see Documentations/filesystems/Locking) Remove all the existing open by handle code and replace it with a small wrapper around the exportfs code which deals with all these issues. At the same time we also make the checks for a valid handle strict enough to reject all not perfectly well formed handles - given that we never hand out others that's okay and simplifies the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-01-19 02:02:57 +01:00
Eric Sandeen	9d87c3192d	[XFS] Remove the rest of the macro-to-function indirections. Remove the last of the macros-defined-to-static-functions. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-16 17:10:42 +11:00
Lachlan McIlroy	cb7a97d015	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus	2009-01-14 16:29:51 +11:00
Lachlan McIlroy	c088f4e9da	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-01-14 16:29:08 +11:00
Takashi Sato	8e961870bb	filesystem freeze: remove XFS specific ioctl interfaces for freeze feature It removes XFS specific ioctl interfaces and request codes for freeze feature. This patch has been supplied by David Chinner. Signed-off-by: Dave Chinner <dgc@sgi.com> Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com> Cc: Dave Chinner <david@fromorbit.com> Cc: <xfs-masters@oss.sgi.com> Cc: <linux-ext4@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-09 16:54:42 -08:00
Takashi Sato	c4be0c1dc4	filesystem freeze: add error handling of write_super_lockfs/unlockfs Currently, ext3 in mainline Linux doesn't have the freeze feature which suspends write requests. So, we cannot take a backup which keeps the filesystem's consistency with the storage device's features (snapshot and replication) while it is mounted. In many case, a commercial filesystem (e.g. VxFS) has the freeze feature and it would be used to get the consistent backup. If Linux's standard filesystem ext3 has the freeze feature, we can do it without a commercial filesystem. So I have implemented the ioctls of the freeze feature. I think we can take the consistent backup with the following steps. 1. Freeze the filesystem with the freeze ioctl. 2. Separate the replication volume or create the snapshot with the storage device's feature. 3. Unfreeze the filesystem with the unfreeze ioctl. 4. Take the backup from the separated replication volume or the snapshot. This patch: VFS: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they can return an error. Rename write_super_lockfs and unlockfs of the super block operation freeze_fs and unfreeze_fs to avoid a confusion. ext3, ext4, xfs, gfs2, jfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that write_super_lockfs returns an error if needed, and unlockfs always returns 0. reiserfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they always return 0 (success) to keep a current behavior. Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com> Cc: <xfs-masters@oss.sgi.com> Cc: <linux-ext4@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-09 16:54:42 -08:00
Nick Piggin	0087167c9d	[XFS] use scalable vmap API Implement XFS's large buffer support with the new vmap APIs. See the vmap rewrite (`db64fe02`) for some numbers. The biggest improvement that comes from using the new APIs is avoiding the global KVA allocation lock on every call. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-09 17:09:47 +11:00
Nick Piggin	958f8c0e4f	[XFS] remove old vmap cache XFS's vmap batching simply defers a number (up to 64) of vunmaps, and keeps track of them in a list. To purge the batch, it just goes through the list and calls vunamp on each one. This is pretty poor: a global TLB flush is generally still performed on each vunmap, with the most expensive parts of the operation being the broadcast IPIs and locking involved in the SMP callouts, and the locking involved in the vmap management -- none of these are avoided by just batching up the calls. I'm actually surprised it ever made much difference. (Now that the lazy vmap allocator is upstream, this description is not quite right, but the vunmap batching still doesn't seem to do much) Rip all this logic out of XFS completely. I will improve vmap performance and scalability directly in subsequent patch. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-09 17:09:25 +11:00
Lachlan McIlroy	ce79735c12	Merge branch 'for-linus' of git+ssh://git.melbourne.sgi.com/git/xfs	2009-01-09 16:24:48 +11:00
Christoph Hellwig	058652a37d	[XFS] make xfs_ino_t an unsigned long long Currently xfs_ino_t is defined as a u64 which can either be an unsigned long long or on some 64 bit platforms and unsigned long. Just making it and unsigned long long mean's it's still always 64 bits wide, but we don't need to resort to cases to print it. Fixes a warning regression on 64 bit powerpc in current git. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-09 16:19:14 +11:00
Christoph Hellwig	1544031976	[XFS] truncate readdir offsets to signed 32 bit values John Stanley reported EOVERFLOW errors in readdir from his self-build glibc. I traced this down to glibc enabling d_off overflow checks in one of the about five million different getdents implementations. In 2.6.28 Dave Woodhouse moved our readdir double buffering required for NFS4 readdirplus into nfsd and at that point we lost the capping of the directory offsets to 32 bit signed values. Johns glibc used getdents64 to even implement readdir for normal 32 bit offset dirents, and failed with EOVERFLOW only if this happens on the first dirent in a getdents call. I managed to come up with a testcase that uses raw getdents and does the EOVERFLOW check manually. We always hit it with our last entry due to the special end of directory marker. The patch below is a dumb version of just putting back the masking, to make sure we have the same behavior as in 2.6.27 and earlier. I will work on a better and cleaner fix for 2.6.30. Reported-by: John Stanley <jpsinthemix@verizon.net> Tested-by: John Stanley <jpsinthemix@verizon.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-09 16:18:24 +11:00
Christoph Hellwig	e6edbd1c1c	[XFS] fix compile of xfs_btree_readahead_lblock on m68k Change the left/right variables to the proper always 64bit xfs_dfsbo_t type because otherwise compilation fails for Geert on m68k without CONFIG_LBD: \| fs/xfs/xfs_btree.c: In function 'xfs_btree_readahead_lblock': \| fs/xfs/xfs_btree.c:736: warning: comparison is always true due to limited range of data type \| fs/xfs/xfs_btree.c:741: warning: comparison is always true due to limited range of data type Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-09 16:16:51 +11:00
Eric Sandeen	fb82557f16	[XFS] Remove macro-to-function indirections in the mask code Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-09 15:53:54 +11:00
Eric Sandeen	c9fb86a917	[XFS] Remove macro-to-function indirections in attr code Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-09 15:46:44 +11:00
Eric Sandeen	9800b55035	[XFS] Remove several unused typedefs. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-09 15:46:16 +11:00
Christoph Hellwig	c9a98553d5	[XFS] pass XFS_IGET_BULKSTAT to xfs_iget for handle operations NFS clients or users of the handle ioctls can pass us arbitrary inode numbers through the exportfs interface. Make sure we use the XFS_IGET_BULKSTAT so that these don't cause shutdowns due to the corruption checks. Also translate the EINVAL we get back for invalid inode clusters into an ESTALE which is more appropinquate, and remove the useless check for a NULL inode on a successfull xfs_iget return. I have a testcase to reproduce this using the handle interface which I will submit to xfsqa. Reported-by: Mario Becroft <mb@gem.win.co.nz> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-09 15:17:17 +11:00
Lachlan McIlroy	6206aa8b2b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-01-08 13:22:55 +11:00
Frederik Schwarzer	025dfdafe7	trivial: fix then -> than typos in comments and documentation - (better, more, bigger ...) then -> (...) than Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-01-06 11:28:06 +01:00
Nick Piggin	95f8e302c0	[XFS] use scalable vmap API Implement XFS's large buffer support with the new vmap APIs. See the vmap rewrite (`db64fe02`) for some numbers. The biggest improvement that comes from using the new APIs is avoiding the global KVA allocation lock on every call. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-06 14:43:09 +11:00
Nick Piggin	d2859751cd	[XFS] remove old vmap cache XFS's vmap batching simply defers a number (up to 64) of vunmaps, and keeps track of them in a list. To purge the batch, it just goes through the list and calls vunamp on each one. This is pretty poor: a global TLB flush is generally still performed on each vunmap, with the most expensive parts of the operation being the broadcast IPIs and locking involved in the SMP callouts, and the locking involved in the vmap management -- none of these are avoided by just batching up the calls. I'm actually surprised it ever made much difference. (Now that the lazy vmap allocator is upstream, this description is not quite right, but the vunmap batching still doesn't seem to do much) Rip all this logic out of XFS completely. I will improve vmap performance and scalability directly in subsequent patch. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-06 14:40:44 +11:00
Lachlan McIlroy	0a8c5395f9	[XFS] Fix merge failures Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/xfs/linux-2.6/xfs_cred.h fs/xfs/linux-2.6/xfs_globals.h fs/xfs/linux-2.6/xfs_ioctl.c fs/xfs/xfs_vnodeops.h Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-29 16:47:18 +11:00
James Morris	cbacc2c7f0	Merge branch 'next' into for-linus	2008-12-25 11:40:09 +11:00
Lachlan McIlroy	25051158bb	[XFS] Fix race in xfs_write() between direct and buffered I/O with DMAPI The iolock is dropped and re-acquired around the call to XFS_SEND_NAMESP(). While the iolock is released the file can become cached. We then 'goto retry' and - if we are doing direct I/O - mapping->nrpages may now be non zero but need_i_mutex will be zero and we will hit the WARN_ON(). Since we have dropped the I/O lock then the file size may have also changed so what we need to do here is 'goto start' like we do for the XFS_SEND_DATA() DMAPI event. We also need to update the filesize before releasing the iolock so that needs to be done before the XFS_SEND_NAMESP event. If we drop the iolock before setting the filesize we could race with a truncate. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-24 14:07:32 +11:00
Christoph Hellwig	ad1ad968f4	[XFS] handle unaligned data in xfs_bmbt_disk_get_all In libxfs xfs_bmbt_disk_get_all needs to handle unaligned data and thus has been updated to use get_unaligned_be64. In kernelspace we don't strictly need it as the routine is only used for tracing and xfsidbg, but let's keep the two implementations in sync. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-23 11:54:46 +11:00
Christoph Hellwig	efc557570d	[XFS] avoid memory allocations in xfs_fs_vcmn_err xfs_fs_vcmn_err can be called under a spinlock, but does a sleeping memory allocation to create buffer for it's internal sprintf. Fortunately it's the only caller of icmn_err, so we can merge the two and have one single static buffer and spinlock protecting it. While we're at it make sure we proper __attribute__ format annotations so that the compiler can detect mismatched format strings. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 18:02:01 +11:00
Lachlan McIlroy	9f6c92b9cc	[XFS] Fix speculative allocation beyond eof Speculative allocation beyond eof doesn't work properly. It was broken some time ago after a code cleanup that moved what is now xfs_iomap_eof_align_last_fsb() and xfs_iomap_eof_want_preallocate() out of xfs_iomap_write_delay() into separate functions. The code used to use the current file size in various checks but got changed to be max(file_size, i_new_size). Since i_new_size is the result of 'offset + count' then in xfs_iomap_eof_want_preallocate() the check for '(offset + count) <= isize' will always be true. ie if 'offset + count' is > ip->i_size then isize will be i_new_size and equal to 'offset + count'. This change fixes all the places that used to use the current file size. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 17:56:49 +11:00
Lachlan McIlroy	4fdc778179	[XFS] Remove XFS_BUF_SHUT() and friends Code does nothing so remove it. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 17:52:58 +11:00
Lachlan McIlroy	d415867e0a	[XFS] Use the incore inode size in xfs_file_readdir() We should be using the incore inode size here not the linux inode size. The incore inode size is always up to date for directories whereas the linux inode size is not updated for directories. We've hit assertions in xfs_bmap() and traced it back to the linux inode size being zero but the incore size being correct. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 17:50:56 +11:00
Lachlan McIlroy	4d9d4ebf5d	Merge branch 'master' of git+ssh://git.melbourne.sgi.com/git/xfs	2008-12-12 15:28:02 +11:00
Lachlan McIlroy	cfbe52672f	[XFS] set b_error from bio error in xfs_buf_bio_end_io Preserve any error returned by the bio layer. Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-12 15:27:25 +11:00
Christoph Hellwig	c4cd747ee6	[XFS] use inode_change_ok for setattr permission checking Instead of implementing our own checks use inode_change_ok to check for necessary permission in setattr. There is a slight change in behaviour as inode_change_ok doesn't allow i_mode updates to add the suid or sgid without superuser privilegues while the old XFS code just stripped away those bits from the file mode. (First sent on Semptember 29th) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:15:10 +11:00
Christoph Hellwig	4d4be482a4	[XFS] add a FMODE flag to make XFS invisible I/O less hacky XFS has a mode called invisble I/O that doesn't update any of the timestamps. It's used for HSM-style applications and exposed through the nasty open by handle ioctl. Instead of doing directly assignment of file operations that set an internal flag for it add a new FMODE_NOCMTIME flag that we can check in the normal file operations. (addition of the generic VFS flag has been ACKed by Al as an interims solution) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:14:41 +11:00
Christoph Hellwig	6d73cf133c	[XFS] resync headers with libxfs - xfs_sb.h add the XFS_SB_VERSION2_PARENTBIT features2 that has been around in userspace for some time - xfs_inode.h: move a few things out of __KERNEL__ that are needed by userspace - xfs_mount.h: only include xfs_sync.h under __KERNEL__ - xfs_inode.c: minor whitespace fixup. I accidentaly changes this when importing this file for use by userspace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:14:17 +11:00
Christoph Hellwig	2175dd9574	[XFS] simplify projid check in xfs_rename Check for the project ID after attaching all inodes to the transaction. That way the unlock in the error case is done by the transaction subsystem, which guaratees that is uses the right flags (which was wrong from day one of this check), and avoids having special code unlocking an array of inodes with potential duplicates. Attaching the inode first is the method used by xfs_rename and the other namespace methods all other error that require multiple locked inodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:13:52 +11:00
Christoph Hellwig	15ac08a8b2	[XFS] replace b_fspriv with b_mount Replace the b_fspriv pointer and it's ugly accessors with a properly types xfs_mount pointer. Also switch log reocvery over to it instead of using b_fspriv for the mount pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:13:33 +11:00
Lachlan McIlroy	e055f13a6d	[XFS] Remove unused tracing code None of this code appears to be used anywhere so remove it. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-10 11:51:54 +11:00
Dave Chinner	576a488a27	[XFS] Fix hang after disallowed rename across directory quota domains When project quota is active and is being used for directory tree quota control, we disallow rename outside the current directory tree. This requires a check to be made after all the inodes involved in the rename are locked. We fail to unlock the inodes correctly if we disallow the rename when the target is outside the current directory tree. This results in a hang on the next access to the inodes involved in failed rename. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Dave Chinner <david@fromorbit.com> Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-05 15:39:13 +11:00
Lachlan McIlroy	797eaed40e	[XFS] Remove unnecessary assertion Hit this assert because an inode was tagged with XFS_ICI_RECLAIM_TAG but not XFS_IRECLAIMABLE\|XFS_IRECLAIM. This is because xfs_iget_cache_hit() first clears XFS_IRECLAIMABLE and then calls __xfs_inode_clear_reclaim_tag() while only holding the pag_ici_lock in read mode so we can race with xfs_reclaim_inodes_ag(). Looks like xfs_reclaim_inodes_ag() will do the right thing anyway so just remove the assert. Thanks to Christoph for pointing out where the problem was. Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Reviewed-by: Christoph Hellwig <hch@infradead.org>	2008-12-05 14:15:49 +11:00
Lachlan McIlroy	a5b429d41f	[XFS] Remove unused variable in ktrace_free() entries_size is probably left over from when we used to pass the size to kmem_free(). Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>	2008-12-05 13:31:51 +11:00
Lachlan McIlroy	c6422617a1	[XFS] Check return value of xfs_buf_get_noaddr() We check the return value of all other calls to xfs_buf_get_noaddr(). Make sense to do it here too. Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>	2008-12-05 13:16:15 +11:00
Dave Chinner	6a0775a991	[XFS] Fix hang after disallowed rename across directory quota domains When project quota is active and is being used for directory tree quota control, we disallow rename outside the current directory tree. This requires a check to be made after all the inodes involved in the rename are locked. We fail to unlock the inodes correctly if we disallow the rename when the target is outside the current directory tree. This results in a hang on the next access to the inodes involved in failed rename. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Dave Chinner <david@fromorbit.com> Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-05 12:50:04 +11:00
Christoph Hellwig	8bb57320f3	[XFS] Fix compile with CONFIG_COMPAT enabled Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-05 11:23:10 +11:00
Christoph Hellwig	5a8d0f3c7a	move inode tracing out of xfs_vnode. Move the inode tracing into xfs_iget.c / xfs_inode.h and kill xfs_vnode.c now that it's empty. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:25 +11:00
Christoph Hellwig	25e41b3d52	move vn_iowait / vn_iowake into xfs_aops.c The whole machinery to wait on I/O completion is related to the I/O path and should be there instead of in xfs_vnode.c. Also give the functions more descriptive names. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:24 +11:00
Christoph Hellwig	583fa586f0	kill vn_ioerror There's just one caller of this helper, and it's much cleaner to just merge the xfs_do_force_shutdown call into it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:24 +11:00
Christoph Hellwig	f95099ba5a	kill xfs_unmount_flush There's almost nothing left in this function, instead remove the IRELE on the real times inodes and the call to XFS_QM_UNMOUNT into xfs_unmountfs. For the regular unmount case that means it now also happenes after dmapi notification, but otherwise there is no difference in behaviour. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:24 +11:00
Christoph Hellwig	e57481dc26	no explicit xfs_iflush for special inodes during unmount Currently we explicitly call xfs_iflush on the quota, real-time and root inodes from xfs_unmount_flush. But we just called xfs_sync_inodes with SYNC_ATTR and do an XFS_bflush aka xfs_flush_buftarg to make sure all inodes are on disk already, so there is no need for these special cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:23 +11:00
Christoph Hellwig	070c4616ec	use xfs_trans_ijoin in xfs_trans_iget Use xfs_trans_ijoin in xfs_trans_iget in case we need to join an inode into a transaction instead of opencoding it. Based on a discussion with and an incomplete patch from Niv Sardi. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:23 +11:00
Christoph Hellwig	b56757becf	remove leftovers of shared read-only support We never supported shared read-only filesystems, so remove the dead code left over from IRIX for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:23 +11:00
Christoph Hellwig	e88f11abe0	remove unused m_inode_quiesce member from struct xfs_mount Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:22 +11:00
Christoph Hellwig	6bd16ff270	kill dead inode flags There are a few inode flags around that aren't used anywhere, so remove them. Also update xfsidbg to display all used inode flags correctly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:22 +11:00
Christoph Hellwig	5efcbb853b	cleanup xfs_sb.h feature flag helpers The various inlines in xfs_sb.h that deal with the superblock version and fature flags were converted from macros a while ago, and this show by the odd coding style full of useless braces and backslashes and the avoidance of conditionals. Clean these up to look like normal C code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:22 +11:00
Christoph Hellwig	df6771bde1	kill dead quota flags Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:22 +11:00
Christoph Hellwig	63ad2a5c4c	remove dead code from sv_t implementation Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:21 +11:00
Christoph Hellwig	39e2defe73	reduce l_icloglock roundtrips All but one caller of xlog_state_want_sync drop and re-acquire l_icloglock around the call to it, just so that xlog_state_want_sync can acquire and drop it. Move all lock operation out of l_icloglock and assert that the lock is held when it is called. Note that it would make sense to extende this scheme to xlog_state_release_iclog, but the locking in there is more complicated and we'd like to keep the atomic_dec_and_lock optmization for those callers not having l_icloglock yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:21 +11:00
Christoph Hellwig	d9424b3c4a	stop using igrab in xfs_vn_link ->link is guranteed to get an already reference inode passed so we can do a simple increment of i_count instead of using igrab and thus avoid banging on the global inode_lock. This is what most filesystems already do. Also move the increment after the call to xfs_link to simplify error handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:21 +11:00
Christoph Hellwig	5d765b976c	kill xfs_buf_iostart xfs_buf_iostart is a "shared" helper for xfs_buf_read_flags, xfs_bawrite, and xfs_bdwrite - except that there isn't much shared code but rather special cases for each caller. So remove this function and move the functionality to the caller. xfs_bawrite and xfs_bdwrite are now big enough to be moved out of line and the xfs_buf_read_flags is moved into a new helper called _xfs_buf_read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:20 +11:00
Christoph Hellwig	5cafdeb289	cleanup the inode reclaim path Merge xfs_iextract and xfs_idestroy into xfs_ireclaim as they are never called individually. Also rewrite most comments in this area as they were severly out of date. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:20 +11:00
Christoph Hellwig	ccd0be6cfc	remove unused prototypes for xfs_ihash_init / xfs_ihash_free Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:20 +11:00
Christoph Hellwig	73e6335c14	remove unused behvavior cruft in xfs_super.h Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:19 +11:00
Christoph Hellwig	2234d54d3d	remove useless mnt_want_write call in xfs_write When mnt_want_write was introduced a call to it was added around xfs_ichgtime, but there is no need for this because a file can't be open read/write on a r/o mount, and a mount can't degrade r/o while we still have files open for writing. As the mnt_want_write changes were never merged into the CVS tree this patch is for mainline only. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:19 +11:00
Christoph Hellwig	ddcd856d81	[XFS] fix compile on 32 bit systems The recent compat patches make xfs_file.c include xfs_ioctl32.h unconditional, which breaks the build on 32 bit systems which don't have the various compat defintions. Remove the include and move the defintion of xfs_file_compat_ioctl to xfs_ioctl.h so that we can avoid including all the compat defintions in xfs_file.c Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-04 13:07:29 +11:00
sandeen@sandeen.net	e5d412f178	[XFS] Reorder xfs_ioctl32.c for some tidiness Put things in IMHO a more readable order, now that it's all done; add some comments. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:18:21 +11:00
sandeen@sandeen.net	710d62aaaf	[XFS] Hook up compat XFS_IOC_FSSETDM_BY_HANDLE ioctl handler Add a compat handler for XFS_IOC_FSSETDM_BY_HANDLE. I haven't tested this, lacking dmapi tools to do so (unless xfsqa magically gets this somehow?) Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:17:43 +11:00
sandeen@sandeen.net	28750975ac	[XFS] Hook up compat XFS_IOC_ATTRMULTI_BY_HANDLE ioctl handler Add a compat handler for XFS_IOC_ATTRMULTI_BY_HANDLE Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:17:07 +11:00
sandeen@sandeen.net	ebeecd2b04	[XFS] Hook up compat XFS_IOC_ATTRLIST_BY_HANDLE ioctl handler Add a compat handler for XFS_IOC_ATTRLIST_BY_HANDLE Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:16:45 +11:00
sandeen@sandeen.net	af819d2763	[XFS] Fix compat XFS_IOC_FSBULKSTAT_SINGLE ioctl The XFS_IOC_FSBULKSTAT_SINGLE ioctl passes in the desired inode number, while XFS_IOC_FSBULKSTAT passes in the previous/last-stat'd inode number. The compat handler wasn't differentiating these, so when a XFS_IOC_FSBULKSTAT_SINGLE request for inode 128 was sent in, stat information for 131 was sent out. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:16:24 +11:00
sandeen@sandeen.net	65fbaf2489	[XFS] Fix xfs_bulkstat_one size checks & error handling The 32-bit xfs_blkstat_one handler was failing because a size check checked whether the remaining (32-bit) user buffer was less than the (64-bit) bulkstat buffer, and failed with ENOMEM if so. Move this check into the respective handlers so that they check the correct sizes. Also, the formatters were returning negative errors or positive bytes copied; this was odd in the positive error value world of xfs, and handled wrong by at least some of the callers, which treated the bytes returned as an error value. Move the bytes-used assignment into the formatters. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:16:03 +11:00
sandeen@sandeen.net	2ee4fa5cb7	[XFS] Make the bulkstat_one compat ioctl handling more sane Currently the compat formatter was handled by passing in "private_data" for the xfs_bulkstat_one formatter, which was really just another formatter... IMHO this got confusing. Instead, just make a new xfs_bulkstat_one_compat formatter for xfs_bulkstat, and call it via a wrapper. Also, don't translate the ioctl nrs into their native counterparts, that just clouds the issue; we're in a compat handler anyway, just switch on the 32-bit cmds. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:15:36 +11:00
sandeen@sandeen.net	471d591031	[XFS] Add compat handlers for data & rt growfs ioctls The args for XFS_IOC_FSGROWFSDATA and XFS_IOC_FSGROWFSRTA have padding on the end on intel, so add arg copyin functions, and then just call the growfs ioctl helpers. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:15:09 +11:00
sandeen@sandeen.net	e94fc4a43e	[XFS] Add compat handlers for swapext ioctl The big hitter here was the bstat field, which contains different sized time_t on 32 vs. 64 bit. Add a copyin function to translate the 32-bit arg to 64-bit, and call the swapext ioctl helper. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:10:04 +11:00
sandeen@sandeen.net	d5547f9fee	[XFS] Clean up some existing compat ioctl calls Create a new xfs_ioctl.h file which has prototypes for ioctl helpers that may be called in compat mode. Change several compat ioctl cases which are IOW to simply copy in the userspace argument, then call the common ioctl helper. This also fixes xfs_compat_ioc_fsgeometry_v1(), which had it backwards before; it copied in an (empty) arg, then copied out the native result, which probably corrupted userspace. It should be translating on the copyout. Also, a bit of formatting cleanup for consistency, and conversion of all error returns to use XFS_ERROR(). Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:09:43 +11:00
sandeen@sandeen.net	ffae263a64	[XFS] Move compat ioctl structs & numbers into xfs_ioctl32.h This makes the c file less cluttered and a bit more readable. Consistently name the ioctl number macros with "_32" and the compatibility stuctures with "_compat." Rename the helpers which simply copy in the arg with "_copyin" for easy identification. Finally, for a few of the existing helpers, modify them so that they directly call the native ioctl helper after userspace argument fixup. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:08:44 +11:00
sandeen@sandeen.net	743bb4650d	[XFS] Move copy_from_user calls out of ioctl helpers into ioctl switch. Moving the copy_from_user out of some of the ioctl helpers will make it easier for the compat ioctl switch to copy in the right struct, then just pass to the underlying helper. Also, move common access checks into the helpers themselves, and out of the native ioctl switch code, to reduce code duplication between native & compat ioctl callers. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:08:01 +11:00
Christoph Hellwig	0e446673a1	[XFS] fix error handling in xlog_recover_process_one_iunlink If we fail after xfs_iget we have to drop the reference count, spotted by Dave Chinner. Also remove some useless asserts and stop trying to deal with di_mode == 0 inodes because never gets those without passing the IGET_CREATE flag to xfs_iget. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:22 +11:00
Christoph Hellwig	24f211bad0	[XFS] move inode allocation out xfs_iread Allocate the inode in xfs_iget_cache_miss and pass it into xfs_iread. This simplifies the error handling and allows xfs_iread to be shared with userspace which already uses these semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:17 +11:00
Christoph Hellwig	b48d8d6437	[XFS] kill the XFS_IMAP_BULKSTAT flag Just pass down the XFS_IGET_* flags all the way down to xfs_imap instead of translating them mid-way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:13 +11:00
Christoph Hellwig	92bfc6e7c4	[XFS] embededd struct xfs_imap into xfs_inode Most uses of struct xfs_imap are to map and inode to a buffer. To avoid copying around the inode location information we should just embedd a strcut xfs_imap into the xfs_inode. To make sure it doesn't bloat an inode the im_len is changed to a ushort, which is fine as that's what the users exepect anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:08 +11:00
Christoph Hellwig	94e1b69d1a	[XFS] merge xfs_imap into xfs_dilocate xfs_imap is the only caller of xfs_dilocate and doesn't add any significant value. Merge the two functions and document the various cases we have for inode cluster lookup in the new xfs_imap. Also remove the unused im_agblkno and im_ioffset fields from struct xfs_imap while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:03 +11:00
Christoph Hellwig	a194189503	[XFS] remove dead code for old inode item recovery We have removed the support for old-style inode items a while ago and xlog_recover_do_inode_trans is now only called for XFS_LI_INODE items. That means we can remove the call to xfs_imap there and with it the XFS_IMAP_LOOKUP that is set by all other callers. We can also mark xfs_imap static now. (First sent on October 21st) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:58 +11:00
Christoph Hellwig	76d8b277f7	[XFS] stop using xfs_itobp in xfs_iread The only caller of xfs_itobp that doesn't have i_blkno setup is now the initial inode read. It needs access to the whole xfs_imap so using xfs_inotobp is not an option. Instead opencode the buffer lookup in xfs_iread and kill all the functionality for the initial map from xfs_itobp. (First sent on October 21st) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:52 +11:00
Christoph Hellwig	23fac50f95	[XFS] split up xlog_recover_process_iunlinks Split out the body of the main loop into a separate helper to make the code readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:48 +11:00
Christoph Hellwig	51ce16d519	[XFS] kill XFS_DINODE_VERSION_ defines These names don't add any value at all over just using the numerical values. (First sent on October 9th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:42 +11:00
Christoph Hellwig	81591fe2db	[XFS] kill xfs_dinode_core_t Now that we have a separate xfs_icdinode_t for the in-core inode which gets logged there is no need anymore for the xfs_dinode vs xfs_dinode_core split - the fact that part of the structure gets logged through the inode log item and a small part not can better be described in a comment. All sizeof operations on the dinode_core either really wanted the icdinode and are switched to that one, or had already added the size of the agi unlinked list pointer. Later both will be replaced with helpers once we get the larger CRC-enabled dinode. Removing the data and attribute fork unions also has the advantage that xfs_dinode.h doesn't need to pull in every header under the sun. While we're at it also add some more comments describing the dinode structure. (First sent on October 7th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:35 +11:00
Christoph Hellwig	d42f08f61c	[XFS] kill xfs_ialloc_log_di xfs_ialloc_log_di is only used to log the full inode core + di_next_unlinked. That means all the offset magic is not nessecary and we can simply use xfs_trans_log_buf directly. Also add a comment describing what we should do here instead. (First sent on October 7th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:31 +11:00
Christoph Hellwig	b28708d6a0	[XFS] sanitize xlog_in_core_t definition Move all fields from xlog_iclog_fields_t into xlog_in_core_t instead of having them in a substructure and the using #defines to make it look like they were directly in xlog_in_core_t. Also document that xlog_in_core_2_t is grossly misnamed, and make all references to it typesafe. (First sent on Semptember 15th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:25 +11:00
From: Christoph Hellwig	4805621a37	[XFS] factor out xfs_read_agf helper Add a helper to read the AGF header and perform basic verification. Based on hunks from a larger patch from Dave Chinner. (First sent on Juli 23rd) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:20 +11:00
Christoph Hellwig	5e1be0fb1a	[XFS] factor out xfs_read_agi helper Add a helper to read the AGI header and perform basic verification. Based on hunks from a larger patch from Dave Chinner. (First sent on Juli 23rd) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:15 +11:00
Dave Chinner	26c5295135	[XFS] remove i_gen from incore inode i_gen is incremented in directory operations when the directory is changed. It is never read or otherwise used so it should be removed to help reduce the size of the struct xfs_inode. The patch also removes a duplicate logging of the directory inode core. We only need to do this once per transaction so kill the one associated with the i_gen increment. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:10 +11:00
Christoph Hellwig	207fcfad58	[XFS] remove xfs_vfsops.h The only thing left is xfs_do_force_shutdown which already has a defintion in xfs_mount.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:06 +11:00
Christoph Hellwig	2b5decd09e	[XFS] remove xfs_vfs.h The only thing left are the forced shutdown flags and freeze macros which fit into xfs_mount.h much better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:36:59 +11:00
Christoph Hellwig	00dd4029e9	[XFS] remove bhv_statvfs_t typedef Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:36:46 +11:00
Eric Sandeen	f35642e2f8	[XFS] Hook up the fiemap ioctl. This adds the fiemap inode_operation, which for us converts the fiemap values & flags into a getbmapx structure which can be sent to xfs_getbmap. The formatter then copies the bmv array back into the user's fiemap buffer via the fiemap helpers. If we wanted to be more clever, we could also return mapping data for in-inode attributes, but I'm not terribly motivated to do that just yet. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:29:42 +11:00
Eric Sandeen	5af317c942	[XFS] Add new getbmap flags. This adds a new output flag, BMV_OF_LAST to indicate if we've hit the last extent in the inode. This potentially saves an extra call from userspace to see when the whole mapping is done. It also adds BMV_IF_DELALLOC and BMV_OF_DELALLOC to request, and indicate, delayed-allocation extents. In this case bmv_block is set to -2 (-1 was already taken for HOLESTARTBLOCK; unfortunately these are the reverse of the in-kernel constants.) These new flags facilitate addition of the new fiemap interface. Rather than adding sh_delalloc, remove sh_unwritten & just test the flags directly. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:29:28 +11:00
Eric Sandeen	8a7141a8b9	[XFS] convert xfs_getbmap to take formatter functions Preliminary work to hook up fiemap, this allows us to pass in an arbitrary formatter to copy extent data back to userspace. The formatter takes info for 1 extent, a pointer to the user "thing" and a pointer to a "filled" variable to indicate whether a userspace buffer did get filled in (for fiemap, hole "extents" are skipped). I'm just using the getbmapx struct as a "common denominator" because as far as I can see, it holds all info that any formatters will care about. ("thing" because fiemap doesn't pass the user pointer around, but rather has a pointer to a fiemap info structure, and helpers associated with it) Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:29:00 +11:00
Dave Chinner	0924b585fc	[XFS] fix uninitialised variable bug in dquot release. gcc is warning about an uninitialised variable in xfs_growfs_rt(). This is a false positive. Fix it by changing the scope of the transaction pointer to wholly within the internal loop inside the function. While there, preemptively change xfs_growfs_rt_alloc() in the same way as it has exactly the same structure as xfs_growfs_rt() but gcc is not warning about it. Yet. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:11:36 +11:00
Dave Chinner	2e6560929d	[XFS] fix error inversion problems with data flushing XFS gets the sign of the error wrong in several places when gathering the error from generic linux functions. These functions return negative error values, while the core XFS code returns positive error values. Hence when XFS inverts the error to be returned to the VFS, it can incorrectly invert a negative error and this error will be ignored by the syscall return. Fix all the problems related to calling filemap_* functions. Problem initially identified by Nick Piggin in xfs_fsync(). Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:11:10 +11:00
Christoph Hellwig	65795910c1	[XFS] fix spurious gcc warnings Some recent gcc warnings don't like passing string variables to printf-like functions without using at least a "%s" format string. Change the two occurances of that in xfs to please gcc. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:07:37 +11:00
Christoph Hellwig	6c31b93a14	[XFS] allow inode64 mount option on 32 bit systems Now that we've stopped using the Linux inode cache when can trivally support the inode64 mount option on 32bit architectures. As far as the kernel and most userspace is concerned this works perfectly, but applications still using really old stat and readdir interfaces will get an EOVERFLOW error when hitting an inode number not fitting into 32 bits (that problem of course also exists when using these applications on a 64bit kernel). Note that because inode64 is simply a mount option we can currently mount a filesystem having > 32 bit inode numbers and cause a variety of problems, all this is solved but this patch which enables XFS_BIG_INUMS, even when inode64 is not used. (First sent on October 18th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:07:20 +11:00
Christoph Hellwig	f999a5bf3f	[XFS] wire up ->open for directories Currently there's no ->open method set for directories on XFS. That means we don't perform any check for opening too large directories without O_LARGEFILE, we don't check for shut down filesystems, and we don't actually do the readahead for the first block in the directory. Instead of just setting the directories open routine to xfs_file_open we merge the shutdown check directly into xfs_file_open and create a new xfs_dir_open that first calls xfs_file_open and then performs the readahead for block 0. (First sent on September 29th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:07:08 +11:00
Christoph Hellwig	bac8dca9f9	[XFS] fix NULL pointer dereference in xfs_log_force_umount xfs_log_force_umount may be called very early during log recovery where If we fail a buffer read in xlog_recover_do_inode_trans we abort the mount. But at that point log recovery has started delayed writeback of inode buffers. As part of the aborted mount we try to flush out all delwri buffers, but at that point we have already freed the superblock, and set mp->m_sb_bp to NULL, and xfs_log_force_umount which gets called after the inode buffer writeback trips over it. Make xfs_log_force_umount a little more careful when accessing mp->m_sb_bp to avoid this. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:06:44 +11:00
Dave Chinner	cc09c0dc57	[XFS] Fix double free of log tickets When an I/O error occurs during an intermediate commit on a rolling transaction, xfs_trans_commit() will free the transaction structure and the related ticket. However, the duplicate transaction that gets used as the transaction continues still contains a pointer to the ticket. Hence when the duplicate transaction is cancelled and freed, we free the ticket a second time. Add reference counting to the ticket so that we hold an extra reference to the ticket over the transaction commit. We drop the extra reference once we have checked that the transaction commit did not return an error, thus avoiding a double free on commit error. Credit to Nick Piggin for tripping over the problem. SGI-PV: 989741 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-11-17 17:37:10 +11:00
James Morris	2b82892565	Merge branch 'master' into next Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 11:29:12 +11:00
David Howells	745ca2475a	CRED: Pass credentials through dentry_open() Pass credentials through dentry_open() so that the COW creds patch can have SELinux's flush_unauthorized_files() pass the appropriate creds back to itself when it opens its null chardev. The security_dentry_open() call also now takes a creds pointer, as does the dentry_open hook in struct security_operations. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:22 +11:00
David Howells	b6dff3ec5e	CRED: Separate task security context from task_struct Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:16 +11:00
David Howells	82ab8deda7	CRED: Wrap task credential accesses in the XFS filesystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: xfs@oss.sgi.com Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:04 +11:00
David Chinner	220ca310a5	[XFS] XFS: Check for valid transaction headers in recovery When we are about to add a new item to a transaction in recovery, we need to check that it is valid first. Currently we just assert that header magic number matches, but in production systems that is not present and we add a corrupted transaction to the list to be processed. This results in a kernel oops later when processing the corrupted transaction. Instead, if we detect a corrupted transaction, abort recovery and leave the user to clean up the mess that has occurred. SGI-PV: 988145 SGI-Modid: xfs-linux-melb:xfs-kern:32356a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-11-10 18:01:50 +11:00
Dave Chinner	8f330f5149	[XFS] handle memory allocation failures during log initialisation When there is no memory left in the system, xfs_buf_get_noaddr() can fail. If this happens at mount time during xlog_alloc_log() we fail to catch the error and oops. Catch the error from xfs_buf_get_noaddr(), and allow other memory allocations to fail and catch those errors too. Report the error to the console and fail the mount with ENOMEM. Tested by manually injecting errors into xfs_buf_get_noaddr() and xlog_alloc_log(). Version 2: o remove unnecessary casts of the returned pointer from kmem_zalloc() SGI-PV: 987246 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-11-10 17:57:06 +11:00
David Chinner	6f9f51adb6	[XFS] Account for allocated blocks when expanding directories When we create a directory, we reserve a number of blocks for the maximum possible expansion of of the directory due to various btree splits, freespace allocation, etc. Unfortunately, each allocation is not reflected in the total number of blocks still available to the transaction, so the maximal reservation is used over and over again. This leads to problems where an allocation group has only enough blocks for some of the allocations required for the directory modification. After the first N allocations, the remaining blocks in the allocation group drops below the total reservation, and subsequent allocations fail because the allocator will not allow the allocation to proceed if the AG does not have the enough blocks available for the entire allocation total. This results in an ENOSPC occurring after an allocation has already occurred. This results in aborting the directory operation (leaving the directory in an inconsistent state) and cancelling a dirty transaction, which results in a filesystem shutdown. Avoid the problem by reflecting the number of blocks allocated in any directory expansion in the total number of blocks available to the modification in progress. This prevents a directory modification from being aborted part way through with an ENOSPC. SGI-PV: 988144 SGI-Modid: xfs-linux-melb:xfs-kern:32340a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-11-10 17:51:14 +11:00
Lachlan McIlroy	2cf7f0da3a	[XFS] Wait for all I/O on truncate to zero file size It's possible to have outstanding xfs_ioend_t's queued when the file size is zero. This can happen in the direct I/O path when a direct I/O write fails due to ENOSPC. In this case the xfs_ioend_t will still be queued (ie xfs_end_io_direct() does not know that the I/O failed so can't force the xfs_ioend_t to be flushed synchronously). When we truncate a file on unlink we don't know to wait for these xfs_ioend_ts and we can have a use-after-free situation if the inode is reclaimed before the xfs_ioend_t is finally processed. As was suggested by Dave Chinner lets wait for all I/Os to complete when truncating the file size to zero. SGI-PV: 981668 SGI-Modid: xfs-linux-melb:xfs-kern:32216a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-11-10 17:51:00 +11:00
Lachlan McIlroy	9ccbece546	[XFS] Fix use-after-free with log and quotas Destroying the quota stuff on unmount can access the log - ie XFS_QM_DONE() ends up in xfs_dqunlock() which calls xfs_trans_unlocked_item() and then xfs_log_move_tail(). By this time the log has already been destroyed. Just move the cleanup of the quota code earlier in xfs_unmountfs() before the call to xfs_log_unmount(). Moving XFS_QM_DONE() up near XFS_QM_DQPURGEALL() seems like a good spot. SGI-PV: 987086 SGI-Modid: xfs-linux-melb:xfs-kern:32148a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Peter Leckie <pleckie@sgi.com>	2008-11-10 17:43:23 +11:00
Dave Chinner	6307091fe6	[XFS] Avoid using inodes that haven't been completely initialised The radix tree walks in xfs_sync_inodes_ag and xfs_qm_dqrele_all_inodes() can find inodes that are still undergoing initialisation. Avoid them by checking for the the XFS_INEW() flag once we have a reference on the inode. This flag is cleared once the inode is properly initialised. SGI-PV: 987246 Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-11-10 17:13:23 +11:00
Dave Chinner	cb4f0d1d42	[XFS] fix uninitialised variable bug in dquot release gcc on ARM warns about an using an uninitialised variable in xfs_qm_dqrele_all_inodes(). This is a real bug, but gcc on x86_64 is not reporting this warning so it went unnoticed. Fix the bug by bring the inode radix tree walk code up to date with xfs_sync_inodes_ag(). SGI-PV: 987246 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-11-10 17:11:18 +11:00
Dave Chinner	644c3567d1	[XFS] handle memory allocation failures during log initialisation When there is no memory left in the system, xfs_buf_get_noaddr() can fail. If this happens at mount time during xlog_alloc_log() we fail to catch the error and oops. Catch the error from xfs_buf_get_noaddr(), and allow other memory allocations to fail and catch those errors too. Report the error to the console and fail the mount with ENOMEM. Tested by manually injecting errors into xfs_buf_get_noaddr() and xlog_alloc_log(). Version 2: o remove unnecessary casts of the returned pointer from kmem_zalloc() SGI-PV: 987246 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-11-10 16:50:24 +11:00
David Howells	91b7771251	CRED: Wrap task credential accesses in the XFS filesystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com>	2008-10-31 15:50:04 +11:00
David Chinner	6bfb3d065f	[XFS] Fix race when looking up reclaimable inodes If we get a race looking up a reclaimable inode, we can end up with the winner proceeding to use the inode before it has been completely re-initialised. This is a Bad Thing. Fix the race by checking whether we are still initialising the inod eonce we have a reference to it, and if so wait for the initialisation to complete before continuing. While there, fix a leaked reference count in the same code when encountering an unlinked inode and we are not doing a lookup for a create operation. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32429a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 18:32:43 +11:00
Tim Shimmin	e0b8e8b65d	[XFS] remove restricted chown parameter from xfs linux On Linux all filesystems are supposed to be operating under Posix' restricted chown. Restricted chown means it restricts chown to the owner unless you have CAP_FOWNER. NOTE: that 2 files outside of fs/xfs have been modified too for this change. Reviewed-by: Dave Chinner <david@fromorbit.com> SGI-PV: 988919 SGI-Modid: xfs-linux-melb:xfs-kern:32413a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 18:30:48 +11:00
Christoph Hellwig	ea5a3dc835	[XFS] kill sys_cred capable_cred has been unused for a while so we can kill it and sys_cred. That also means the cred argument to xfs_setattr and xfs_change_file_space can be removed now. SGI-PV: 988918 SGI-Modid: xfs-linux-melb:xfs-kern:32412a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 18:27:48 +11:00
David Chinner	7ee49acfe5	[XFS] correctly select first log item to push Under heavy metadata load we are seeing log hangs. The AIL has items in it ready to be pushed, and they are within the push target window. However, we are not pushing them when the last pushed LSN is less than the LSN of the first log item on the AIL. This is a regression introduced by the AIL push cursor modifications. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32409a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-10-30 18:26:51 +11:00
Christoph Hellwig	9ed0451ee0	[XFS] free partially initialized inodes using destroy_inode To make sure we free the security data inodes need to be freed using the proper VFS helper (which we also need to export for this). We mark these inodes bad so we can skip the flush path for them. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32398a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 18:26:04 +11:00
Christoph Hellwig	c679eef052	[XFS] stop using xfs_itobp in xfs_bulkstat xfs_bulkstat only wants the dinode, offset and buffer from a given inode number. Instead of using xfs_itobp on a fake inode which is complicated and currently leads to leaks of the security data just use xfs_inotobp which is designed to do exactly the kind of lookup xfs_bulkstat wants. The only thing that's missing in xfs_inotobp is a flags paramter that let's us pass down XFS_IMAP_BULKSTAT, but that can easily added. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32397a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 18:04:13 +11:00
David Chinner	455486b9cc	[XFS] avoid all reclaimable inodes in xfs_sync_inodes_ag If we are syncing data in xfs_sync_inodes_ag(), the VFS inode must still be referencable as the dirty data state is carried on the VFS inode. hence if we can't get a reference via igrab(), the inode must be in reclaim which implies that it has no dirty data attached. Leave such inodes to the reclaim code to flush the dirty inode state to disk and so avoid attempting to access the VFS inode when it may not exist in xfs_sync_inodes_ag(). Version 4: o don't reference linux inode until after igrab() succeeds Version 3: o converted unlock/rele to an xfs_iput() call. Version 2: o change igrab logic to be more linear o remove initial reclaimable inode check now that we are using igrab() failure to find reclaimable inodes o assert that igrab failure occurs only on reclaimable inodes o clean up inode locking - only grab the iolock if we are doing a SYNC_DELWRI call and we have a dirty inode. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32391a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Peter Leckie <pleckie@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 18:03:14 +11:00
David Chinner	56e73ec47d	[XFS] Can't lock inodes in radix tree preload region When we are inside a radix tree preload region, we cannot sleep. Recently we moved the inode locking inside the preload region for the inode radix tree. Fix that, and fix a missed unlock in another error path in the same code at the same time. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32385a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:55:27 +11:00
Christoph Hellwig	2b7035fd74	[XFS] Trivial xfs_remove comment fixup The dp to ip comment should be for the unconditional xfs_droplink call, and the "." link obviously only exists for directories, so it should be in the is_dir conditional. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32374a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:55:18 +11:00
Christoph Hellwig	1ec7944beb	[XFS] fix biosize option iosizelog shouldn't be the same as iosize but the logarithm of it. Then again the current biosize option doesn't make much sense to me as it doesn't set the preferred I/O size as mentioned in the comment next to it but rather the allocation size and thus is identical to the allocsize option (except for the missing logarithm). It's also not documented in Documentation/filesystems/xfs.txt or the mount manpage. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32373a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:55:08 +11:00
Christoph Hellwig	469fc23d5d	[XFS] fix the noquota mount option Noquota should clear all mount options, and not just user and group quota. Probably doesn't matter very much in real life. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32372a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:54:57 +11:00
Christoph Hellwig	9d565ffa33	[XFS] kill struct xfs_mount_args No need to parse the mount option into a structure before applying it to struct xfs_mount. The content of xfs_start_flags gets merged into xfs_parseargs. Calls inbetween don't care and can use mount members instead of the args struct. This patch uncovered that the mount option for shared filesystems wasn't ever exposed on Linux. The code to handle it is #if 0'ed in this patch pending a decision on this feature. I'll send a writeup about it to the list soon. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32371a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:53:24 +11:00
David Chinner	5a792c4579	[XFS] XFS: Check for valid transaction headers in recovery When we are about to add a new item to a transaction in recovery, we need to check that it is valid first. Currently we just assert that header magic number matches, but in production systems that is not present and we add a corrupted transaction to the list to be processed. This results in a kernel oops later when processing the corrupted transaction. Instead, if we detect a corrupted transaction, abort recovery and leave the user to clean up the mess that has occurred. SGI-PV: 988145 SGI-Modid: xfs-linux-melb:xfs-kern:32356a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:40:09 +11:00
David Chinner	783a2f656f	[XFS] Finish removing the mount pointer from the AIL API Change all the remaining AIL API functions that are passed struct xfs_mount pointers to pass pointers directly to the struct xfs_ail being used. With this conversion, all external access to the AIL is via the struct xfs_ail. Hence the operation and referencing of the AIL is almost entirely independent of the xfs_mount that is using it - it is now much more tightly tied to the log and the items it is tracking in the log than it is tied to the xfs_mount. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32353a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:58 +11:00
David Chinner	fc1829f34d	[XFS] Add ail pointer into log items Add an xfs_ail pointer to log items so that the log items can reference the AIL directly during callbacks without needed a struct xfs_mount. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32352a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:46 +11:00
David Chinner	a9c21c1b9d	[XFS] Given the log a pointer to the AIL When we need to go from the log to the AIL, we have to go via the xfs_mount. Add a xfs_ail pointer to the log so we can go directly to the AIL associated with the log. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32351a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:35 +11:00
David Chinner	c7e8f26827	[XFS] Move the AIL lock into the struct xfs_ail Bring the ail lock inside the struct xfs_ail. This means the AIL can be entirely manipulated via the struct xfs_ail rather than needing both the struct xfs_mount and the struct xfs_ail. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32350a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:23 +11:00
David Chinner	7b2e2a31f5	[XFS] Allow 64 bit machines to avoid the AIL lock during flushes When copying lsn's from the log item to the inode or dquot flush lsn, we currently grab the AIL lock. We do this because the LSN is a 64 bit quantity and it needs to be read atomically. The lock is used to guarantee atomicity for 32 bit platforms. Make the LSN copying a small function, and make the function used conditional on BITS_PER_LONG so that 64 bit machines don't need to take the AIL lock in these places. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32349a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:12 +11:00
David Chinner	5b00f14fbd	[XFS] move the AIl traversal over to a consistent interface With the new cursor interface, it makes sense to make all the traversing code use the cursor interface and make the old one go away. This means more of the AIL interfacing is done by passing struct xfs_ail pointers around the place instead of struct xfs_mount pointers. We can replace the use of xfs_trans_first_ail() in xfs_log_need_covered() as it is only checking if the AIL is empty. We can do that with a call to xfs_trans_ail_tail() instead, where a zero LSN returned indicates and empty AIL... SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32348a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:00 +11:00
David Chinner	27d8d5fe0e	[XFS] Use a cursor for AIL traversal. To replace the current generation number ensuring sanity of the AIL traversal, replace it with an external cursor that is linked to the AIL. Basically, we store the next item in the cursor whenever we want to drop the AIL lock to do something to the current item. When we regain the lock. the current item may already be free, so we can't reference it, but the next item in the traversal is already held in the cursor. When we move or delete an object, we search all the active cursors and if there is an item match we clear the cursor(s) that point to the object. This forces the traversal to restart transparently. We don't invalidate the cursor on insert because the cursor still points to a valid item. If the intem is inserted between the current item and the cursor it does not matter; the traversal is considered to be past the insertion point so it will be picked up in the next traversal. Hence traversal restarts pretty much disappear altogether with this method of traversal, which should substantially reduce the overhead of pushing on a busy AIL. Version 2 o add restart logic o comment cursor interface o minor cleanups SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32347a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:38:39 +11:00
David Chinner	82fa901245	[XFS] Allocate the struct xfs_ail Rather than embedding the struct xfs_ail in the struct xfs_mount, allocate it during AIL initialisation. Add a back pointer to the struct xfs_ail so that we can pass around the xfs_ail and still be able to access the xfs_mount if need be. This is th first step involved in isolating the AIL implementation from the surrounding filesystem code. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32346a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:38:26 +11:00
David Chinner	a7444053fb	[XFS] Account for allocated blocks when expanding directories When we create a directory, we reserve a number of blocks for the maximum possible expansion of of the directory due to various btree splits, freespace allocation, etc. Unfortunately, each allocation is not reflected in the total number of blocks still available to the transaction, so the maximal reservation is used over and over again. This leads to problems where an allocation group has only enough blocks for some of the allocations required for the directory modification. After the first N allocations, the remaining blocks in the allocation group drops below the total reservation, and subsequent allocations fail because the allocator will not allow the allocation to proceed if the AG does not have the enough blocks available for the entire allocation total. This results in an ENOSPC occurring after an allocation has already occurred. This results in aborting the directory operation (leaving the directory in an inconsistent state) and cancelling a dirty transaction, which results in a filesystem shutdown. Avoid the problem by reflecting the number of blocks allocated in any directory expansion in the total number of blocks available to the modification in progress. This prevents a directory modification from being aborted part way through with an ENOSPC. SGI-PV: 988144 SGI-Modid: xfs-linux-melb:xfs-kern:32340a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:38:12 +11:00
David Chinner	8c38ab0320	[XFS] Prevent looping in xfs_sync_inodes_ag If the last block of the AG has inodes in it and the AG is an exactly power-of-2 size then the last inode in the AG points to the last block in the AG. If we try to find the next inode in the AG by adding one to the inode number, we increment the inode number past the size of the AG. The result is that the macro XFS_INO_TO_AGINO() will strip the AG portion of the inode number and return an inode number of zero. That is, instead of terminating the lookup loop because we hit the inode number went outside the valid range for the AG, the search index returns to zero and we start traversing the radix tree from the start again. This results in an endless loop in xfs_sync_inodes_ag(). Fix it be detecting if the new search index decreases as a result of incrementing the current inode number. That indicate an overflow and hence that we have finished processing the AG so we can terminate the loop. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32335a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:38:00 +11:00
David Chinner	116545130c	[XFS] kill deleted inodes list Now that the deleted inodes list is unused, kill it. This also removes the i_reclaim list head from the xfs_inode, shrinking it by two pointers. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32334a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:37:49 +11:00
David Chinner	7a3be02bae	[XFS] use the inode radix tree for reclaiming inodes Use the reclaim tag to walk the radix tree and find the inodes under reclaim. This was the only user of the deleted inode list. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32333a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:37:37 +11:00
David Chinner	396beb8531	[XFS] mark inodes for reclaim via a tag in the inode radix tree Prepare for removing the deleted inode list by marking inodes for reclaim in the inode radix trees so that we can use the radix trees to find reclaimable inodes. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32331a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:37:26 +11:00
David Chinner	1dc3318ae1	[XFS] rename inode reclaim functions The function names xfs_finish_reclaim and xfs_finish_reclaim_all are not very descriptive of what they are reclaiming. Rename to xfs_reclaim_inode[s] to match the xfs_sync_inodes() function. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32330a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:37:15 +11:00
David Chinner	fce08f2f3b	[XFS] move inode reclaim functions to xfs_sync.c Background inode reclaim is run by the xfssyncd. Move the reclaim worker functions to be close to the sync code as the are very similar in structure and are both run from the same background thread. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32329a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:37:03 +11:00
Lachlan McIlroy	493dca6178	[XFS] Fix build warning - xfs_fs_alloc_inode() needs a return statement SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32325a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:36:52 +11:00
David Chinner	99fa8cb3c5	[XFS] Prevent use-after-free caused by synchronous inode reclaim With the combined linux and XFS inode, we need to ensure that the combined structure is not freed before the generic code is finished with the inode. As it turns out, there is a case where the XFS inode is freed before the linux inode - when xfs_reclaim() is called from ->clear_inode() on a clean inode, the xfs inode is freed during that call. The generic code references the inode after the ->clear_inode() call, so this is a use after free situation. Fix the problem by moving the xfs_reclaim() call to ->destroy_inode() instead of in ->clear_inode(). This ensures the combined inode structure is not freed until after the generic code has finished with it. SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32324a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:36:40 +11:00
David Chinner	bf904248a2	[XFS] Combine the XFS and Linux inodes To avoid issues with different lifecycles of XFS and Linux inodes, embedd the linux inode inside the XFS inode. This means that the linux inode has the same lifecycle as the XFS inode, even when it has been released by the OS. XFS inodes don't live much longer than this (a short stint in reclaim at most), so there isn't significant memory usage penalties here. Version 3 o kill xfs_icount() Version 2 o remove unused commented out code from xfs_iget(). o kill useless cast in VFS_I() SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32323a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:36:14 +11:00
David Chinner	94b97e39b0	[XFS] Never call mark_inode_dirty_sync() directly Once the Linux inode and the XFS inode are combined, we cannot rely on just check if the linux inode exists as a method of determining if it is valid or not. Hence we should always call xfs_mark_inode_dirty_sync() instead as it does the correct checks to determine if the liinux inode is in a valid state or not. SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32318a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:21:30 +11:00
David Chinner	6441e54915	[XFS] factor xfs_iget_core() into hit and miss cases There are really two cases in xfs_iget_core(). The first is the cache hit case, the second is the miss case. They share very little code, and hence can easily be factored out into separate functions. This makes the code much easier to understand and subsequently modify. SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32317a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:21:19 +11:00
Christoph Hellwig	3471394ba5	[XFS] fix instant oops with tracing enabled We can only read inode->i_count if the inode is actually there and not a NULL pointer. This was introduced in one of the recent sync patches. SGI-PV: 988255 SGI-Modid: xfs-linux-melb:xfs-kern:32315a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:21:10 +11:00
David Chinner	76bf105cb1	[XFS] Move remaining quiesce code. With all the other filesystem sync code it in xfs_sync.c including the data quiesce code, it makes sense to move the remaining quiesce code to the same place. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32312a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:16:21 +11:00
David Chinner	a4e4c4f4a8	[XFS] Kill xfs_sync() There are no more callers to xfs_sync() now, so remove the function altogther. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32311a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:16:11 +11:00
David Chinner	cb56a4b995	[XFS] Kill SYNC_CLOSE SYNC_CLOSE is only ever used and checked in conjunction with SYNC_WAIT, and this only done in one spot. The only thing this does is make XFS_bflush() calls to the data buftargs. This will happen very shortly afterwards the xfs_sync() call anyway in the unmount path via the xfs_close_devices(), so this code is redundant and can be removed. That only user of SYNC_CLOSE is now gone, so kill the flag completely. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32310a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:16:00 +11:00
David Chinner	e9f1c6ee12	[XFS] make SYNC_DELWRI no longer use xfs_sync Continue to de-multiplex xfs_sync be replacing all SYNC_DELWRI callers with direct calls functions that do the work. Isolate the data quiesce case to a function in xfs_sync.c. Isolate the FSDATA case with explicit calls to xfs_sync_fsdata(). Version 2: o Push delwri related log forces into xfs_sync_inodes(). SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32309a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:15:50 +11:00
David Chinner	be97d9d557	[XFS] make SYNC_ATTR no longer use xfs_sync Continue to de-multiplex xfs_sync be replacing all SYNC_ATTR callers with direct calls xfs_sync_inodes(). Add an assert into xfs_sync() to ensure we caught all the SYNC_ATTR callers. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32308a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:15:38 +11:00
David Chinner	aacaa880bf	[XFS] xfssyncd: don't call xfs_sync Start de-multiplexing xfs_sync() by making xfs_sync_worker() call the specific sync functions it needs. This is only a small, unique subset of the entire xfs_sync() code so is easier to follow. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32307a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:15:29 +11:00
David Chinner	dfd837a9eb	[XFS] kill xfs_syncsub Now that the only caller is xfs_sync(), merge the two together as it makes no sense to keep them separate. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32306a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:15:21 +11:00
David Chinner	2030b5aba8	[XFS] use xfs_sync_inodes rather than xfs_syncsub Kill the unused arg in xfs_syncsub() and xfs_sync_inodes(). For callers of xfs_syncsub() that only want to flush inodes, replace xfs_syncsub() with direct calls to xfs_sync_inodes() as that is all that is being done with the specific flags being passed in. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32305a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:15:12 +11:00
David Chinner	bc60a99323	[XFS] Use struct inodes instead of vnodes to kill vn_grab With the sync code relocated to the linux-2.6 directory we can use struct inodes directly. If we do the same thing for the quota release code, we can remove vn_grab altogether. While here, convert the VN_BAD() checks to is_bad_inode() so we can remove vnodes entirely from this code. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32304a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:15:03 +11:00
Christoph Hellwig	2af75df7be	[XFS] split out two helpers from xfs_syncsub Split out two helpers from xfs_syncsub for the dummy log commit and the superblock writeout. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32303a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:14:53 +11:00
Christoph Hellwig	4e8938feba	[XFS] Move XFS_BMAP_SANITY_CHECK out of line. Move the XFS_BMAP_SANITY_CHECK macro out of line and make it a properly typed function. Also pass the xfs_buf for the btree block instead of just the btree block header, as we will need some additional information for it to implement CRC checking of btree blocks. SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32301a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:14:43 +11:00
Christoph Hellwig	7cc95a821d	[XFS] Always use struct xfs_btree_block instead of short / longform structures. Always use the generic xfs_btree_block type instead of the short / long structures. Add XFS_BTREE_SBLOCK_LEN / XFS_BTREE_LBLOCK_LEN defines for the length of a short / long form block. The rationale for this is that we will grow more btree block header variants to support CRCs and other RAS information, and always accessing them through the same datatype with unions for the short / long form pointers makes implementing this much easier. SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32300a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:14:34 +11:00
Christoph Hellwig	136341b41a	[XFS] cleanup btree record / key / ptr addressing macros. Replace the generic record / key / ptr addressing macros that use cpp token pasting with simpler macros that do the job for just one given btree type. The new macros lose the cur argument and thus can be used outside the core btree code, but also gain an xfs_mount * argument to allow for checking the CRC flag in the near future. Note that many of these macros aren't actually used in the kernel code, but only in userspace (mostly in xfs_repair). SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32295a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:11:40 +11:00
David Chinner	6c7699c047	[XFS] remove the mount inode list Now we've removed all users of the mount inode list, we can kill it. This reduces the size of the xfs_inode by 2 pointers. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32293a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:11:29 +11:00
Christoph Hellwig	60197e8df3	[XFS] Cleanup maxrecs calculation. Clean up the way the maximum and minimum records for the btree blocks are calculated. For the alloc and inobt btrees all the values are pre-calculated in xfs_mount_common, and we switch the current loop around the ugly generic macros that use cpp token pasting to generate type names to two small helpers in normal C code. For the bmbt and bmdr trees these helpers also exist, but can be called during runtime, too. Here we also kill various macros dealing with them and inline the logic into the get_minrecs / get_maxrecs / get_dmaxrecs methods in xfs_bmap_btree.c. Note that all these new helpers take an xfs_mount * argument which will be needed to determine the size of a btree block once we add support for extended btree blocks with CRCs and other RAS information. SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32292a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:11:19 +11:00
David Chinner	5b4d89ae0f	[XFS] Traverse inode trees when releasing dquots Make releasing all inode dquots traverse the per-ag inode radix trees rather than the mount inode list. This removes another user of the mount inode list. Version 3 o fix comment relating to avoiding trying to release the quota inodes and those in reclaim. Version 2 o add comment explaining use of gang lookups for a single inode o use IRELE, not VN_RELE o move check for ag initialisation to caller. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32291a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:08:03 +11:00
David Chinner	683a897080	[XFS] Use the inode tree for finding dirty inodes Update xfs_sync_inodes to walk the inode radix tree cache to find dirty inodes. This removes a huge bunch of nasty, messy code for traversing the mount inode list safely and removes another user of the mount inode list. Version 3 o rediff against new linux-2.6/xfs_sync.c code Version 2 o add comment explaining use of gang lookups for a single inode o use IRELE, not VN_RELE o move check for ag initialisation to caller. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32290a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:07:29 +11:00
David Chinner	2f8a3ce1c2	[XFS] don't block in xfs_qm_dqflush() during async writeback. Normally dquots are written back via delayed write mechanisms. They are flushed to their backing buffer by xfssyncd, which is then pushed out by either AIL or xfsbufd flushing. The flush from the xfssyncd is supposed to be non-blocking, but xfs_qm_dqflush() always waits for pinned duots, which means that it will block for the length of time it takes to do a synchronous log force. This causes unnecessary extra log I/O to be issued whenever we try to flush a busy dquot. Avoid the log forces and blocking xfssyncd by making xfs_qm_dqflush() pay attention to what type of sync it is doing when it sees a pinned dquot and not waiting when doing non-blocking flushes. SGI-PV: 988147 SGI-Modid: xfs-linux-melb:xfs-kern:32287a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Peter Leckie <pleckie@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:07:20 +11:00
David Chinner	75c68f411b	[XFS] Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() xfs_iflush_all() walks the m_inodes list to find inodes that need reclaiming. We already have such a list - the m_del_inodes list. Replace xfs_iflush_all() with a call to xfs_finish_reclaim_all() and clean up xfs_finish_reclaim_all() to handle the different flush modes now needed. Originally based on a patch from Christoph Hellwig. Version 3 o rediff against new linux-2.6/xfs_sync.c code Version 2 o revert xfs_syncsub() inode reclaim behaviour back to original code o xfs_quiesce_fs() should use XFS_IFLUSH_DELWRI_ELSE_ASYNC, not XFS_IFLUSH_ASYNC, to prevent change of behaviour. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32284a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:06:28 +11:00

... 2 3 4 5 6 ...

1416 commits