redonkable/alistair23-linux

Author	SHA1	Message	Date
Sage Weil	c10f5e12ba	ceph: clear dir complete on d_move d_move() reorders the d_subdirs list, breaking the readdir result caching. Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on rename. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:22 -07:00
Ryusuke Konishi	973bec34bf	nilfs2: fix sync silent failure As of `32a88aa1`, __sync_filesystem() will return 0 if s_bdi is not set. And nilfs does not set s_bdi anywhere. I noticed this problem by the warning introduced by the recent commit `5129a469` ("Catch filesystem lacking s_bdi"). WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e() Hardware name: PowerEdge 2850 Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38 Call Trace: [<c1028422>] warn_slowpath_common+0x60/0x90 [<c102845f>] warn_slowpath_null+0xd/0x10 [<c1095936>] vfs_kern_mount+0xc5/0x14e [<c1095a03>] do_kern_mount+0x32/0xbd [<c10a811e>] do_mount+0x671/0x6d0 [<c1073794>] ? __get_free_pages+0x1f/0x21 [<c10a684f>] ? copy_mount_options+0x2b/0xe2 [<c107b634>] ? strndup_user+0x48/0x67 [<c10a81de>] sys_mount+0x61/0x8f [<c100280c>] sysenter_do_call+0x12/0x32 This ensures to set s_bdi for nilfs and fixes the sync silent failure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-03 07:36:01 -07:00
David Howells	17d2c0a0c4	NFS: Fix RCU issues in the NFSv4 delegation code Fix a number of RCU issues in the NFSv4 delegation code. (1) delegation->cred doesn't need to be RCU protected as it's essentially an invariant refcounted structure. By the time we get to nfs_free_delegation(), the delegation is being released, so no one else should be attempting to use the saved credentials, and they can be cleared. However, since the list of delegations could still be under traversal at this point by such as nfs_client_return_marked_delegations(), the cred should be released in nfs_do_free_delegation() rather than in nfs_free_delegation(). Simply using rcu_assign_pointer() to clear it is insufficient as that doesn't stop the cred from being destroyed, and nor does calling put_rpccred() after call_rcu(), given that the latter is asynchronous. (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use rcu_derefence_protected() because they can only be called if nfs_client::cl_lock is held, and that guards against anyone changing nfsi->delegation under it. Furthermore, the barrier imposed by rcu_dereference() is superfluous, given that the spin_lock() is also a barrier. (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client struct so that it can issue lockdep advice based on clp->cl_lock for (2). (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation() should use rcu_access_pointer() outside the spinlocked region as they merely examine the pointer and don't follow it, thus rendering unnecessary the need to impose a partial ordering over the one item of interest. These result in an RCU warning like the following: [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by mount.nfs4/2281: #0: (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80 #1: (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a stack backtrace: Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110 Call Trace: [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs] [<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs] [<ffffffff810c2d92>] clear_inode+0x9e/0xf8 [<ffffffff810c3028>] dispose_list+0x67/0x10e [<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a [<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4 [<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f [<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs] [<ffffffff810b25bc>] deactivate_super+0x68/0x80 [<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8 [<ffffffff810c681b>] release_mounts+0x9a/0xb0 [<ffffffff810c689b>] put_mnt_ns+0x6a/0x79 [<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs] [<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs] [<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs] [<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs] [<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177 [<ffffffff810b2176>] do_kern_mount+0x48/0xe8 [<ffffffff810c810b>] do_mount+0x782/0x7f9 [<ffffffff810c8205>] sys_mount+0x83/0xbe [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b Also on: fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection! [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs] [<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs] [<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs] ... And: fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection! [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs] [<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs] [<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs] [<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs] ... Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-01 12:37:18 -04:00
Trond Myklebust	8f649c3762	NFSv4: Fix the locking in nfs_inode_reclaim_delegation() Ensure that we correctly rcu-dereference the delegation itself, and that we protect against removal while we're changing the contents. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2010-05-01 12:36:18 -04:00
Li Dongyang	6b933c8e6f	ocfs2: Avoid direct write if we fall back to buffered I/O when we fall back to buffered write from direct write, we call __generic_file_aio_write() but that will end up doing direct write even we are only prepared to do buffered write because the file has the O_DIRECT flag set. This is a fix for https://bugzilla.novell.com/show_bug.cgi?id=591039 revised with Joel's comments. Signed-off-by: Li Dongyang <lidongyang@novell.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-30 13:45:13 -07:00
Joel Becker	f9221fd803	Merge branch 'skip_delete_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2-mark into ocfs2-fixes	2010-04-30 13:37:29 -07:00
Ralf Baechle	12b1b32168	Inotify: Fix build failure in inotify user support CONFIG_INOTIFY_USER defined but CONFIG_ANON_INODES undefined will result in the following build failure: LD vmlinux fs/built-in.o: In function 'sys_inotify_init1': (.text.sys_inotify_init1+0x22c): undefined reference to 'anon_inode_getfd' fs/built-in.o: In function `sys_inotify_init1': (.text.sys_inotify_init1+0x22c): relocation truncated to fit: R_MIPS_26 against 'anon_inode_getfd' make[2]: * [vmlinux] Error 1 make[1]: * [sub-make] Error 2 make: *** [all] Error 2 Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-30 10:14:56 -07:00
Linus Torvalds	e97e7120eb	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: add a shrinker to background inode reclaim	2010-04-29 19:49:34 -07:00
Linus Torvalds	fed0a9c644	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: exofs: Fix "add bdi backing to mount session" fall out fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK	2010-04-29 17:18:07 -07:00
Dave Chinner	9bf729c0af	xfs: add a shrinker to background inode reclaim On low memory boxes or those with highmem, kernel can OOM before the background reclaims inodes via xfssyncd. Add a shrinker to run inode reclaim so that it inode reclaim is expedited when memory is low. This is more complex than it needs to be because the VM folk don't want a context added to the shrinker infrastructure. Hence we need to add a global list of XFS mount structures so the shrinker can traverse them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-04-29 16:22:13 -05:00
Boaz Harrosh	3c2023dd8e	exofs: Fix "add bdi backing to mount session" fall out The patch: add bdi backing to mount session (`b3d0ab7e60`) Has a bug in the placement of the bdi member at struct exofs_sb_info. The layout member must be kept last. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-29 20:35:29 +02:00
Jens Axboe	5477d0face	fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK When CONFIG_BLOCK is set, it ends up getting backing-dev.h included. But for !CONFIG_BLOCK, it isn't so lucky. The proper thing to do is include <linux/backing-dev.h> directly from the file it's used from, so do that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-29 20:33:35 +02:00
Linus Torvalds	27fb8d7b1f	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4 nfs: fix some issues in nfs41_proc_reclaim_complete() NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear NFS: Fix an unstable write data integrity race nfs: testing for null instead of ERR_PTR() NFS: rsize and wsize settings ignored on v4 mounts NFSv4: Don't attempt an atomic open if the file is a mountpoint SUNRPC: Fix a bug in rpcauth_prune_expired	2010-04-29 10:23:44 -07:00
Arnd Bergmann	f80a0ca6ad	pktcdvd: improve BKL and compat_ioctl.c usage The pktcdvd driver uses proper locking and does not need the BKL in the ioctl and llseek functions of the character device, so kill both. Moving the compat_ioctl handling from common code into the driver itself fixes build problems when CONFIG_BLOCK is disabled. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-29 08:44:37 -07:00
Boaz Harrosh	a36fed12a4	exofs: Fix "add bdi backing to mount session" fall out Commit `b3d0ab7e60` ("exofs: add bdi backing to mount session") has a bug in the placement of the bdi member at struct exofs_sb_info. The layout member must be kept last. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-29 07:59:16 -07:00
Al Viro	d9e80b7de9	nfs d_revalidate() is too trigger-happy with d_drop() If dentry found stale happens to be a root of disconnected tree, we can't d_drop() it; its d_hash is actually part of s_anon and d_drop() would simply hide it from shrink_dcache_for_umount(), leading to all sorts of fun, including busy inodes on umount and oopsen after that. Bug had been there since at least 2006 (commit c636eb already has it), so it's definitely -stable fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-28 20:40:03 -07:00
Xiaotian Feng	9699eda6bc	nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4 With CONFIG_NFS_V4 and data version 4, nfs_get_sb will allocate memory for export_path in nfs4_validate_text_mount_data, so we need to free it then. This is addressed in following kmemleak report: unreferenced object 0xffff88016bf48a50 (size 16): comm "mount.nfs", pid 22567, jiffies 4651574704 (age 175471.200s) hex dump (first 16 bytes): 2f 6f 70 74 2f 77 6f 72 6b 00 6b 6b 6b 6b 6b a5 /opt/work.kkkkk. backtrace: [<ffffffff814b34f9>] kmemleak_alloc+0x60/0xa7 [<ffffffff81102c76>] kmemleak_alloc_recursive.clone.5+0x1b/0x1d [<ffffffff811046b3>] __kmalloc_track_caller+0x18f/0x1b7 [<ffffffff810e1b08>] kstrndup+0x37/0x54 [<ffffffffa0336971>] nfs_parse_devname+0x152/0x204 [nfs] [<ffffffffa0336af3>] nfs4_validate_text_mount_data+0xd0/0xdc [nfs] [<ffffffffa0338deb>] nfs_get_sb+0x325/0x736 [nfs] [<ffffffff81113671>] vfs_kern_mount+0xbd/0x17c [<ffffffff81113798>] do_kern_mount+0x4d/0xed [<ffffffff81129a87>] do_mount+0x787/0x7fe [<ffffffff81129b86>] sys_mount+0x88/0xc2 [<ffffffff81009b42>] system_call_fastpath+0x16/0x1b Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-28 13:46:28 -04:00
Dan Carpenter	acf82b85a7	nfs: fix some issues in nfs41_proc_reclaim_complete() The original code passed an ERR_PTR() to rpc_put_task() and instead of returning zero on success it returned -ENOMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-28 13:45:12 -04:00
Linus Torvalds	970b06485f	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: coda: move backing-dev.h kernel include inside __KERNEL__ mtd: ensure that bdi entries are properly initialized and registered Move mtd_bdi_*mappable to mtdcore.c btrfs: convert to using bdi_setup_and_register() Catch filesystems lacking s_bdi drbd: Terminate a connection early if sending the protocol fails drbd: fix memory leak Fix JFFS2 sync silent failure smbfs: add bdi backing to mount session ncpfs: add bdi backing to mount session exofs: add bdi backing to mount session ecryptfs: add bdi backing to mount session coda: add bdi backing to mount session cifs: add bdi backing to mount session afs: add bdi backing to mount session. 9p: add bdi backing to mount session bdi: add helper function for doing init and register of a bdi for a file system block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer	2010-04-28 07:56:05 -07:00
Linus Torvalds	11e39d993d	Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux * 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: nfsd4: bug in read_buf	2010-04-27 16:26:21 -07:00
Jerome Marchand	3835541dd4	procfs: fix tid fdinfo Correct the file_operations struct in fdinfo entry of tid_base_stuff[]. Presently /proc//task//fdinfo contains symlinks to opened files like /proc/*/fd/. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-27 16:26:03 -07:00
Trond Myklebust	ba8b06e67e	NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear Neil Brown reports that he is seeing the BUG_ON(ret == 0) trigger in nfs_page_async_flush. According to the trace in https://bugzilla.novell.com/show_bug.cgi?id=599628 the problem appears to be due to nfs_wb_page() not waiting for the PG_writeback flag to clear. There is a ditto problem in nfs_wb_page_cancel() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-27 18:33:54 -04:00
Christoph Egger	16a5b3c414	Remove redundant check for CONFIG_MMU The checks for CONFIG_MMU at this location are duplicated as all the code is located inside a #ifndef CONFIG_MMU block. So the first conditional block will always be included while the second never will. Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-27 09:01:26 -07:00
Linus Torvalds	bc113f151a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: squashfs: fix potential buffer over-run on 4K block file systems squashfs: add missing buffer free squashfs: fix warn_on when root inode is corrupted squashfs: fix locking bug in zlib wrapper	2010-04-27 08:59:38 -07:00
Neil Brown	2bc3c1179c	nfsd4: bug in read_buf When read_buf is called to move over to the next page in the pagelist of an NFSv4 request, it sets argp->end to essentially a random number, certainly not an address within the page which argp->p now points to. So subsequent calls to READ_BUF will think there is much more than a page of spare space (the cast to u32 ensures an unsigned comparison) so we can expect to fall off the end of the second page. We never encountered thsi in testing because typically the only operations which use more than two pages are write-like operations, which have their own decoding logic. Something like a getattr after a write may cross a page boundary, but it would be very unusual for it to cross another boundary after that. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-26 15:39:08 -04:00
Dave Chinner	dd77ef924c	xfs: more swap extent fixes for dynamic fork offsets A new xfsqa test (226) with a prototype xfs_fsr change to try to handle dynamic fork offsets better triggers an assertion failure where the inode data fork is in btree format, yet there is room in the inode for it to be in extent format. The two inodes look like: before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96 before: ino 0x115 (temp), num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56 after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96 after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56 Basically the target inode ends up with 5 extents in btree format, but it had space for 6 extents in extent format, so ends up incorrect. Notably here the broot size is the same, and that is where the kernel code is going wrong - the btree root will fit, so it lets the swap go ahead. The check should not allow the swap to take place if the number of extents while in btree format is less than the number of extents that can fit in the inode in extent format. Adding that check will prevent this swap and corruption from occurring. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-04-26 12:38:51 -05:00
Jens Axboe	e6d086d83c	btrfs: convert to using bdi_setup_and_register() It's now a provided helper, so get rid of the internal setup and btrfs atomic_t bdi enumerator. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-26 10:27:54 +02:00
Linus Torvalds	202f2bb070	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Issue the discard operation before releasing the blocks to be reused ext4: Fix buffer head leaks after calls to ext4_get_inode_loc() ext4: Fix possible lost inode write in no journal mode	2010-04-25 10:01:51 -07:00
Jörn Engel	5129a469a9	Catch filesystems lacking s_bdi noop_backing_dev_info is used only as a flag to mark filesystems that don't have any backing store, like tmpfs, procfs, spufs, etc. Signed-off-by: Joern Engel <joern@logfs.org> Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes to the noop_backing_dev_info is not legal and will not result in them being flushed, but we already catch this condition in __mark_inode_dirty() when checking for a registered bdi. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-25 08:54:42 +02:00
Phillip Lougher	e0d1f70010	squashfs: fix potential buffer over-run on 4K block file systems Sizing the buffer based on block size is incorrect, leading to a potential buffer over-run on 4K block size file systems (because the metadata block size is always 8K). This bug doesn't seem have triggered because 4K block size file systems are not default, and also because metadata blocks after compression tend to be less than 4K. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-04-25 02:09:05 +01:00
Phillip Lougher	370ec3d1ed	squashfs: add missing buffer free Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-04-25 02:09:05 +01:00
Phillip Lougher	1cb08e9738	squashfs: fix warn_on when root inode is corrupted Fix warn_on triggered by mounting a fsfuzzer corrupted file system, where the root inode has been corrupted. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk> Reported-by: Steve Grubb <sgrubb@redhat.com>	2010-04-25 01:49:17 +01:00
Anton Blanchard	b8af67e268	fs/block_dev.c: fix performance regression in O_DIRECT\|O_SYNC writes to block devices We are seeing a large regression in database performance on recent kernels. The database opens a block device with O_DIRECT\|O_SYNC and a number of threads write to different regions of the file at the same time. A simple test case is below. I haven't defined DEVICE since getting it wrong will destroy your data :) On an 3 disk LVM with a 64k chunk size we see about 17MB/sec and only a few threads in IO wait: procs -----io---- -system-- -----cpu------ r b bi bo in cs us sy id wa st 0 3 0 16170 656 2259 0 0 86 14 0 0 2 0 16704 695 2408 0 0 92 8 0 0 2 0 17308 744 2653 0 0 86 14 0 0 2 0 17933 759 2777 0 0 89 10 0 Most threads are blocking in vfs_fsync_range, which has: mutex_lock(&mapping->host->i_mutex); err = fop->fsync(file, dentry, datasync); if (!ret) ret = err; mutex_unlock(&mapping->host->i_mutex); commit `148f948ba8` (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers some explanation of what is going on: Use these new helpers for syncing from generic VFS functions. This makes O_SYNC writes to block devices acquire i_mutex for syncing. If we really care about this, we can make block_fsync() drop the i_mutex and reacquire it before it returns. Thanks Jan for such a good commit message! As well as dropping i_mutex, Christoph suggests we should remove the call to sync_blockdev(): > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on > the block device inode, which is exactly what we did just before calling > into ->fsync The patch below incorporates both suggestions. With it the testcase improves from 17MB/s to 68M/sec: procs -----io---- -system-- -----cpu------ r b bi bo in cs us sy id wa st 0 7 0 65536 1000 3878 0 0 70 30 0 0 34 0 69632 1016 3921 0 1 46 53 0 0 57 0 69632 1000 3921 0 0 55 45 0 0 53 0 69640 754 4111 0 0 81 19 0 Testcase: #define _GNU_SOURCE #include <stdio.h> #include <pthread.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define NR_THREADS 64 #define BUFSIZE (64 * 1024) #define DEVICE "/dev/mapper/XXXXXX" #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1)) static int fd; static void doit(void arg) { unsigned long offset = (long)arg; char b, buf; b = malloc(BUFSIZE + 1024); buf = (char )ALIGN((unsigned long)b, 1024); memset(buf, 0, BUFSIZE); while (1) pwrite(fd, buf, BUFSIZE, offset); } int main(int argc, char argv[]) { int flags = O_RDWR\|O_DIRECT; int i; unsigned long offset = 0; if (argc > 1 && !strcmp(argv[1], "O_SYNC")) flags \|= O_SYNC; fd = open(DEVICE, flags); if (fd == -1) { perror("open"); exit(1); } for (i = 0; i < NR_THREADS-1; i++) { pthread_t tid; pthread_create(&tid, NULL, doit, (void )offset); offset += BUFSIZE; } doit((void )offset); return 0; } Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-24 11:31:26 -07:00
Jeff Mahoney	fb2162df74	reiserfs: fix corruption during shrinking of xattrs Commit `48b32a3553` ("reiserfs: use generic xattr handlers") introduced a problem that causes corruption when extended attributes are replaced with a smaller value. The issue is that the reiserfs_setattr to shrink the xattr file was moved from before the write to after the write. The root issue has always been in the reiserfs xattr code, but was papered over by the fact that in the shrink case, the file would just be expanded again while the xattr was written. The end result is that the last 8 bytes of xattr data are lost. This patch fixes it to use new_size. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Edward Shishkin <edward.shishkin@gmail.com> Cc: Jethro Beekman <kernel@jbeekman.nl> Cc: Greg Surbey <gregsurbey@hotmail.com> Cc: Marco Gatti <marco.gatti@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-24 11:31:24 -07:00
Jeff Mahoney	cac36f7071	reiserfs: fix permissions on .reiserfs_priv Commit `677c9b2e39` ("reiserfs: remove privroot hiding in lookup") removed the magic from the lookup code to hide the .reiserfs_priv directory since it was getting loaded at mount-time instead. The intent was that the entry would be hidden from the user via a poisoned d_compare, but this was faulty. This introduced a security issue where unprivileged users could access and modify extended attributes or ACLs belonging to other users, including root. This patch resolves the issue by properly hiding .reiserfs_priv. This was the intent of the xattr poisoning code, but it appears to have never worked as expected. This is fixed by using d_revalidate instead of d_compare. This patch makes -oexpose_privroot a no-op. I'm fine leaving it this way. The effort involved in working out the corner cases wrt permissions and caching outweigh the benefit of the feature. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Edward Shishkin <edward.shishkin@gmail.com> Reported-by: Matt McCutchen <matt@mattmccutchen.net> Tested-by: Matt McCutchen <matt@mattmccutchen.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-24 11:31:24 -07:00
Joel Becker	a36d515c7a	ocfs2_dlmfs: Fix math error when reading LVB. When asked for a partial read of the LVB in a dlmfs file, we can accidentally calculate a negative count. Reported-by: Dan Carpenter <error27@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-23 15:24:59 -07:00
Tao Ma	c21a534e2f	ocfs2: Update VFS inode's id info after reflink. In reflink we update the id info on the disk but forgot to update the corresponding information in the VFS inode. Update them accordingly when we want to preserve the attributes. Reported-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-23 14:43:22 -07:00
Dan Carpenter	0350cb078f	ocfs2: potential ERR_PTR dereference on error paths If "handle" is non null at the end of the function then we assume it's a valid pointer and pass it to ocfs2_commit_trans(); Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-23 14:42:06 -07:00
Mark Fasheh	a9743fcdc0	ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod() If we get a failure during creation of an inode we'll allow the orphan code to remove the inode, which is correct. However, we need to ensure that we don't get any errors after the call to ocfs2_add_entry(), otherwise we could leave a dangling directory reference. The solution is simple - in both cases, all I had to do was move ocfs2_dentry_attach_lock() above the ocfs2_add_entry() call. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-04-23 11:42:22 -07:00
Li Dongyang	062d340384	ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error path Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod, so we can kill the inode in case of error. [ Fixed up comment style -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-04-23 11:05:00 -07:00
Li Dongyang	ab41fdc8fd	ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error path Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR when we get an error after allocating one, so that we can kill the inode. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-04-23 11:05:00 -07:00
Li Dongyang	d4cd1871cf	ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code Currently in the error path of ocfs2_symlink and ocfs2_mknod, we just call iput with the inode we failed with, but the inode wipe code will complain because we don't add the inode to orphan dir. One solution would be to lock the orphan dir during the entire transaction, but that's too heavy for a rare error path. Instead, we add a flag, OCFS2_INODE_SKIP_ORPHAN_DIR which tells the inode wipe code that it won't find this inode in the orphan dir. [ Merge fixes and comment style cleanups -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-04-23 11:03:49 -07:00
Josef Bacik	3a3076f4d6	Cleanup generic block based fiemap This cleans up a few of the complaints of __generic_block_fiemap. I've fixed all the typing stuff, used inline functions instead of macros, gotten rid of a couple of variables, and made sure the size and block requests are all block aligned. It also fixes a problem where sometimes FIEMAP_EXTENT_LAST wasn't being set properly. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-23 10:39:48 -07:00
Phillip Lougher	792590c723	squashfs: fix locking bug in zlib wrapper Fix locking bug in zlib wrapper introduced by recent decompressor changes. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-04-23 02:54:54 +01:00
Trond Myklebust	71d0a6112a	NFS: Fix an unstable write data integrity race Commit `2c61be0a94` (NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible) exposed a race on file close. In order to ensure correct close-to-open behaviour, we want to wait for all outstanding background commit operations to complete. This patch adds an inode flag that indicates if a commit operation is under way, and provides a mechanism to allow ->write_inode() to wait for its completion if this is a data integrity flush. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-22 15:35:57 -04:00
Dan Carpenter	cdd29ecfcb	nfs: testing for null instead of ERR_PTR() nfs_path() returns an ERR_PTR(), it doesn't return null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-22 15:35:56 -04:00
Chuck Lever	356e76b855	NFS: rsize and wsize settings ignored on v4 mounts NFSv4 mounts ignore the rsize and wsize mount options, and always use the default transfer size for both. This seems to be because all NFSv4 mounts are now cloned, and the cloning logic doesn't copy the rsize and wsize settings from the parent nfs_server. I tested Fedora's 2.6.32.11-99 and it seems to have this problem as well, so I'm guessing that .33, .32, and perhaps older kernels have this issue as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Stable <stable@kernel.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-22 15:35:56 -04:00
Trond Myklebust	1f063d2cdf	NFSv4: Don't attempt an atomic open if the file is a mountpoint Fix https://bugzilla.kernel.org/show_bug.cgi?id=15789 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-22 15:35:55 -04:00
Jens Axboe	424264b7b2	smbfs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:37:07 +02:00
Jens Axboe	f1970c73cb	ncpfs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:31:11 +02:00
Jens Axboe	b3d0ab7e60	exofs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:26:04 +02:00
Jens Axboe	9df9c8b930	ecryptfs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:22:04 +02:00
Jens Axboe	5163d90076	coda: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:12:40 +02:00
Jens Axboe	8044f7f468	cifs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:09:48 +02:00
Jens Axboe	e1da022275	afs: add bdi backing to mount session. This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 11:58:18 +02:00
Jens Axboe	0ed07ddb56	9p: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 11:42:00 +02:00
Linus Torvalds	1ef6ce7a34	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68knommu: allow 4 coldfire serial ports m68knommu: fix coldfire tcdrain m68knommu: remove a duplicate vector setting line for 68360 Fix m68k-uclinux's rt_sigreturn trampoline m68knommu: correct the CC flags for Coldfire M5272 targets uclinux: error message when FLAT reloc symbol is invalid, v2	2010-04-21 12:33:12 -07:00
Linus Torvalds	255f41c595	Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs * git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs: [LogFS] Split large truncated into smaller chunks [LogFS] Set s_bdi [LogFS] Prevent mempool_destroy NULL pointer dereference [LogFS] Move assertion [LogFS] Plug 8 byte information leak [LogFS] Prevent memory corruption on large deletes [LogFS] Remove unused method Fix trivial conflict with added header includes in fs/logfs/super.c	2010-04-21 12:31:12 -07:00
Linus Torvalds	9befb55ef5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: add jfs specific ->setattr call jfs: fix diAllocExt error in resizing filesystem jfs_dmap.[ch]: trivial typo fix: s/heigth/height/g	2010-04-21 12:30:07 -07:00
David Howells	083fd8b21a	AFS: Don't pass error value to page_cache_release() in error handling In the error handling in afs_mntpt_do_automount(), we pass an error pointer to page_cache_release() if read_mapping_page() failed. Instead, we should extend the gotos around the error handling we don't need. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-21 12:27:43 -07:00
Jun Sun	d7dfee3f5d	uclinux: error message when FLAT reloc symbol is invalid, v2 This patch fixes a cosmetic error in printk. Text segment and data/bss segment are allocated from two different areas. It is not meaningful to give the diff between them in the error reporting messages. Signed-off-by: Jun Sun <jsun@junsun.net> Signed-off-by: Greg Ungerer <gerg@uclinux.org>	2010-04-21 13:28:49 +10:00
Theodore Ts'o	b90f687018	ext4: Issue the discard operation before releasing the blocks to be reused Otherwise, we can end up having data corruption because the blocks could get reused and then discarded! https://bugzilla.kernel.org/show_bug.cgi?id=15579 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-04-20 16:51:59 -04:00
Joern Engel	b6349ac89e	[LogFS] Split large truncated into smaller chunks Truncate would do an almost limitless amount of work without invoking the garbage collector in between. Split it up into more manageable, though still large, chunks. Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-20 21:44:10 +02:00
Linus Torvalds	05ce7bfe54	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Convert __DQUOT_PARANOIA symbol to standard config option	2010-04-20 09:39:40 -07:00
Jan Kara	62af9b5205	quota: Convert __DQUOT_PARANOIA symbol to standard config option Make __DQUOT_PARANOIA define from the old days a standard config option and turn it off by default. This gets rid of a quota warning about writes before quota is turned on for systems with ext4 root filesystem. Currently there's no way to legally solve this because /etc/mtab has to be written before quota is turned on on most systems. Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-20 18:25:25 +02:00
Linus Torvalds	9b030e2006	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: eCryptfs: Turn lower lookup error messages into debug messages eCryptfs: Copy lower directory inode times and size on link ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode ecryptfs: fix error code for missing xattrs in lower fs eCryptfs: Decrypt symlink target for stat size eCryptfs: Strip metadata in xattr flag in encrypted view eCryptfs: Clear buffer before reading in metadata xattr eCryptfs: Rename ecryptfs_crypt_stat.num_header_bytes_at_front eCryptfs: Fix metadata in xattr feature regression	2010-04-19 14:20:32 -07:00
Tyler Hicks	9f37622f89	eCryptfs: Turn lower lookup error messages into debug messages Vaugue warnings about ENAMETOOLONG errors when looking up an encrypted file name have caused many users to become concerned about their data. Since this is a rather harmless condition, I'm moving this warning to only be printed when the ecryptfs_verbosity module param is 1. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:42:18 -05:00
Tyler Hicks	3a8380c075	eCryptfs: Copy lower directory inode times and size on link The timestamps and size of a lower inode involved in a link() call was being copied to the upper parent inode. Instead, we should be copying lower parent inode's timestamps and size to the upper parent inode. I discovered this bug using the POSIX test suite at Tuxera. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:42:15 -05:00
Jeff Mahoney	133b8f9d63	ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode Since tmpfs has no persistent storage, it pins all its dentries in memory so they have d_count=1 when other file systems would have d_count=0. ->lookup is only used to create new dentries. If the caller doesn't instantiate it, it's freed immediately at dput(). ->readdir reads directly from the dcache and depends on the dentries being hashed. When an ecryptfs mount is mounted, it associates the lower file and dentry with the ecryptfs files as they're accessed. When it's umounted and destroys all the in-memory ecryptfs inodes, it fput's the lower_files and d_drop's the lower_dentries. Commit `4981e081` added this and a d_delete in 2008 and several months later commit `caeeeecf` removed the d_delete. I believe the d_drop() needs to be removed as well. The d_drop effectively hides any file that has been accessed via ecryptfs from the underlying tmpfs since it depends on it being hashed for it to be accessible. I've removed the d_drop on my development node and see no ill effects with basic testing on both tmpfs and persistent storage. As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs BUGs on umount. This is due to the dentries being unhashed. tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop the reference pinning the dentry. It skips unhashed and negative dentries, but shrink_dcache_for_umount_subtree doesn't. Since those dentries still have an elevated d_count, we get a BUG(). This patch removes the d_drop call and fixes both issues. This issue was reported at: https://bugzilla.novell.com/show_bug.cgi?id=567887 Reported-by: Árpád Bíró <biroa@demasz.hu> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:42:13 -05:00
Christian Pulvermacher	cfce08c6bd	ecryptfs: fix error code for missing xattrs in lower fs If the lower file system driver has extended attributes disabled, ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP. This breaks execution of programs in the ecryptfs mount, since the kernel expects the latter error when checking for security capabilities in xattrs. Signed-off-by: Christian Pulvermacher <pulvermacher@gmx.de> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:42:09 -05:00
Tyler Hicks	3a60a1686f	eCryptfs: Decrypt symlink target for stat size Create a getattr handler for eCryptfs symlinks that is capable of reading the lower target and decrypting its path. Prior to this patch, a stat's st_size field would represent the strlen of the encrypted path, while readlink() would return the strlen of the decrypted path. This could lead to confusion in some userspace applications, since the two values should be equal. https://bugs.launchpad.net/bugs/524919 Reported-by: Loïc Minier <loic.minier@canonical.com> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:41:51 -05:00
Joern Engel	b8639077ab	[LogFS] Set s_bdi Since `32a88aa1` sync() was turned into a NOP for logfs. Worse, sync() would not return an error, giving the illusion that writeout had actually happened. Afaics jffs2 was broken as well. Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-17 19:54:27 +02:00
Dave Chinner	f1d486a361	xfs: don't warn on EAGAIN in inode reclaim Any inode reclaim flush that returns EAGAIN will result in the inode reclaim being attempted again later. There is no need to issue a warning into the logs about this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-04-16 13:51:44 -05:00
Dave Chinner	b6f8dd49db	xfs: ensure that sync updates the log tail correctly Updates to the VFS layer removed an extra ->sync_fs call into the filesystem during the sync process (from the quota code). Unfortunately the sync code was unknowingly relying on this call to make sure metadata buffers were flushed via a xfs_buftarg_flush() call to move the tail of the log forward in memory before the final transactions of the sync process were issued. As a result, the old code would write a very recent log tail value to the log by the end of the sync process, and so a subsequent crash would leave nothing for log recovery to do. Hence in qa test 182, log recovery only replayed a small handle for inode fsync transactions in this case. However, with the removal of the extra ->sync_fs call, the log tail was now not moved forward with the inode fsync transactions near the end of the sync procese the first (and only) buftarg flush occurred after these transactions went to disk. The result is that log recovery now sees a large number of transactions for metadata that is already on disk. This usually isn't a problem, but when the transactions include inode chunk allocation, the inode create transactions and all subsequent changes are replayed as we cannt rely on what is on disk is valid. As a result, if the inode was written and contains unlogged changes, the unlogged changes are lost, thereby violating sync semantics. The fix is to always issue a transaction after the buftarg flush occurs is the log iѕ not idle or covered. This results in a dummy transaction being written that contains the up-to-date log tail value, which will be very recent. Indeed, it will be at least as recent as the old code would have left on disk, so log recovery will behave exactly as it used to in this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-04-16 13:51:23 -05:00
Dmitry Monakhov	c7f2e1f0ac	jfs: add jfs specific ->setattr call generic setattr not longer responsible for quota transfer. use jfs_setattr for all jfs's inodes. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2010-04-16 08:05:50 -05:00
Bill Pemberton	2b0b39517d	jfs: fix diAllocExt error in resizing filesystem Resizing the filesystem would result in an diAllocExt error in some instances because changes in bmp->db_agsize would not get noticed if goto extendBmap was called. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: jfs-discussion@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org	2010-04-16 08:01:20 -05:00
Tao Ma	79681842e1	ocfs2: Reset status if we want to restart file extension. In __ocfs2_extend_allocation, we will restart our file extension if ((!status) && restart_func). But there is a bug that the status is still left as -EGAIN. This is really an old bug, but it is masked by the return value of ocfs2_journal_dirty. So it show up when we make ocfs2_journal_dirty void. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-16 03:10:54 -07:00
Joern Engel	1f1b0008e8	[LogFS] Prevent mempool_destroy NULL pointer dereference It would probably be better to just accept NULL pointers in mempool_destroy(). But for the current -rc series let's keep things simple. This patch was lost in the cracks for a while. Kevin Cernekee <cernekee@gmail.com> had to rediscover the problem and send a similar patch because of it. :( Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-15 08:03:57 +02:00
Linus Torvalds	96e35b40c0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: use separate class for ceph sockets' sk_lock ceph: reserve one more caps space when doing readdir ceph: queue_cap_snap should always queue dirty context ceph: fix dentry reference leak in dcache readdir ceph: decode v5 of osdmap (pool names) [protocol change] ceph: fix ack counter reset on connection reset ceph: fix leaked inode ref due to snap metadata writeback race ceph: fix snap context reference leaks ceph: allow writeback of snapped pages older than 'oldest' snapc ceph: fix dentry rehashing on virtual .snap dir	2010-04-14 18:45:31 -07:00
Linus Torvalds	0fdfe5ad28	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4: fix delegated locking NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible NFS: Fix a race with the new commit code NFS: Ensure that writeback_single_inode() calls write_inode() when syncing NFS: Fix the mode calculation in nfs_find_open_context NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR	2010-04-13 15:10:16 -07:00
Sage Weil	a6a5349d17	ceph: use separate class for ceph sockets' sk_lock Use a separate class for ceph sockets to prevent lockdep confusion. Because ceph sockets only get passed kernel pointers, there is no dependency from sk_lock -> mmap_sem. If we share the same class as other sockets, lockdep detects a circular dependency from mmap_sem (page fault) -> fs mutex -> sk_lock -> mmap_sem because dependencies are noted from both ceph and user contexts. Using a separate class prevents the sk_lock(ceph) -> mmap_sem dependency and makes lockdep happy. Signed-off-by: Sage Weil <sage@newdream.net>	2010-04-13 14:07:07 -07:00
Yehuda Sadeh	e1e4dd0caa	ceph: reserve one more caps space when doing readdir We were missing space for the directory cap. The result was a BUG at fs/ceph/caps.c:2178. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-04-13 12:28:54 -07:00
Sage Weil	fc837c8f04	ceph: queue_cap_snap should always queue dirty context This simplifies the calling convention, and fixes a bug where we queue a capsnap with a context other than i_head_snapc (the one that matches the dirty pages). The result was a BUG at fs/ceph/caps.c:2178 on writeback completion when a capsnap matching the writeback snapc could not be found. Signed-off-by: Sage Weil <sage@newdream.net>	2010-04-13 12:28:31 -07:00
Joern Engel	ead88af5f5	[LogFS] Move assertion The assertion is valid independently of the condition. Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-13 17:57:21 +02:00
Joern Engel	d3a03f8031	[LogFS] Plug 8 byte information leak Within each journal segment, 8 bytes at offset 24 would remain uninitialized. Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-13 17:54:27 +02:00
Joern Engel	032d8f7268	[LogFS] Prevent memory corruption on large deletes Removing sufficiently large files would create aliases for a large number of segments. This in turn results in a large number of journal entries and an overflow of s_je_array. Cheap fix is to add a BUG_ON, turning memory corruption into something annoying, but less dangerous. Real fix is to count the number of affected segments and prevent the problem completely. Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-13 17:46:37 +02:00
Linus Torvalds	d6cf853d4d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: make sure the chunk allocator doesn't create zero length chunks Btrfs: fix data enospc check overflow	2010-04-12 18:37:04 -07:00
Linus Torvalds	6a945f38be	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Fix possible dq_flags corruption quota: Hide warnings about writes to the filesystem before quota was turned on ext3: symlink must be handled via filesystem specific operation ext2: symlink must be handled via filesystem specific operation	2010-04-12 18:36:49 -07:00
Linus Torvalds	50fc88cb03	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: add speciffic ->setattr callback udf: potential integer overflow	2010-04-12 18:36:34 -07:00
Linus Torvalds	44fa2b4bee	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix typo "numer" -> "number" in alloc.c nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v() nilfs2: fix a wrong type conversion in nilfs_ioctl()	2010-04-12 18:34:25 -07:00
Sage Weil	f5b066287c	ceph: fix dentry reference leak in dcache readdir When filldir returned an error (e.g. buffer full for a large directory), we would leak a dentry reference, causing an oops on umount. Signed-off-by: Sage Weil <sage@newdream.net>	2010-04-12 14:25:51 -07:00
Andrew Perepechko	08261673cb	quota: Fix possible dq_flags corruption dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a change done by an atomic operation can be overwritten by a change done by a non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk. Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-12 21:12:36 +02:00
Jan Kara	4c5e6c0e70	quota: Hide warnings about writes to the filesystem before quota was turned on For a root filesystem write to the filesystem before quota is turned on happens regularly and there's no way around it because of writes to syslog, /etc/mtab, and similar. So the warning is rather pointless for ordinary users. It's still useful during development so we just hide the warning behind __DQUOT_PARANOIA config option. Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-12 21:12:19 +02:00
Dmitry Monakhov	774f03fb2c	ext3: symlink must be handled via filesystem specific operation generic setattr implementation is no longer responsible for quota transfer so synlinks must be handled via ext3_setattr. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-12 21:11:39 +02:00
Dmitry Monakhov	fc7683a3c3	ext2: symlink must be handled via filesystem specific operation generic setattr implementation is no longer responsible for quota transfer so synlinks must be handled via ext2_setattr. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-12 21:11:25 +02:00
Trond Myklebust	0df5dd4aae	NFSv4: fix delegated locking Arnaud Giersch reports that NFSv4 locking is broken when we hold a delegation since commit `8e469ebd6d` (NFSv4: Don't allow posix locking against servers that don't support it). According to Arnaud, the lock succeeds the first time he opens the file (since we cannot do a delegated open) but then fails after we start using delegated opens. The following patch fixes it by ensuring that locking behaviour is governed by a per-filesystem capability flag that is initially set, but gets cleared if the server ever returns an OPEN without the NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set. Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-04-12 07:55:15 -04:00
Ryusuke Konishi	be3bd2223b	nilfs2: fix typo "numer" -> "number" in alloc.c Fixes the typo found in a warning message of a persistent object allocator function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-04-12 01:51:03 +09:00
Trond Myklebust	2c61be0a94	NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible We always want to ensure that WRITE and COMMIT completes, whether or not the user presses ^C. Do this by making the call asynchronous, and allowing the user to do an interruptible wait for rpc_task completion. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-09 19:54:50 -04:00
Trond Myklebust	a6305ddb08	NFS: Fix a race with the new commit code This patch fixes a race which occurs due to the fact that we release the PG_writeback flag while still holding the nfs_page locked. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-09 19:08:17 -04:00
Trond Myklebust	b80c3cb628	NFS: Ensure that writeback_single_inode() calls write_inode() when syncing Since writeback_single_inode() checks the inode->i_state flags _before_ it flushes out the data, we need to ensure that the I_DIRTY_DATASYNC flag is already set. Otherwise we risk not seeing a call to write_inode(), which again means that we break fsync() et al... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-09 19:08:17 -04:00

1 2 3 4 5 ...

17642 commits