redonkable/alistair23-linux

Author	SHA1	Message	Date
Andi Kleen	2724b6db66	[PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems vfat implements compat handlers for these ioctls, but when they were executed on other file systems the kernel would still complain about an unknown compat ioctl. Just declare them as compatible and let them be rejected when not needed by the normal path. This makes wine runs a lot quieter Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Andi Kleen	a106009bdf	[PATCH] x86-64: Print type and size correctly for unknown compat ioctls Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Andi Kleen	9d016dd43b	[PATCH] x86-64: Shut up 32bit emulation for SIOCGIFCOUNT The kernel doesn't implement it, but some programs like java use it anyways. Shut the code up. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	421f028100	[PATCH] x86-64: Define IGNORE_IOCTL() macro for compat_ioctls Define a new IGNORE_IOCTL() to let a compat ioctl not be warned about even when it is not implemented. This is the same as COMPATIBLE_IOCTL internally, but better self-documenting. Valid reasons to use this: - It is implemented with ->compat_ioctl on some device, but programs call it on others too. - The ioctl is not implemented in the native kernel, but programs call it commonly anyways. Most other reasons are not valid. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Ian Campbell	79e030114a	[PATCH] i386: Allow i386 crash kernels to handle x86_64 dumps The specific case I am encountering is kdump under Xen with a 64 bit hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due to the hypervisor but the dump kernel is 32 bit for maximum compatibility. It's possibly less likely to be useful in a purely native scenario but I see no reason to disallow it. [akpm@linux-foundation.org: build fix] Signed-off-by: Ian Campbell <ian.campbell@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: Horms <horms@verge.net.au> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
Jason Uhlenkott	a19b89cad5	NFS: Clean up nfs_create_request comments Remove some stale comments about hard limits which went away in 2.5. Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-02 07:37:29 -07:00
J. Bruce Fields	08efa202eb	NFS4: invalidate cached acl on setacl The ACL that the server sets may not be exactly the one we set--for example, it may silently turn off bits that it does not support. So we should remove any cached ACL so that any subsequent request for the ACL will go to the server. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-02 07:36:09 -07:00
Steven Whitehouse	37fde8ca6c	[GFS2] Uncomment sprintf_symbol calling code Now that the patch from -mm has gone upstream, we can uncomment the code in GFS2 which uses sprintf_symbol. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Robert Peterson <rpeterso@redhat.com>	2007-05-01 09:51:39 +01:00
David Teigland	617e82e10c	[DLM] lowcomms style Replace some printk with log_print, and fix some simple cases of lines over 80. Also, return -ENOTCONN if lowcomms_start fails due to no local IP address being available. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:51 +01:00
akpm@linux-foundation.org	f391a4ead6	[GFS2] printk warning fixes alpha: fs/gfs2/dir.c: In function 'gfs2_dir_read_leaf': fs/gfs2/dir.c:1322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t' fs/gfs2/dir.c: In function 'gfs2_dir_read': fs/gfs2/dir.c:1455: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type '__u64' Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:48 +01:00
Steven Whitehouse	bf126aee6d	[GFS2] Patch to fix mmap of stuffed files If a stuffed file is mmaped and a page fault is generated at some offset above the initial page, we need to create a zero page to hang the buffer heads off before we can unstuff the file. This is a fix for bz #236087 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:46 +01:00
Josef Bacik	476c006be0	[GFS2] use lib/parser for parsing mount options This patch converts the mount option parsing to use the kernel's lib/parser stuff like all of the other filesystems. I tested this and it works well. Thank you, Signed-off-by: Josef Bacik <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:43 +01:00
Patrick Caulfield	30d3a2373f	[DLM] Lowcomms nodeid range & initialisation fixes Fix a few range & initialization bugs in lowcomms. - max_nodeid is really the highest nodeid encountered, so all loops must include it in their iterations. - clean dlm_local_count & connection_idr so we can do a clean restart. - Remove a spurious BUG_ON Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:41 +01:00
Josef Bacik	2439fe5072	[DLM] Fix dlm_lowcoms_stop hang When you attempt to release a lockspace in DLM, it will hang trying to down a semaphore that has already been downed. The attached patch fixes the problem. Signed-off-by: Josef Bacik <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Patrick Caulfield <pcaulfie@redhat.com>	2007-05-01 09:11:38 +01:00
David Teigland	7d3c1feb80	[DLM] fix mode munging There are flags to enable two specialized features in the dlm: 1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by changing the granted mode of locks to NL. 2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR or CW to grant them if the normal requested mode can't be granted. GFS direct i/o exercises both of these features, especially when mixed with buffered i/o. The dlm has problems with them. The first problem is on the master node. If it demotes a lock as a part of converting it, the actual step of converting the lock isn't being done after the demotion, the lock is just left sitting on the granted queue with a granted mode of NL. I think the mistaken assumption was that the call to grant_pending_locks() would grant it, but that function naturally doesn't look at locks on the granted queue. The second problem is on the process node. If the master either demotes or gives an altmode, the munging of the gr/rq modes is never done in the process copy of the lock, leaving the master/process copies out of sync. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:36 +01:00
Robert Peterson	5f8820960c	[GFS2] lockdump improvements The patch below consists of the following changes (in code order): 1. I fixed a minor compiler warning regarding the printing of a kernel symbol address. 2. I implemented a suggestion from Dave Teigland that moves the debugfs information for gfs2 into a subdirectory so we can easily expand our use of debugfs in the future. The current code keeps the glock information in: /debug/gfs2/<fs> With the patch, the new code keeps the glock information in: /debug/gfs2/<fs>/glock That will allow us to create more debugfs files in the future. 3. This fixes a bug whereby a failed mount attempt causes the debugfs file to not be deleted. Failed mount attempts should always clean up after themselves, including deleting the debugfs file and/or directory. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:33 +01:00
Steven Whitehouse	bdd19a22f8	[GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks This patch detects when the number of entries in a leaf block or inode block (in the case of stuffed directories) is corrupt and informs the user. It prevents us from running off the end of the array that's been allocated for the sorting in this case. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:30 +01:00
Robert Peterson	7a0079d9e3	[GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) seen at the "gfs2 summit". This also fixes the bug that caused garbage to be printed by the "initialized at" field. I apologize for the kludge, but that code will all be ripped out anyway when the official sprint_symbol function becomes available in the Linux kernel. I also changed some formatting so that spaces are replaced by proper tabs. Signed-off-by: Robert Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:28 +01:00
Adrian Bunk	8fa1de386f	[DLM] fs/dlm/ast.c should #include "ast.h" Every file should include the headers containing the prototypes for its global functions. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:25 +01:00
Patrick Caulfield	6ed7257b46	[DLM] Consolidate transport protocols This patch consolidates the TCP & SCTP protocols for the DLM into a single file and makes it switchable at run-time (well, at least before the DLM actually starts up!) For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel socket API but that has already been twice ACKed so it should be OK. The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c & lowcomms-tcp.c files. Signed-off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:23 +01:00
Patrick Caulfield	fc7c44f03d	[DLM] Remove redundant assignment This patch removes a redundant (and incorrect) assignment from compat_output Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:20 +01:00
Steven Whitehouse	a43a49066d	[GFS2] Fix bz 234168 (ignoring rgrp flags) The following patch makes GFS2 use the rgrp flags properly. Although there are also separate flags for both data and metadata as well, I've not implemented these as there seems little use for them. On the other hand, the "noalloc" flag is generally useful for future changes we might wish to make, so this ensures that we interpret it correctly. In addition I fixed the comment above the function which was incorrect. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:17 +01:00
David Teigland	ce03f12b37	[DLM] change lkid format A lock id is a uint32 and is used as an opaque reference to the lock. For userland apps, the lkid is passed up, through libdlm, as the return value from a write() on the dlm device. This created a problem when the high bit was 1, making the lkid look like an error. This is fixed by changing how the lkid is composed. The low 16 bits identified the hash bucket for the lock and the high 16 bits were a per-bucket counter (which eventually hit 0x8000 causing the problem). These are simply swapped around; the number of hash table buckets is far below 0x8000, making all lkid's positive when viewed as signed. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:15 +01:00
David Teigland	72c2be776b	[DLM] interface for purge (2/2) Add code to accept purge commands from userland. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:12 +01:00
David Teigland	8499137d4e	[DLM] add orphan purging code (1/2) Add code for purging orphan locks. A process can also purge all of its own non-orphan locks by passing a pid of zero. Code already exists for processes to create persistent locks that become orphans when the process exits, but the complementary capability for another process to then purge these orphans has been missing. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:10 +01:00
David Teigland	7e4dac3359	[DLM] split create_message function This splits the current create_message() function into two parts so that later patches can call the new lower-level _create_message() function when they don't have an rsb struct. No functional change in this patch. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:07 +01:00
Steven Whitehouse	f01963f264	[GFS2] Set drop_count to 0 (off) by default This sets the drop_count to 0 by default which is a better default for most people. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:05 +01:00
David Teigland	b9af8a788a	[GFS2] use log_error before LM_OUT_ERROR We always want to see the details of the error returned to gfs, but log_debug is often turned off, so use log_error (printk). Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:02 +01:00
David Teigland	ef0c2bb05f	[DLM] overlapping cancel and unlock Full cancel and force-unlock support. In the past, cancel and force-unlock wouldn't work if there was another operation in progress on the lock. Now, both cancel and unlock-force can overlap an operation on a lock, meaning there may be 2 or 3 operations in progress on a lock in parallel. This support is important not only because cancel and force-unlock are explicit operations that an app can use, but both are used implicitly when a process exits while holding locks. Summary of changes: - add-to and remove-from waiters functions were rewritten to handle situations with more than one remote operation outstanding on a lock - validate_unlock_args detects when an overlapping cancel/unlock-force can be sent and when it needs to be delayed until a request/lookup reply is received - processing request/lookup replies detects when cancel/unlock-force occurred during the op, and carries out the delayed cancel/unlock-force - manipulation of the "waiters" (remote operation) state of a lock moved under the standard rsb mutex that protects all the other lock state - the two recovery routines related to locks on the waiters list changed according to the way lkb's are now locked before accessing waiters state - waiters recovery detects when lkb's being recovered have overlapping cancel/unlock-force, and may not recover such locks - revert_lock (cancel) returns a value to distinguish cases where it did nothing vs cases where it actually did a cancel; the cancel completion ast should only be done when cancel did something - orphaned locks put on new list so they can be found later for purging - cancel must be called on a lock when making it an orphan - flag user locks (ENDOFLIFE) at the end of their useful life (to the application) so we can return an error for any further cancel/unlock-force - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose either a completion or blocking ast - clear an unread bast on a lock that's become unlocked Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:00 +01:00
Patrick Caulfield	0320672702	[DLM] fix coverity-spotted stupidity Replacement patch to remove redundant code rather than moving it around. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:57 +01:00
Robert Peterson	04b933f27b	[GFS2] Red Hat bz 228540: owner references In testing the previously posted and accepted patch for https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540 I uncovered some gfs2 badness. It turns out that the current gfs2 code saves off a process pointer when glocks are taken in both the glock and glock holder structures. Those structures will persist in memory long after the process has ended; pointers to poisoned memory. This problem isn't caused by the 228540 fix; the new capability introduced by the fix just uncovered the problem. I wrote this patch that avoids saving process pointers and instead saves off the process pid. Rather than referencing the bad pointers, it now does process lookups. There is special code that makes the output nicer for printing holder information for processes that have ended. This patch also adds a stub for the new "sprint_symbol" function that exists in Andrew Morton's -mm patch set, but won't go into the base kernel until 2.6.22, since it adds functionality but doesn't fix a bug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:55 +01:00
Benjamin Marzinski	172e045a7f	[GFS2] flush the log if a transaction can't allocate space This is a fix for bz #208514. When GFS2 frees up space, the freed blocks aren't available for reuse until the resource group is successfully written to the ondisk journal. So in rare cases, GFS2 operations will fail, saying that the filesystem is out of space, when in reality, you are just waiting for a log flush. For instance, on a 1Gig filesystem, if I continually write 10 Mb to a file, and then truncate it, after a hundred iterations, the write will fail with -ENOSPC, even though the filesystem is just 1% full. The attached patch calls a log flush in these cases. I tested this patch fairly heavily to check if there were any locking issues that I missed, and it seems to work just fine. Also, this patch only does the log flush if get_local_rgrp makes a complete loop of resource groups without skipping any due to locking issues. The code would be slightly simpler if it just always did the log flush after the first failed pass, and you could only ever have to go through the loop twice, instead of up to three times. However, I guessed that failing to find a rg simply due to locking issues would be common enough to skip the log flush in that case, but I'm not certain that this is the right way to go. Either way, I don't suppose this code will be hit all that often. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:52 +01:00
Benjamin Marzinski	6883562588	[GFS2] Fix log entry list corruption When glock_lo_add and rg_lo_add attempt to add an element to the log, they check to see if it has already been added before locking the log. If another process adds that element to the log in this window between the check and locking the log, the element will be added to the list twice. This causes the log element list to become corrupted in such a way that the log element can never be successfully removed from the list. This patch pulls the list_empty() check inside the log lock, to remove this window. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:50 +01:00
Steven Whitehouse	f35ac346bc	[GFS2] Speed up lock_dlm's locking (move sprintf) The following patch speeds up lock_dlm's locking by moving the sprintf out from the lock acquisition path and into the lock creation path. This reduces the amount of CPU time used in acquiring locks by a fair amount. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: David Teigland <teigland@redhat.com>	2007-05-01 09:10:47 +01:00
Patrick Caulfield	254da030df	[DLM] Don't delete misc device if lockspace removal fails Currently if the lockspace removal fails the misc device associated with a lockspace is left deleted. After that there is no way to access the orphaned lockspace from userland. This patch recreates the misc device if the dlm_release_lockspace fails. I believe this is better than attempting to remove the lockspace first because that leaves an unattached device lying around. The potential gap in which there is no access to the lockspace between removing the misc device and recreating it is acceptable ... after all the application is trying to remove it, and only new users of the lockspace will be affected. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:44 +01:00
Steven Whitehouse	420d2a1028	[GFS2] Fix a bug on i386 due to evaluation order Since gcc didn't evaluate the last two terms of the expression in glock.c:1881 as a constant expression, it resulted in an error on i386 due to the lack of a 64bit divide instruction. This adds some brackets to fix the problem. This was reported by Andrew Morton. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>	2007-05-01 09:10:42 +01:00
Steven Whitehouse	3b8249f617	[GFS2] Fix bz 224480 and cleanup glock demotion code This patch prevents the printing of a warning message in cases where the fs is functioning normally by handing off responsibility for unlinked, but still open inodes, to another node for eventual deallocation. Also, there is now an improved system for ensuring that such requests to other nodes do not get lost. The callback on the iopen lock is only ever called when i_nlink == 0 and when a node is unable to deallocate it due to it still being in use on another node. When a node receives the callback therefore, it knows that i_nlink must be zero, so we mark it as such (in gfs2_drop_inode) in order that it will then attempt deallocation of the inode itself. As an additional benefit, queuing a demote request no longer requires a memory allocation. This simplifies the code for dealing with gfs2_holders as it removes one special case. There are two new fields in struct gfs2_glock. gl_demote_state is the state which the remote node has requested and gl_demote_time is the time when the request came in. Both fields are only valid when the GLF_DEMOTE flag is set in gl_flags. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:39 +01:00
Josef Whiter	1de9139092	[GFS2] Fix bz 231380, unlock page before dequeuing glocks in gfs2_commit_write If we are writing a file, and in the middle of writing the file another node attempts to get a shared lock on that file (by doing a du for example) the process doing the writing will hang waiting on lock_page. The reason for this is that when we have waiters on an exclusive glock, we will go through and flush out all dirty pages associated with that inode and release the lock. The problem is that when we flush the dirty pages, we could hit a page that we have locked during the generic_file_buffered_write part of this operation. This patch unlocks the page before we go to dequeue the lock and locks it immediately afterwards, since generic_file_buffered_write needs the page locked when the commit_write is completed. This patch resolves the problem, however if somebody sees a better way to do this please don't hesitate to yell. Signed-off-by: Josef Whiter <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:37 +01:00
Patrick Caulfield	89adc934f3	[DLM] Fix uninitialised variable in receiving The length of the second element of the kvec array was not initialised before being added to the first one. This could cause invalid lengths to be passed to kernel_recvmsg Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:34 +01:00
Josef Whiter	5c7342d894	[GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option If you specify an invalid mount option when trying to mount a gfs2 filesystem, gfs2 will oops. The attached patch resolves this problem. Signed-off-by: Josef Whiter <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:32 +01:00
Robert Peterson	7c52b166c5	[GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540) The attached patch resolves bz 228540. This adds the capability for gfs2 to dump gfs2 locks through the debugfs file system. This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from gfs2 because all the ioctls were stripped out. Please see the bugzilla for more history about the fix. This patch is also attached to the bugzilla record. The patch is against Steve Whitehouse's latest nmw git tree kernel (2.6.21-rc1) and has been tested on system trin-10. Signed-off-by: Robert Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:29 +01:00
Neil Brown	83672d392f	NFS: Fix directory caching problem - with test case and patch. Try running this script in an NFS mounted directory (Client relatively recent - 2.6.18 has the problem as does 2.6.20). ------------------------------------------------------ #!/bin/bash # # This script will produce the following errormessage from tar: # # tar: newdir/innerdir/innerfile: file changed as we read it # create dirs rm -rf nfstest mkdir -p nfstest/dir/innerdir # create files (should not be empty) echo "Hello World!" >nfstest/dir/file echo "Hello World!" >nfstest/dir/innerdir/innerfile # problem only happens if we sleep before chmod sleep 1 # change file modes chmod -R a+r nfstest # rename dir mv nfstest/dir nfstest/newdir # tar it tar -cf nfstest/nfstest.tar -C nfstest newdir # restore old dir name mv nfstest/newdir nfstest/dir -------------------------------------------------------- What happens: The 'chmod -R' does a readdir_plus in each directory and the results get cached in the page cache. It then updates the ctime on each file by one second. When this happens, the post-op attributes are used to update the ctime stored on the client to match the value in the kernel. The 'mv' calls shrink_dcache_parent on the directory tree which flushes all the dentries (so a new lookup will be required) but doesn't flush the inodes or pagecache. The 'tar' does a readdir on each directory, but (in the case of 'innerdir' at least) satisfies it from the pagecache and uses the READDIRPLUS data to update all the inodes. In the case of 'innerdir/innerfile', the ctime is out of date. 'tar' then calls 'lstat' on innerdir/innerfile getting an old ctime. It then opens the file (triggering a GETATTR), reads the content, and then calls fstat to see if anything has changed. It finds that ctime has changed and so complains. The problem seems to be that the cache readdirplus info is kept around for too long. My patch below discards pagecache data for directories when dentry_iput is called on them. This effectively removes the symptom which convinces me that I correctly understand the problem. However I'm not convinced that is a proper solution, as there could easily be other races that trigger the same problem without being affected by this 'fix'. One possibility would be to require that readdirplus pagecache data be only used once to instantiate an inode. Somehow it should then be invalidated so that if the dentry subsequently disappears, it will cause a new request to the server to fill in the stat data. Another possibility is to compare the cache_change_attribute on the inode with something similar for the readdirplus info and reject the info from readdirplus if it is too old. I haven't tried to implement these and would value other opinions before I do. Thanks, NeilBrown Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:19 -07:00
Neil Brown	1f4eab7e7c	NFS: Set meaningful value for fattr->time_start in readdirplus results. Don't use uninitialised value for fattr->time_start in readdirplus results. The 'fattr' structure filled in by nfs3_decode_direct does not get a value for ->time_start set. Thus if an entry is for an inode that we already have in cache, when nfs_readdir_lookup calls nfs_fhget, it will call nfs_refresh_inode and may update the inode with out-of-date information. Directories are read a page at a time, so each page could have a different timestamp that "should" be used to set the time_start for the fattr for info in that page. However storing the timestamp per page is awkward. (We could stick in the first 4 bytes and only read 4092 bytes, but that is a bigger code change than I am interested in). This patch ignores the readdir_plus attributes if a readdir finds the information already in cache, and otherwise sets ->time_start to the time the readdir request was sent to the server. It might be nice to store - in the directory inode - the time stamp for the earliest readdir request that is still in the page cache, so that we don't ignore attribute data that we don't have to. This patch doesn't do that. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:18 -07:00
Steve Dickson	74dd34e6e8	NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. READDIRPLUS can be a performance hindrance when the client is working with large directories. In addition, some servers still have bugs in their implementations (e.g. Tru64 returns wrong values for the fsid). Add a mount flag to enable users to turn it off at mount time following the implementation in Apple's NFS client. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:16 -07:00
Chuck Lever	00a6e7bbf9	SUNRPC: RPC client should retry with different versions of rpcbind Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:16 -07:00
Chuck Lever	df8b172a88	NFS: switch NFSROOT to use new rpcbind client It is arguable whether NFSROOT will support IPv6, and thus whether rpcb_getport_external needs to support rpcbind versions greater than 2. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:14 -07:00
Chuck Lever	2bea90d43a	SUNRPC: RPC buffer size estimates are too large The RPC buffer size estimation logic in net/sunrpc/clnt.c always significantly overestimates the requirements for the buffer size. A little instrumentation demonstrated that in fact rpc_malloc was never allocating the buffer from the mempool, but almost always called kmalloc. To compute the size of the RPC buffer more precisely, split p_bufsiz into two fields; one for the argument size, and one for the result size. Then, compute the sum of the exact call and reply header sizes, and split the RPC buffer precisely between the two. That should keep almost all RPC buffers within the 2KiB buffer mempool limit. And, we can finally be rid of RPC_SLACK_SPACE! Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:10 -07:00
Chuck Lever	511d2e8855	NLM: Shrink the maximum request size of NLM4 requests NLM version 4 requests estimate the call and reply header sizes rather conservatively, using the very maximum size allowed in the protocol even though Linux always uses only a small fraction of the allowable space. Reduce the size of caller and lock arguments to conserve RPC buffer space while XDR encoding NLM4 arguments. Add compile-time checks to ensure the hostname string won't overflow NLM protocol maximums. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:09 -07:00
Trond Myklebust	ca52fec152	NFS: Use pgoff_t in structures and functions that pass page cache offsets Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:09 -07:00
Trond Myklebust	724c439c20	NFS: Clean up nfs_sync_mapping_wait() It has no business touching wbc->pages_skipped. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:08 -07:00
Trond Myklebust	8d5658c949	NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:07 -07:00
Trond Myklebust	c63c7b0513	NFS: Fix a race when doing NFS write coalescing Currently we do write coalescing in a very inefficient manner: one pass in generic_writepages() in order to lock the pages for writing, then one pass in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather the locked pages for coalescing into RPC requests of size "wsize". In fact, it turns out there is actually a deadlock possible here since we only start I/O on the second pass. If the user signals the process while we're in nfs_sync_mapping_wait(), for instance, then we may exit before starting I/O on all the requests that have been queued up. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:06 -07:00
Trond Myklebust	8b09bee308	NFS: Cleanup for nfs_readpages() Do the coalescing of read requests into block sized requests at start of I/O as we scan through the pages instead of going through a second pass. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:05 -07:00
Trond Myklebust	bcb71bba7e	NFS: Another cleanup of the read/write request coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:04 -07:00
Trond Myklebust	d8a5ad75cc	NFS: Cleanup the coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:04 -07:00
Trond Myklebust	91e59c368c	NFS: Don't wait for congestion in nfs_update_request() It is redundant, and will interfere with the call to balance_dirty_pages_ratelimited_nr in generic_file_write(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:03 -07:00
Amnon Aaronsohn	1a0ba9ae48	NFS: statfs error-handling fix The nfs statfs function returns a success code on error, and fills the output buffer with invalid values. The attached patch makes it return a correct error code instead. Signed-off-by: Amnon Aaronsohn <amnonaar@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> (Modified patch to reinstate the dprintk())	2007-04-30 22:17:02 -07:00
Trond Myklebust	d585158b60	NFS: Fix nfs_set_page_dirty() Be more careful about testing page->mapping. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:02 -07:00
Jeff Mahoney	1173a729fc	reiserfs: suppress lockdep warning We're getting lockdep warnings due to a post-2.6.21-rc7 bugfix. The xattr_sem can never be taken in the manner described. Internal inodes are protected by I_PRIVATE. Add the appropriate annotation. Cc: <stable@kernel.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Steve French	4523cc3044	[CIFS] UID/GID override on CIFS mounts to Samba When CIFS Unix Extensions are negotiated we get the Unix uid and gid owners of the file from the server (on the Unix Query Path Info levels), but if the server's uids don't match the client uid's users were having to disable the Unix Extensions (which turned off features they still wanted). The changeset patch allows users to override uid and/or gid for file/directory owner with a default uid and/or gid specified at mount (as is often done when mounting from Linux cifs client to Windows server). This changeset also displays the uid and gid used by default in /proc/mounts (if applicable). Also cleans up code by adding some of the missing spaces after "if" keywords per-kernel style guidelines (as suggested by Randy Dunlap when he reviewed the patch). Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-30 20:13:06 +00:00
Linus Torvalds	cd9bb7e736	Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block * 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: [PATCH] elevator: elv_list_lock does not need irq disabling [BLOCK] Don't pin lots of memory in mempools cfq-iosched: speedup cic rb lookup ll_rw_blk: add io_context private pointer cfq-iosched: get rid of cfqq hash cfq-iosched: tighten queue request overlap condition cfq-iosched: improve sync vs async workloads cfq-iosched: never allow an async queue idling cfq-iosched: get rid of ->dispatch_slice cfq-iosched: don't pass unused preemption variable around cfq-iosched: get rid of ->cur_rr and ->cfq_list cfq-iosched: slice offset should take ioprio into account [PATCH] cfq-iosched: style cleanups and comments cfq-iosched: sort IDLE queues into the rbtree cfq-iosched: sort RT queues into the rbtree [PATCH] cfq-iosched: speed up rbtree handling cfq-iosched: rework the whole round-robin list concept cfq-iosched: minor updates cfq-iosched: development update cfq-iosched: improve preemption for cooperating tasks	2007-04-30 08:12:39 -07:00
Jens Axboe	5972511b77	[BLOCK] Don't pin lots of memory in mempools Currently we scale the mempool sizes depending on memory installed in the machine, except for the bio pool itself which sits at a fixed 256 entry pre-allocation. There's really no point in "optimizing" this OOM path, we just need enough preallocated to make progress. A single unit is enough, let's scale it down to 2 just to be on the safe side. This patch saves ~150kb of pinned kernel memory on a 32-bit box. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:08:17 +02:00
Paul Mackerras	49e1900d4c	Merge branch 'linux-2.6' into for-2.6.22	2007-04-30 12:38:01 +10:00
Linus Torvalds	42fae7fb1c	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: Fix networking compilation errors [AF_RXRPC/AFS]: Arch-specific fixes. [AFS]: Fix VLocation record update wakeup [NET]: Revert sk_buff walker cleanups.	2007-04-27 16:20:37 -07:00
Linus Torvalds	f00546363f	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (46 commits) [MTD] [MAPS] drivers/mtd/maps/ck804xrom.c: convert pci_module_init() [MTD] [NAND] CM-x270 MTD driver [MTD] [NAND] Wrong calculation of page number in nand_block_bad() [MTD] [MAPS] fix plat-ram printk format [JFFS2] Fix compr_rubin.c build after include file elimination. [JFFS2] Handle inodes with only a single metadata node with non-zero isize [JFFS2] Tidy up licensing/copyright boilerplate. [MTD] [OneNAND] Exit loop only when column start with 0 [MTD] [OneNAND] Fix access the past of the real oobfree array [MTD] [OneNAND] Update Samsung OneNAND official URL [JFFS2] Better fix for all-zero node headers [JFFS2] Improve read_inode memory usage, v2. [JFFS2] Improve failure mode if inode checking leaves unchecked space. [JFFS2] Fix cross-endian build. [MTD] Finish conversion mtd_blkdevs to use the kthread API [JFFS2] Obsolete dirent nodes immediately on unlink, where possible. Use menuconfig objects: MTD [MTD] mtd_blkdevs: Convert to use the kthread API [MTD] Fix fwh_lock locking [JFFS2] Speed up mount for directly-mapped NOR flash ...	2007-04-27 15:34:57 -07:00
David Howells	b1bdb691c3	[AF_RXRPC/AFS]: Arch-specific fixes. Fixes for various arch compilation problems: (*) Missing module exports. (*) Variable name collision when rxkad and af_rxrpc both built in (rxrpc_debug). (*) Large constant representation problem (AFS_UUID_TO_UNIX_TIME). (*) Configuration dependencies. (*) printk() format warnings. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-27 15:28:45 -07:00
David Howells	47051a2152	[AFS]: Fix VLocation record update wakeup Fix the wakeup transitions after a VLocation record update completes one way or another. This builds on Dave Miller's partial fix. Also move wakeups outside the spinlocked sections. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-27 15:26:30 -07:00
Linus Torvalds	d868772fff	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (46 commits) dev_dbg: check dev_dbg() arguments drivers/base/attribute_container.c: use mutex instead of binary semaphore mod_sysfs_setup() doesn't return errno when kobject_add_dir() failure occurs s2ram: add arch irq disable/enable hooks define platform wakeup hook, use in pci_enable_wake() security: prevent permission checking of file removal via sysfs_remove_group() device_schedule_callback() needs a module reference s390: cio: Delay uevents for subchannels sysfs: bin.c printk fix Driver core: use mutex instead of semaphore in DMA pool handler driver core: bus_add_driver should return an error if no bus debugfs: Add debugfs_create_u64() the overdue removal of the mount/umount uevents kobject: Comment and warning fixes to kobject.c Driver core: warn when userspace writes to the uevent file in a non-supported way Driver core: make uevent-environment available in uevent-file kobject core: remove rwsem from struct subsystem qeth: Remove usage of subsys.rwsem PHY: remove rwsem use from phy core IEEE1394: remove rwsem use from ieee1394 core ...	2007-04-27 12:58:54 -07:00
David Woodhouse	d1da4e50e5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/mtd/Kconfig Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-27 19:16:19 +01:00
James Morris	057f6c019f	security: prevent permission checking of file removal via sysfs_remove_group() Prevent permission checking from being performed when the kernel wants to unconditionally remove a sysfs group, by introducing a kernel-only variant of lookup_one_len(), lookup_one_len_kern(). Additionally, as sysfs_remove_group() does not check the return value of the lookup before using it, a BUG_ON has been added to pinpoint the cause of any problems potentially caused by this (and as a form of annotation). Signed-off-by: James Morris <jmorris@namei.org> Cc: Nagendra Singh Tomar <nagendra_tomar@adaptec.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:33 -07:00
Alan Stern	523ded71de	device_schedule_callback() needs a module reference This patch (as896b) fixes an oversight in the design of device_schedule_callback(). It is necessary to acquire a reference to the module owning the callback routine, to prevent the module from being unloaded before the callback can run. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Cc: Satyam Sharma <satyam.sharma@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:32 -07:00
Andrew Morton	45cd8d8e1e	sysfs: bin.c printk fix fs/sysfs/bin.c: In function 'read': fs/sysfs/bin.c:77: warning: format '%zd' expects type 'signed size_t', but argument 4 has type 'int' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:32 -07:00
Michael Ellerman	8447891fe8	debugfs: Add debugfs_create_u64() I went to use this the other day, only to find it didn't exist. It's a straight copy of the debugfs u32 code, then s/u32/u64/. A quick test shows it seems to be working. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:31 -07:00
Adrian Bunk	3106d46f51	the overdue removal of the mount/umount uevents This patch contains the overdue removal of the mount/umount uevents. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:31 -07:00
Linus Torvalds	b928ed5618	Merge branch 'for-linus' of git://git.infradead.org/ubi-2.6 * 'for-linus' of git://git.infradead.org/ubi-2.6: UBI: remove unused variable UBI: add me to MAINTAINERS JFFS2: add UBI support UBI: Unsorted Block Images	2007-04-27 10:42:35 -07:00
Linus Torvalds	ea6db58f3e	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (27 commits) ocfs2: Cache extent records ocfs2: Remember rw lock level during direct io ocfs2: Fix up i_blocks calculation to know about holes ocfs2: Fix extent lookup to return true size of holes ocfs2: Read from an unwritten extent returns zeros ocfs2: make room for unwritten extents flag ocfs2: Use own splice write actor ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate() [PATCH] Turn do_sync_file_range() into do_sync_mapping_range() ocfs2: zero tail of sparse files on truncate ocfs2: Teach ocfs2_get_block() about holes ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write() ocfs2: teach ocfs2_file_aio_write() about sparse files ocfs2: Turn off shared writeable mmap for local files systems with holes. ocfs2: abstract out allocation locking ocfs2: teach extend/truncate about sparse files ocfs2: temporarily remove extent map caching ocfs2: sparse b-tree support ocfs2: small cleanup of ocfs2_request_delete() ocfs2: remove unused code ...	2007-04-27 10:29:56 -07:00
Artem Bityutskiy	0029da3bf4	JFFS2: add UBI support This patch makes JFFS2 able to work with UBI volumes via the emulated MTD devices which are directly mapped to these volumes. Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>	2007-04-27 14:24:08 +03:00
David S. Miller	39bf094930	[AFS]: Eliminate cmpxchg() usage in vlocation code. cmpxchg() is not available on every processor so can't be used in generic code. Replace with spinlock protection on the ->state changes, wakeups, and wait loops. Add what appears to be a missing wakeup on transition to AFS_VL_VALID state in afs_vlocation_updater(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 20:39:14 -07:00
David S. Miller	ba3e0e1acc	[AFS]: Fix u64 printing in debug logging. Need 'unsigned long long' casts to quiet warnings on 64-bit platforms when using %ll on a u64. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 16:06:22 -07:00
David Howells	260a980317	[AFS]: Add "directory write" support. Add support for the create, link, symlink, unlink, mkdir, rmdir and rename VFS operations to the in-kernel AFS filesystem. Also: (1) Fix dentry and inode revalidation. d_revalidate should only look at state of the dentry. Revalidation of the contents of an inode pointed to by a dentry is now separate. (2) Fix afs_lookup() to hash negative dentries as well as positive ones. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:59:35 -07:00
David Howells	c35eccb1f6	[AFS]: Implement the CB.InitCallBackState3 operation. Implement the CB.InitCallBackState3 operation for the fileserver to call. This reduces the amount of network traffic because if this op is aborted, the fileserver will then attempt a CB.InitCallBackState operation. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:58:49 -07:00
David Howells	b908fe6b2d	[AFS]: Add support for the CB.GetCapabilities operation. Add support for the CB.GetCapabilities operation with which the fileserver can ask the client for the following information: (1) The list of network interfaces it has available as IPv4 address + netmask plus the MTUs. (2) The client's UUID. (3) The extended capabilities of the client, for which the only current one is unified error mapping (abort code interpretation). To support this, the patch adds the following routines to AFS: (1) A function to iterate through all the network interfaces using RTNETLINK to extract IPv4 addresses and MTUs. (2) A function to iterate through all the network interfaces using RTNETLINK to pull out the MAC address of the lowest index interface to use in UUID construction. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:58:17 -07:00
David Howells	00d3b7a453	[AFS]: Add security support. Add security support to the AFS filesystem. Kerberos IV tickets are added as RxRPC keys to the session keyring with the klog program. open() and other VFS operations then find this ticket with request_key() and either use it immediately (eg: mkdir, unlink) or attach it to a file descriptor (open). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:57:07 -07:00
David Howells	436058a49e	[AFS]: Handle multiple mounts of an AFS superblock correctly. Handle multiple mounts of an AFS superblock correctly, checking to see whether the superblock is already initialised after calling sget() rather than just unconditionally stamping all over it. Also delete the "silent" parameter to afs_fill_super() as it's not used and can, in any case, be obtained from sb->s_flags. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:56:24 -07:00
David Howells	63b6be55e8	[AF_RXRPC]: Delete the old RxRPC code. Delete the old RxRPC code as it's now no longer used. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:55:48 -07:00
David Howells	08e0e7c82e	[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC. Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:55:03 -07:00
David Howells	ec26815ad8	[AFS]: Clean up the AFS sources Clean up the AFS sources. Also remove references to AFS keys. RxRPC keys are used instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:49:28 -07:00
Mark Fasheh	8341897882	ocfs2: Cache extent records The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when meta data block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no un-necessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:10:40 -07:00
Mark Fasheh	7cdfc3a1c3	ocfs2: Remember rw lock level during direct io Cluster locking might have been redone because a direct write won't complete, so this needs to be reflected in the iocb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:07:45 -07:00
Mark Fasheh	8110b073a9	ocfs2: Fix up i_blocks calculation to know about holes Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:07:40 -07:00
Mark Fasheh	4f902c3772	ocfs2: Fix extent lookup to return true size of holes Initially, we had wired things to return a size '1' of holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:45 -07:00
Mark Fasheh	49cb8d2d49	ocfs2: Read from an unwritten extent returns zeros Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:41 -07:00
Mark Fasheh	e48edee2d8	ocfs2: make room for unwritten extents flag Due to the size of our group bitmaps, we'll never have a leaf node extent record with more than 16 bits worth of clusters. Split e_clusters up so that leaf nodes can get a flags field where we can mark unwritten extents. Interior nodes whose length references all the child nodes beneath it can't split their e_clusters field, so we use a union to preserve sizing there. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:37 -07:00
Mark Fasheh	6af67d8205	ocfs2: Use own splice write actor We need to fill holes during a splice write. Provide our own splice write actor which can call ocfs2_file_buffered_write() with a splice-specific callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:34 -07:00
Mark Fasheh	fa41045fcb	ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate() Do this instead of filemap_fdatawrite() - this way we sync only the range between i_size and the cluster boundary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:30 -07:00
Mark Fasheh	5b04aa3a64	[PATCH] Turn do_sync_file_range() into do_sync_mapping_range() do_sync_file_range() accepts a file * from which it takes an address_space to sync. Abstract out the bulk of the function into do_sync_mapping_range() which takes the address_space directly. This way callers who want to sync an address_space directly can take advantage of the functionality provided. do_sync_file_range() is preserved as a small wrapper around do_sync_mapping_range(). Ocfs2 in particular would like to use this to initiate a sync of a specific inode range during truncate, where a file * may not be available. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-04-26 15:02:26 -07:00
Mark Fasheh	60b11392f1	ocfs2: zero tail of sparse files on truncate Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and the end of its cluster. Otherwise a subsequent extend could expose bad data. This introduced a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:20 -07:00
Mark Fasheh	25baf2da14	ocfs2: Teach ocfs2_get_block() about holes ocfs2_get_block() didn't understand sparse files, fix that. Also remove some code that isn't really useful anymore. We can fix up ocfs2_direct_IO_get_blocks() at the same time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:16 -07:00
Mark Fasheh	5069120b72	ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write() These are no longer used, and can't handle file systems with sparse file allocation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:12 -07:00
Mark Fasheh	9517bac6cc	ocfs2: teach ocfs2_file_aio_write() about sparse files Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock() because allocating writes will require zeroing of pages adjacent to the I/O for cluster sizes greater than page size. Implement a custom file write here, which can order page locks for zeroing. This also has the advantage that cluster locks can easily be ordered outside of the page locks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:08 -07:00
Mark Fasheh	89488984ac	ocfs2: Turn off shared writeable mmap for local files systems with holes. This will be turned back on once we can do allocation in ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:01 -07:00
Mark Fasheh	abf8b15694	ocfs2: abstract out allocation locking Right now, file allocation for ocfs2 is done within ocfs2_extend_file(), which is either called from ->setattr() (for an i_size change), or at the top of ocfs2_file_aio_write(). Inodes on file systems with sparse file support will want to do their allocation during the actual write call. In either case the cluster locking decisions are the same. We abstract out that code into a new function, ocfs2_lock_allocators() which will be used by a later patch to enable writing to sparse files. This also provides a nice cleanup of ocfs2_extend_allocation(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:01:58 -07:00
Mark Fasheh	3a0782d09c	ocfs2: teach extend/truncate about sparse files For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no longer exists since i_size is not tied to i_clusters. In ocfs2_extend_file(), we skip the allocation / page zeroing code for file systems which understand sparse files. The core truncate code is changed to do a bottom up tree traversal. This gets abstracted out into its own function. To make things more readable, most of the special case handling for in-inode extents from ocfs2_do_truncate() is also removed. Though write support for sparse files comes in a later patch, we at least update ocfs2_prepare_inode_for_write() to skip allocation for sparse files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:01:56 -07:00
Mark Fasheh	363041a5f7	ocfs2: temporarily remove extent map caching The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:01:31 -07:00
Mark Fasheh	dcd0538ff4	ocfs2: sparse b-tree support Introduce tree rotations into the b-tree code. This will allow ocfs2 to support sparse files. Much of the added code is designed to be generic (in the ocfs2 sense) so that it can later be re-used to implement large extended attributes. This patch only adds the rotation code and does minimal updates to callers of the extent api. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:44:03 -07:00
Mark Fasheh	6f16bf655c	ocfs2: small cleanup of ocfs2_request_delete() There are two checks in there (one for inode newness, one for other mounted nodes) which are unnecessary, so remove them. The DLM will allow the trylock in either case without any messaging overhead. Removing these makes ocfs2_request_delete() a one liner function, so just move the trylock out one level into ocfs2_query_inode_wipe(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:40:55 -07:00
Tiger Yang	68e2b740c4	ocfs2: remove unused code Remove node messaging code that becomes unused with the delete inode vote removal. [Removed even more cruft which I spotted during review --Mark] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:40:16 -07:00
Tiger Yang	500086300e	ocfs2: Remove delete inode vote Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:39:48 -07:00
Mark Fasheh	a9f5f70739	ocfs2: filter more error prints We don't want to print anything at all in ocfs2_lookup() when getting an error from ocfs2_iget() - it could be something as innocuous as a signal being detected in the dlm. ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can return if the inode was deleted on another node. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:39:08 -07:00
Sunil Mushran	bebe6f120b	ocfs2: Replace panic() with emergency_restart() when fencing We have noticed panic() hanging leading us to a situation in which the node, while otherwise dead, is still disk heartbeating. This leads to a hung cluster as the other nodes are waiting for this node to stop disk heartbeating. This situation is only resolved by power resetting the box. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:39:02 -07:00
Sunil Mushran	5d262cc7dd	ocfs2: Silence compiler warnings Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:38:55 -07:00
Mark Fasheh	be9e986b82	ocfs2: Local mounts should skip inode updates We don't want the extent map and uptodate cache destruction in ocfs2_meta_lock_update() on a local mount, so skip that. This fixes several bugs with uptodate being cleared on buffers and extent maps being corrupted. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:35:21 -07:00
Sunil Mushran	0d01af6e5d	ocfs2_dlm: Call cond_resched_lock() once per hash bucket scan In dlm_migrate_all_locks(), we currently call cond_resched_lock() after processing each lockres in a hash bucket. Move it outside the loop so as to call it only after the entire hash bucket has been processed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:33:11 -07:00
Srinivas Eeda	756a1501dd	ocfs2_dlm: fix race in dlm_remaster_locks There is a possibility that dlm_remaster_locks could overwrite node->state with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock spinlock. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:33:02 -07:00
Steve French	984acfe1cf	[CIFS] prefixpath mounts to servers supporting posix paths used wrong slash Acked-by: Alexander Bokovoy <abokovoy@ru.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-26 16:42:50 +00:00
Steve French	deb0420c6f	[CIFS] Update cifs version to 1.49 Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-26 14:35:54 +00:00
David Woodhouse	ef2e58ea6b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2007-04-26 09:31:28 +01:00
Andrew Morton	f6449f4ece	[JFFS2] Fix compr_rubin.c build after include file elimination. It seems to be silly season lately. (Oops, test builds are more useful if the file in question is actually configured on. dwmw2). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-26 07:27:04 +01:00
Patrick McHardy	af65bdfce9	[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:03 -07:00
Arnaldo Carvalho de Melo	b529ccf279	[NETLINK]: Introduce nlmsg_hdr() helper For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:34 -07:00
Eric Dumazet	ae40eb1ef3	[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution Now that network timestamps use the ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:04 -07:00
David Woodhouse	61c4b23770	[JFFS2] Handle inodes with only a single metadata node with non-zero isize This should never happen unless there's corruption on the medium and the actual data nodes go missing. But the failure mode (an oops when we assume the fragtree isn't empty and go looking for its last node) isn't useful. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-25 17:04:23 +01:00
David Woodhouse	c00c310eac	[JFFS2] Tidy up licensing/copyright boilerplate. In particular, remove the bit in the LICENCE file about contacting Red Hat for alternative arrangements. Their errant IS department broke that arrangement a long time ago -- the policy of collecting copyright assignments from contributors came to an end when the plug was pulled on the servers hosting the project, without notice or reason. We do still dual-license it for use with eCos, with the GPL+exception licence approved by the FSF as being GPL-compatible. It's just that nobody has the right to license it differently. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-25 14:16:47 +01:00
vignesh	eaa33a9ac0	[CIFS] Replace kmalloc/memset combination with kzalloc Signed-off-by: Vignesh Babu <vignesh.babu@wipro.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-25 12:13:48 +00:00
Steve French	5858ae44e2	[CIFS] Add IPv6 support IPv6 support was started a few years ago in the cifs client, but lacked a kernel helper function for parsing the ascii form of the ipv6 address. Now that that is added (and now IPv6 is the default that some OS use) it was fairly easy to finish the cifs ipv6 support. This requires that CIFS_EXPERIMENTAL be enabled and (at least until the mount.cifs module is modified to use a new ipv6 friendly call instead of gethostbyname) that the ipv6 address be passed on the mount as the "ip=" mount option. Thanks Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-25 11:59:10 +00:00
Steve French	cbac3cba66	[CIFS] New CIFS POSIX mkdir performance improvement (part 2) Fix incorrect parsing of return data Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-25 11:46:06 +00:00
Joakim Tjernlund	0dec4c8bc6	[JFFS2] Better fix for all-zero node headers No need to check for all-zero header since the header cannot be zero due to other checks. Replace the all-zero header check in readinode.c with a check for the magic word. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-25 04:13:06 +01:00
David Woodhouse	df8e96f391	[JFFS2] Improve read_inode memory usage, v2. We originally used to read every node and allocate a jffs2_tmp_dnode_info structure for each, before processing them in (reverse) version order and discarding the ones which are obsoleted by later nodes. With huge logfiles, this behaviour caused memory problems. For example, a file involved in OLPC trac #1292 has 1822391 nodes, and would cause the XO machine to run out of memory during the first stage of read_inode(). Instead of just inserting nodes into a tree in version order as we find them, we now put them into a tree in order of their offset within the file, which allows us to immediately discard nodes which are completely obsoleted. We don't use a full tree with 'fragments' pointing to the real data structure, as we do in the normal fragtree. We sort only on the start address, and add an 'overlapped' flag to the tmp_dnode_info to indicate that the node in question is (partially) overlapped by another. When the scan is complete, we start at the end of the file, adding each node to a real fragtree as before. Where the node is non-overlapped, we just add it (it doesn't matter that it's not the latest version; there is no overlap). When the node at the end of the tree _is_ overlapped, we sort it and all its overlapping nodes into version order and then add them to the fragtree in that order. This 'early discard' reduces the peak allocation of tmp_dnode_info structures from 1.8M to a mere 62872 (3.5%) in the degenerate case referenced above. This version of the patch also correctly remembers the highest node version# seen for an inode when it's scanned. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-25 03:23:42 +01:00
Jeff Mahoney	9b7f375505	reiserfs: fix xattr root locking/refcount bug The listxattr() and getxattr() operations are only protected by a read lock. As a result, if either of these operations run in parallel, a race condition exists where the xattr_root will end up being cached twice, which results in the leaking of a reference and a BUG() on umount. This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(), into one get_xa_root() function that takes the appropriate locking around the entire critical section. Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Andrea Righi <a.righi@cineca.it> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Edward Shishkin <edward@namesys.com> Cc: Alex Zarochentsev <zam@namesys.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 08:23:09 -07:00
Latchesar Ionkov	c959df9f01	v9fs: don't use primary fid when removing file v9fs_insert uses v9fs_fid_lookup (which also locks the fid) to get the primary fid associated with the dentry and destroys the v9fs_fid struct after removing the file. If another process called v9fs_fid_lookup on the same dentry, it may wait indefinitely for the fid's lock (as the struct is freed). This patch changes v9fs_remove to use a cloned fid, so the primary fid is not locked and freed. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Cc: Eric Van Hensbergen <ericvh@hera.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 08:23:08 -07:00
Steve French	2dd29d3133	[CIFS] New CIFS POSIX mkdir performance improvement Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-23 22:07:35 +00:00
David Woodhouse	44b998e1eb	[JFFS2] Improve failure mode if inode checking leaves unchecked space. We should never find the unchecked size is non-zero after we've finished checking all inodes. If it happens, we used to BUG(), leaving the alloc_sem held and deadlocking. Instead, just return -ENOSPC after complaining. The GC thread will die, but read-only operation should be able to continue and the file system should be unmountable. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-23 12:11:46 +01:00
David Woodhouse	566865a2a4	[JFFS2] Fix cross-endian build. When compiling a LE-capable JFFS2 on PowerPC, wbuf.c fails to compile: fs/jffs2/wbuf.c:973: error: braced-group within expression allowed only inside a function fs/jffs2/wbuf.c:973: error: initializer element is not constant fs/jffs2/wbuf.c:973: error: (near initialization for ‘oob_cleanmarker.magic’) fs/jffs2/wbuf.c:974: error: braced-group within expression allowed only inside a function fs/jffs2/wbuf.c:974: error: initializer element is not constant fs/jffs2/wbuf.c:974: error: (near initialization for ‘oob_cleanmarker.nodetype’) fs/jffs2/wbuf.c:975: error: braced-group within expression allowed only inside a function fs/jffs2/wbuf.c:976: error: initializer element is not constant fs/jffs2/wbuf.c:976: error: (near initialization for ‘oob_cleanmarker.totlen’) Provide constant_cpu_to_je{16,32} functions, and use them for initialising the offending structure. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-23 12:07:17 +01:00
Trond Myklebust	2b82f190c8	NFS: Fix race in nfs_set_page_dirty Protect nfs_set_page_dirty() against races with nfs_inode_add_request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:30 -07:00
Trond Myklebust	612c9384fd	NFS: Fix the 'desynchronized value of nfs_i.ncommit' error Redirtying a request that is already marked for commit will screw up the accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit. Ensure that all requests on the commit queue are labelled with the PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside nfs_page_mark_flush(). Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for atomicity reasons. Avoid dropping the spinlock until we're done marking the request in the radix tree and have added it to the ->dirty list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:29 -07:00
Trond Myklebust	6d677e3504	NFS: Don't clear PG_writeback until after we've processed unstable writes Ensure that we don't release the PG_writeback lock until after the page has either been redirtied, or queued on the nfs_inode 'commit' list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:29 -07:00
Trond Myklebust	8e821cad12	NFS: clean up the unstable write code Get rid of the inlined #ifdefs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:29 -07:00
Joakim Tjernlund	a491486a20	[JFFS2] Obsolete dirent nodes immediately on unlink, where possible. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-20 23:09:28 -04:00
Evgeniy Dushistov	07a0cfec30	ufs proper handling of zero link case This patch should fix or partly fix this bug: http://bugzilla.kernel.org/show_bug.cgi?id=8276 The problem is: - if we see "zero link case" during reading inode operation, we call ufs_error(which remount fs readonly), but not "mark" inode as bad (1) - in readonly case we do not fill some data structures, which are used in read and write case (2) - VFS call ufs_delete_inode if link count is zero (3) so (1)->(3)->(2) cause oops, this patch should fix such scenario Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jim Paris <jim@jtan.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-17 16:36:27 -07:00
Alan Cox	c4bbafda70	exec.c: fix coredump to pipe problem and obscure "security hole" The patch checks for "\|" in the pattern not the output and doesn't nail a pid on to a piped name (as it is a program name not a file) Also fixes a very very obscure security corner case. If you happen to have decided on a core pattern that starts with the program name then the user can run a program called "\|myevilhack" as it stands. I doubt anyone does this. Signed-off-by: Alan Cox <alan@redhat.com> Confirmed-by: Christopher S. Aker <caker@theshore.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-17 16:36:26 -07:00
Joakim Tjernlund	c2aecda79c	[JFFS2] Speed up mount for directly-mapped NOR flash Remove excessive scanning of empty flash after a clean marker for users of the point/unpoint method. cfi_cmdset_0001 uses point/unpoint by default iff flash mapping is linear. The speedup is several orders of magnitude if FS is less than half full. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-17 14:07:34 -04:00
Artem Bityutskiy	10731f8300	[JFFS2] fix buffer size calculations in jffs2_get_inode_nodes() In read inode we have an optimization which prevents one min. I/O unit (e.g. NAND page) from being read more than once. Namely, at the beginning we do not know which node type we read, so we assume we read the directory entry, because it has the smallest node header. When we read it, we read up to the next min. I/O unit, just because if later we'll need to read more, we already have this data. If it turns out that the node is not a directory entry, and we need more data, and we did not read it because it sits in the next min. I/O unit, we read the whole next (or several next) min. I/O unit(s). And if it happens that we read a data node, and we've read part of its data, we calculate a partial CRC. So if later we need to check the data CRC, we'll only read the rest of the data from further min. I/O units and continue CRC checking. This code was a bit messy and buggy. The bug was that it assumed a relatively large min. I/O unit, so that the largest node header could overlap only one min. I/O unit boundary. This patch cleans up the code a bit and fixes this bug. The patch was not tested on flash with a small min. I/O unit, like NOR-ECC, but it was tested on NAND with 512 bytes NAND page, so it at least does not break NAND. It was also tested with mtdram so it should not break NOR. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-17 14:05:48 -04:00
Adrian Hunter	7f762ab24c	[JFFS2] Disable summary after wbuf recovery After a write error, any data in the write buffer must be relocated. This is handled by the jffs2_wbuf_recover function. This function does not fix up the erase block summary information that is collected for writing at the end of the block, which results in an incorrect summary (or BUG if the summary was found to be empty). As the summary is not essential (it is an optimisation), it may be disabled for the current erase block when this situation arises. This patch does that. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-17 13:56:44 -04:00
Adrian Hunter	99c2594f0e	[JFFS2] Prevent list corruption when handling write errors If a write error occurs, the affected block is placed on the bad_used_list. In the case that the write error occurred when writing summary data the block was also being placed on the dirty_list, which caused list corruption and ultimately a soft lockup in jffs2_mark_node_obsolete. This fixes that. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-17 13:56:23 -04:00
Artem Bityutskiy	b0afbbec49	[JFFS2] fix deadlock on error path When the MTD driver returns write failure, the following deadlock occurs: We are in __jffs2_flush_wbuf(), we hold &c->wbuf_sem. Write failure. jffs2_wbuf_recover()->jffs2_reserve_space_gc()->jffs2_do_reserve_space() ->jffs2_erase_pending_blocks()->jffs2_flash_read() and it tries to lock &c->wbuf_sem again. Deadlock. Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-17 13:53:51 -04:00
Thomas Gleixner	53043002ef	[JFFS2] check node crc before doing anything else Check the node CRC on scan before doing anything else with the node. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-17 18:26:18 +01:00
J. Bruce Fields	c2fa1b8a6c	locks: create posix-to-flock helper functions Factor out a bit of messy code by creating posix-to-flock counterparts to the existing flock-to-posix helper functions. Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-04-16 13:40:37 -04:00
J. Bruce Fields	226a998dbf	locks: trivial removal of unnecessary parentheses Remove some unnecessary parentheses. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-04-16 13:40:37 -04:00
Trond Myklebust	eb4cac10d9	NFS: Fix a list corruption problem We must remove the request from whatever list it is currently on before we can add it to the dirty list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-15 16:48:11 -07:00
Trond Myklebust	5a6d41b32a	NFS: Ensure PG_writeback is cleared when writeback fails If the writebacks are cancelled via nfs_cancel_dirty_list, or due to the memory allocation failing in nfs_flush_one/nfs_flush_multi, then we must ensure that the PG_writeback flag is cleared. Also ensure that we actually own the PG_writeback flag whenever we schedule a new writeback by making nfs_set_page_writeback() return the value of test_set_page_writeback(). The PG_writeback page flag ends up replacing the functionality of the PG_FLUSHING nfs_page flag, so we rip that out too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-14 21:46:48 -07:00

5363 commits