redonkable/alistair23-linux

Author	SHA1	Message	Date
Huajun Li	d08ab08d14	f2fs: fix a typo in f2fs documentation In f2fs_fs.h, one f2fs inode contains 923 data block pointers, while f2fs documentation says it is 929. Fix this inconsistence. Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>	2012-12-11 13:43:44 +09:00
Wei Yongjun	705f814e34	f2fs: remove unused variable The variables node_page and page_offset are initialized but never used otherwise, so remove those unused variables. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>	2012-12-11 13:43:44 +09:00
Namjae Jeon	61412b64b9	f2fs: move error condition for mkdir at proper place In function f2fs_mkdir, err is being initialized without even checking if there was any error in new inode creation. So, instead check the inode error and make use of error/return condition. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>	2012-12-11 13:43:44 +09:00
Namjae Jeon	1042d60f91	f2fs: remove unneeded initialization No need to initialize "struct f2fs_gc_kthread *gc_th = NULL", as gc_th = NULL, will be taken care by the return values of kmalloc(). And fix codes in other places. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>	2012-12-11 13:43:44 +09:00
Namjae Jeon	1fa95b0b67	f2fs: check read only condition before beginning write out If the filesystem is mounted as read-only then return from that point itself instead of first doing a writeout/wait and then checking for read-only condition. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>	2012-12-11 13:43:43 +09:00
Namjae Jeon	154a086529	f2fs: remove unneeded memset from init_once Since, __GFP_ZERO is used while f2fs inode allocation, so we do not need memset for f2fs_inode_info, as this is already zeroed out. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>	2012-12-11 13:43:43 +09:00
Namjae Jeon	72ce6094c0	f2fs: show error in case of invalid mount arguments print the invalid argument/value from parse_options in case of mount failure. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>	2012-12-11 13:43:43 +09:00
Namjae Jeon	be4124f872	f2fs: fix the compiler warning for uninitialized use of variable When CONFIG_CC_OPTIMIZE_FOR_SIZE is enabled in the kernel, -Os optimisation flag is passed to gcc for compilation, and somehow while trying to optimize the code, compiler is might not able to see the initialisation of variable ne struct variable inside the get_node_info() function and results into following warning: fs/f2fs/node.c: In function 'get_node_info': fs/f2fs/node.c:175:3: warning: 'ne.block_addr' may be used uninitialized in this function [-Wuninitialized] fs/f2fs/node.c:265:24: note: 'ne.block_addr' was declared here fs/f2fs/node.c:176:3: warning: 'ne.ino' may be used uninitialized in this function [-Wuninitialized] fs/f2fs/node.c:265:24: note: 'ne.ino' was declared here fs/f2fs/node.c:177:3: warning: 'ne.version' may be used uninitialized in this function [-Wuninitialized] fs/f2fs/node.c:265:24: note: 'ne.version' was declared here Hence, lets initialise the ne struct variable to zero, which will remove this warning and also doing this does not seems to making any impact on the code behavior. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>	2012-12-11 13:43:43 +09:00
Jaegeuk Kim	573ea5fcf0	f2fs: resolve build failures There exist two build failures reported by Randy Dunlap as follows. (on i386) a. (config-r8857) ERROR: "f2fs_xattr_advise_handler" [fs/f2fs/f2fs.ko] undefined! Key configs in (config-r8857) are as follows. CONFIG_F2FS_FS=m # CONFIG_F2FS_STAT_FS is not set CONFIG_F2FS_FS_XATTR=y # CONFIG_F2FS_FS_POSIX_ACL is not set The error was occurred due to the function location that we made a mistake. Recently we added a new functionality for users to indicate cold files explicitly through xattr operations (i.e., f2fs_xattr_advise_handler). This handler should have been added in xattr.c instead of acl.c in order to avoid an undefined operation like in this case where XATTR is set and ACL is not set. b. (config-r8855) fs/f2fs/file.c: In function 'f2fs_vm_page_mkwrite': fs/f2fs/file.c:97:2: error: implicit declaration of function 'block_page_mkwrite_return' Key config in (config-r8855) is CONFIG_BLOCK. Obviously, f2fs works on top of the block device so that we should consider carefully a sort of config dependencies. The reason why this error was occurred was that f2fs_vm_page_mkwrite() calls block_page_mkwrite_return() which is enalbed only if CONFIG_BLOCK is set. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net>	2012-12-11 13:43:43 +09:00
Jaegeuk Kim	0a8165d7c2	f2fs: adjust kernel coding style As pointed out by Randy Dunlap, this patch removes all usage of "/*" for comment blocks. Instead, just use "/". Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:42 +09:00
Jaegeuk Kim	25ca923b2a	f2fs: fix endian conversion bugs reported by sparse This patch should resolve the bugs reported by the sparse tool. Initial reports were written by "kbuild test robot" managed by fengguang.wu. In my local machines, I've tested also by running: > make C=2 CF="-D__CHECK_ENDIAN__" Accordingly, I've found lots of warnings and bugs related to the endian conversion. And I've fixed all at this moment. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:42 +09:00
Sachin Kamat	cf0e3a64ca	f2fs: remove unneeded version.h header file from f2fs.h Including <linux/version.h> is not necessary. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>	2012-12-11 13:43:42 +09:00
Jaegeuk Kim	5bb446a289	f2fs: update the f2fs document I moved the f2fs-tools.git into kernel.org. And I added a new mailing list, linux-f2fs-devel@lists.sourceforge.net. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:42 +09:00
Jaegeuk Kim	a14d53937c	f2fs: update Kconfig and Makefile This adds Makefile and Kconfig for f2fs, and updates Makefile and Kconfig files in the fs directory. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:42 +09:00
Greg Kroah-Hartman	902829aa0b	f2fs: move proc files to debugfs This moves all of the f2fs debugging files into debugfs. The files are located in /sys/kernel/debug/f2fs/ Note, I think we are generating all of the same information in each of the files for every unique f2fs filesystem in the machine. This copies the functionality that was present in the proc files, but this should be fixed up in the future. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [jaegeuk.kim@samsung.com: merged 3 debugfs entries into a status entry] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:42 +09:00
Jaegeuk Kim	d624c96fb3	f2fs: add recovery routines for roll-forward This adds roll-forward routines to recover fsynced data. - F2FS uses basically roll-back model with checkpointing. - In order to implement fsync(), there are two approaches as follows. 1. A roll-back model with checkpointing at every fsync() : This is a naive method, but suffers from very low performance. 2. A roll-forward model : F2FS adopts this model where all the fsynced data should be recovered, which were written after checkpointing was done. In order to figure out the data, F2FS keeps a "fsync" mark in direct node blocks. In addition, F2FS remains the location of next node block in each direct node block for reconstructing the chain of node blocks during the recovery. - In order to enhance the performance, F2FS keeps a "dentry" mark also in direct node blocks. If this is set during the recovery, F2FS replays adding a dentry. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:42 +09:00
Jaegeuk Kim	7bc0900347	f2fs: add garbage collection functions This adds on-demand and background cleaning functions. - The basic background cleaning policy is trying to do cleaning jobs as much as possible whenever the system is idle. Once the background cleaning is done, the cleaner sleeps an amount of time not to interfere with VFS calls. The time is dynamically adjusted according to the status of whole segments, which is decreased when the following conditions are satisfied. . GC is not conducted currently, and . IO subsystem is idle by checking the number of requets in bdev's request list, and . There are enough dirty segments. Otherwise, the time is increased incrementally until to the maximum time. Note that, min and max times are 10 secs and 30 secs by default. - F2FS adopts a default victim selection policy where background cleaning uses a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm. - The method of moving data during the cleaning is slightly different between background and on-demand cleaning schemes. In the case of background cleaning, F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should move the data right away. - In order to identify valid blocks in a victim segment, F2FS scans the bitmap of the segment managed as an SIT entry. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:41 +09:00
Jaegeuk Kim	af48b85b8c	f2fs: add xattr and acl functionalities This implements xattr and acl functionalities. - F2FS uses a node page to contain use extended attributes. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:41 +09:00
Jaegeuk Kim	6b4ea0160a	f2fs: add core directory operations this adds core functions to find, add, delete, and link dentries. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:41 +09:00
Jaegeuk Kim	57397d86c6	f2fs: add inode operations for special inodes This adds inode operations for directory, symlink, and special inodes. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:41 +09:00
Jaegeuk Kim	19f99cee20	f2fs: add core inode operations This adds core functions to get, read, write, and evict an inode. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:41 +09:00
Jaegeuk Kim	eb47b8009d	f2fs: add address space operations for data This adds address space operations for data. - F2FS supports readpages(), writepages(), and direct_IO(). - Because of out-of-place writes, f2fs_direct_IO() does not write data in place. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:41 +09:00
Jaegeuk Kim	fbfa2cc58d	f2fs: add file operations This adds memory operations and file/file_inode operations. - F2FS supports fallocate(), mmap(), fsync(), and basic ioctl(). Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:41 +09:00
Jaegeuk Kim	351df4b201	f2fs: add segment operations This adds specific functions not only to manage dirty/free segments, SIT pages, a cache for SIT entries, and summary entries, but also to allocate free blocks and write three types of pages: data, node, and meta. - F2FS maintains three types of bitmaps in memory, which indicate free, prefree, and dirty segments respectively. - The key information of an SIT entry consists of a segment number, the number of valid blocks in the segment, a bitmap to identify there-in valid or invalid blocks. - An SIT page is composed of a certain range of SIT entries, which is maintained by the address space of meta_inode. - To cache SIT entries, a simple array is used. The index for the array is the segment number. - A summary entry for data contains the parent node information. A summary entry for node contains its node offset from the inode. - F2FS manages information about six active logs and those summary entries in memory. Whenever one of them is changed, its summary entries are flushed to its SIT page maintained by the address space of meta_inode. - This patch adds a default block allocation function which supports heap-based allocation policy. - This patch adds core functions to write data, node, and meta pages. Since LFS basically produces a series of sequential writes, F2FS merges sequential bios with a single one as much as possible to reduce the IO scheduling overhead. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:40 +09:00
Jaegeuk Kim	e05df3b115	f2fs: add node operations This adds specific functions to manage NAT pages, a cache for NAT entries, free nids, direct/indirect node blocks for indexing data, and address space for node pages. - The key information of an NAT entry consists of a node id and a block address. - An NAT page is composed of block addresses covered by a certain range of NAT entries, which is maintained by the address space of meta_inode. - A radix tree structure is used to cache NAT entries. The index for the tree is a node id. - When there is no free nid, F2FS should scan NAT entries to find new one. In order to avoid scanning frequently, F2FS manages a list containing a number of free nids in memory. Only when free nids in the list are exhausted, scanning process, build_free_nids(), is triggered. - F2FS has direct and indirect node blocks for indexing data. This patch adds fuctions related to the node block management such as getting, allocating, and truncating node blocks to index data. - In order to cache node blocks in memory, F2FS has a node_inode with an address space for node pages. This patch also adds the address space operations for node_inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:40 +09:00
Jaegeuk Kim	127e670abf	f2fs: add checkpoint operations This adds functions required by the checkpoint operations. Basically, f2fs adopts a roll-back model with checkpoint blocks written in the CP area. The checkpoint procedure includes as follows. - write_checkpoint() 1. block_operations() freezes VFS calls. 2. submit cached bios. 3. flush_nat_entries() writes NAT pages updated by dirty NAT entries. 4. flush_sit_entries() writes SIT pages updated by dirty SIT entries. 5. do_checkpoint() writes, - checkpoint block (#0) - orphan inode blocks - summary blocks made by active logs - checkpoint block (copy of #0) 6. unblock_opeations() In order to provide an address space for meta pages, f2fs_sb_info has a special inode, namely meta_inode. This patch also adds the address space operations for meta_inode. Signed-off-by: Chul Lee <chur.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:40 +09:00
Jaegeuk Kim	aff063e266	f2fs: add super block operations This adds the implementation of superblock operations for f2fs, which includes - init_f2fs_fs/exit_f2fs_fs - f2fs_mount - super_operations of f2fs Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:40 +09:00
Jaegeuk Kim	39a53e0ce0	f2fs: add superblock and major in-memory structure This adds the following major in-memory structures in f2fs. - f2fs_sb_info: contains f2fs-specific information, two special inode pointers for node and meta address spaces, and orphan inode management. - f2fs_inode_info: contains vfs_inode and other fs-specific information. - f2fs_nm_info: contains node manager information such as NAT entry cache, free nid list, and NAT page management. - f2fs_node_info: represents a node as node id, inode number, block address, and its version. - f2fs_sm_info: contains segment manager information such as SIT entry cache, free segment map, current active logs, dirty segment management, and segment utilization. The specific structures are sit_info, free_segmap_info, dirty_seglist_info, curseg_info. In addition, add F2FS_SUPER_MAGIC in magic.h. Signed-off-by: Chul Lee <chur.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:40 +09:00
Jaegeuk Kim	dd31866b0d	f2fs: add on-disk layout This adds a header file describing the on-disk layout of f2fs. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chul Lee <chur.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:40 +09:00
Jaegeuk Kim	98e4da8ca3	f2fs: add document This adds a document describing the mount options, proc entries, usage, and design of Flash-Friendly File System, namely F2FS. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:39 +09:00
Linus Torvalds	29594404d7	Linux 3.7	2012-12-10 19:30:57 -08:00
Florian Fainelli	55220bb3e5	Input: matrix-keymap - provide proper module license The matrix-keymap module is currently lacking a proper module license, add one so we don't have this module tainting the entire kernel. This issue has been present since commit `1932811f42` ("Input: matrix-keymap - uninline and prepare for device tree support") Signed-off-by: Florian Fainelli <florian@openwrt.org> CC: stable@vger.kernel.org # v3.5+ Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-10 16:10:05 -08:00
Linus Torvalds	2c68bc72dc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Netlink socket dumping had several missing verifications and checks. In particular, address comparisons in the request byte code interpreter could access past the end of the address in the inet_request_sock. Also, address family and address prefix lengths were not validated properly at all. This means arbitrary applications can read past the end of certain kernel data structures. Fixes from Neal Cardwell. 2) ip_check_defrag() operates in contexts where we're in the process of, or about to, input the packet into the real protocols (specifically macvlan and AF_PACKET snooping). Unfortunately, it does a pskb_may_pull() which can modify the backing packet data which is not legal if the SKB is shared. It very much can be shared in this context. Deal with the possibility that the SKB is segmented by using skb_copy_bits(). Fix from Johannes Berg based upon a report by Eric Leblond. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: ipv4: ip_check_defrag must not modify skb before unsharing inet_diag: validate port comparison byte code to prevent unsafe reads inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run() inet_diag: validate byte code to prevent oops in inet_diag_bc_run() inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state	2012-12-10 16:07:11 -08:00
Linus Torvalds	caf491916b	Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage This reverts commits `a50915394f` and `d7c3b937bd`. This is a revert of a revert of a revert. In addition, it reverts the even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the original commits in linux-next. It turns out that the original patch really was bogus, and that the original revert was the correct thing to do after all. We thought we had fixed the problem, and then reverted the revert, but the problem really is fundamental: waking up kswapd simply isn't the right thing to do, and direct reclaim sometimes simply _is_ the right thing to do. When certain allocations fail, we simply should try some direct reclaim, and if that fails, fail the allocation. That's the right thing to do for THP allocations, which can easily fail, and the GPU allocations want to do that too. So starting kswapd is sometimes simply wrong, and removing the flag that said "don't start kswapd" was a mistake. Let's hope we never revisit this mistake again - and certainly not this many times ;) Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-10 11:03:05 -08:00
Johannes Berg	1bf3751ec9	ipv4: ip_check_defrag must not modify skb before unsharing ip_check_defrag() might be called from af_packet within the RX path where shared SKBs are used, so it must not modify the input SKB before it has unshared it for defragmentation. Use skb_copy_bits() to get the IP header and only pull in everything later. The same is true for the other caller in macvlan as it is called from dev->rx_handler which can also get a shared SKB. Reported-by: Eric Leblond <eric@regit.org> Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-10 13:51:44 -05:00
Linus Torvalds	31f8d42d44	Revert "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended" This reverts commit `782fd30406`. We are going to reinstate the __GFP_NO_KSWAPD flag that has been removed, the removal reverted, and then removed again. Making this commit a pointless fixup for a problem that was caused by the removal of __GFP_NO_KSWAPD flag. The thing is, we really don't want to wake up kswapd for THP allocations (because they fail quite commonly under any kind of memory pressure, including when there is tons of memory free), and these patches were just trying to fix up the underlying bug: the original removal of __GFP_NO_KSWAPD in commit `c654345924` ("mm: remove __GFP_NO_KSWAPD") was simply bogus. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-10 10:47:45 -08:00
Neal Cardwell	5e1f54201c	inet_diag: validate port comparison byte code to prevent unsafe reads Add logic to verify that a port comparison byte code operation actually has the second inet_diag_bc_op from which we read the port for such operations. Previously the code blindly referenced op[1] without first checking whether a second inet_diag_bc_op struct could fit there. So a malicious user could make the kernel read 4 bytes beyond the end of the bytecode array by claiming to have a whole port comparison byte code (2 inet_diag_bc_op structs) when in fact the bytecode was not long enough to hold both. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-09 19:00:48 -05:00
Neal Cardwell	f67caec906	inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run() Add logic to check the address family of the user-supplied conditional and the address family of the connection entry. We now do not do prefix matching of addresses from different address families (AF_INET vs AF_INET6), except for the previously existing support for having an IPv4 prefix match an IPv4-mapped IPv6 address (which this commit maintains as-is). This change is needed for two reasons: (1) The addresses are different lengths, so comparing a 128-bit IPv6 prefix match condition to a 32-bit IPv4 connection address can cause us to unwittingly walk off the end of the IPv4 address and read garbage or oops. (2) The IPv4 and IPv6 address spaces are semantically distinct, so a simple bit-wise comparison of the prefixes is not meaningful, and would lead to bogus results (except for the IPv4-mapped IPv6 case, which this commit maintains). Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-09 18:59:37 -05:00
Neal Cardwell	405c005949	inet_diag: validate byte code to prevent oops in inet_diag_bc_run() Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND operations. Previously we did not validate the inet_diag_hostcond, address family, address length, and prefix length. So a malicious user could make the kernel read beyond the end of the bytecode array by claiming to have a whole inet_diag_hostcond when the bytecode was not long enough to contain a whole inet_diag_hostcond of the given address family. Or they could make the kernel read up to about 27 bytes beyond the end of a connection address by passing a prefix length that exceeded the length of addresses of the given family. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-09 18:59:37 -05:00
Neal Cardwell	1c95df85ca	inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state Fix inet_diag to be aware of the fact that AF_INET6 TCP connections instantiated for IPv4 traffic and in the SYN-RECV state were actually created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This means that for such connections inet6_rsk(req) returns a pointer to a random spot in memory up to roughly 64KB beyond the end of the request_sock. With this bug, for a server using AF_INET6 TCP sockets and serving IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to inet_diag_fill_req() causing an oops or the export to user space of 16 bytes of kernel memory as a garbage IPv6 address, depending on where the garbage inet6_rsk(req) pointed. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-09 18:59:37 -05:00
Johannes Weiner	ed23ec4f0a	mm: vmscan: fix inappropriate zone congestion clearing commit `c702418f8a` ("mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones") removed zone watermark checks from the compaction code in kswapd but left in the zone congestion clearing, which now happens unconditionally on higher order reclaim. This messes up the reclaim throttling logic for zones with dirty/writeback pages, where zones should only lose their congestion status when their watermarks have been restored. Remove the clearing from the zone compaction section entirely. The preliminary zone check and the reclaim loop in kswapd will clear it if the zone is considered balanced. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-08 08:41:18 -08:00
Linus Torvalds	684c9aaebb	vfs: fix O_DIRECT read past end of block device The direct-IO write path already had the i_size checks in mm/filemap.c, but it turns out the read path did not, and removing the block size checks in fs/block_dev.c (commit `bbec0270bd`: "blkdev_max_block: make private to fs/buffer.c") removed the magic "shrink IO to past the end of the device" code there. Fix it by truncating the IO to the size of the block device, like the write path already does. NOTE! I suspect the write path would be much better off doing it this way in fs/block_dev.c, rather than hidden deep in mm/filemap.c. The mm/filemap.c code is extremely hard to follow, and has various conditionals on the target being a block device (ie the flag passed in to 'generic_write_checks()', along with a conditional update of the inode timestamp etc). It is also quite possible that we should treat this whole block device size as a "s_maxbytes" issue, and try to make the logic even more generic. However, in the meantime this is the fairly minimal targeted fix. Noted by Milan Broz thanks to a regression test for the cryptsetup reencrypt tool. Reported-and-tested-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-08 08:28:26 -08:00
Linus Torvalds	1b3c393cd4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Two stragglers: 1) The new code that adds new flushing semantics to GRO can cause SKB pointer list corruption, manage the lists differently to avoid the OOPS. Fix from Eric Dumazet. 2) When TCP fast open does a retransmit of data in a SYN-ACK or similar, we update retransmit state that we shouldn't triggering a WARN_ON later. Fix from Yuchung Cheng." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: gro: fix possible panic in skb_gro_receive() tcp: bug fix Fast Open client retransmission	2012-12-07 17:00:57 -08:00
Eric Dumazet	c3c7c254b2	net: gro: fix possible panic in skb_gro_receive() commit `2e71a6f808` (net: gro: selective flush of packets) added a bug for skbs using frag_list. This part of the GRO stack is rarely used, as it needs skb not using a page fragment for their skb->head. Most drivers do use a page fragment, but some of them use GFP_KERNEL allocations for the initial fill of their RX ring buffer. napi_gro_flush() overwrite skb->prev that was used for these skb to point to the last skb in frag_list. Fix this using a separate field in struct napi_gro_cb to point to the last fragment. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-07 14:39:29 -05:00
Yuchung Cheng	93b174ad71	tcp: bug fix Fast Open client retransmission If SYN-ACK partially acks SYN-data, the client retransmits the remaining data by tcp_retransmit_skb(). This increments lost recovery state variables like tp->retrans_out in Open state. If loss recovery happens before the retransmission is acked, it triggers the WARN_ON check in tcp_fastretrans_alert(). For example: the client sends SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends another 4 data packets and get 3 dupacks. Since the retransmission is not caused by network drop it should not update the recovery state variables. Further the server may return a smaller MSS than the cached MSS used for SYN-data, so the retranmission needs a loop. Otherwise some data will not be retransmitted until timeout or other loss recovery events. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-07 14:39:28 -05:00
Linus Torvalds	1afa471706	MMC fixes for 3.7: - sdhci-s3c: Fix runtime PM regression against 3.7-rc1 - sh-mmcif: Fix oops against 3.6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQwOsQAAoJEHNBYZ7TNxYMkiAQAKfrOAFIe2eH+Z651iowbZxs WUV5eocfZPDnahxpkOQmK7Oa+BA7Kqqa8bql0+8CXkrXNh5zDZ6uuK4M8fWWYIAp uRgLetjC3/jB9dBXP9oxNBJ6oXcq/fuwx33Zvxln1FVDGy15glCeJ9O2lHhcwiB2 I5ezzOTyU1CogqofuSAh27/9Y6CagDuTBceoeW7ADYD+4nGNX/HET68T9WTaaR2p CM22ogAyMUyCBfACCT50m1LkzmhXq1cn6AptdZzHOhGuQ7QP0L2Ej1LBgTAIIoV+ fJ4Nd7Ht8Xk5arnLBrQYtP8YOg5J0P6bt/Y8Dbqv4gM+vtsBIQ5mHRIFebMpz0to 3zoH8m+Qq1/Ko+zT063FQNC3gRHQLyKurEIPTJO/IxhzIiw8MBT3bwR7rEGO1uCt 8CW633dzu0SjG9Le4A5dLRjQ8s93HKhfCiqXm4/xkdPNgBLRQYDf09Z+G6RTIoaQ O7D6JjbgnGVdzy6HwuWK35LN74kyc8+rACDfbE9KMmtfYt94x6TMBfhQ/rTP5grv pBfLWHyRqScBLpE6RTXFm2FEAR5y1iCWRrZUtJwjJ4rqfgc4li3odvn08+5O2m7v PPEk+OKJSrsSu3f5kJSk4KKYPWFB9MqK+V6dKDjPvJ6u3afmPvx7USgt/zTxtp9c 7Met6CmklOgokwhnFlrN =+gTB -----END PGP SIGNATURE----- Merge tag 'mmc-fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc Pull MMC fixes from Chris Ball: "Two small regression fixes: - sdhci-s3c: Fix runtime PM regression against 3.7-rc1 - sh-mmcif: Fix oops against 3.6" * tag 'mmc-fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: mmc: sh-mmcif: avoid oops on spurious interrupts (second try) Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts" mmc: sdhci-s3c: fix missing clock for gpio card-detect	2012-12-07 09:15:20 -08:00
Mel Gorman	18a2f371f5	tmpfs: fix shared mempolicy leak This fixes a regression in 3.7-rc, which has since gone into stable. Commit `00442ad04a` ("mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went on expecting alloc_page_vma() to drop the refcount it had acquired. This deserves a rework: but for now fix the leak in shmem_alloc_page(). Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use the same refcounting there as in shmem_alloc_page(), delete its onstack mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() - those were invented to let swapin_readahead() make an unknown number of calls to alloc_pages_vma() with one mempolicy; but since `00442ad04a`, alloc_pages_vma() has kept refcount in balance, so now no problem. Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-06 11:56:43 -08:00
Johannes Weiner	c702418f8a	mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones When a zone meets its high watermark and is compactable in case of higher order allocations, it contributes to the percentage of the node's memory that is considered balanced. This requirement, that a node be only partially balanced, came about when kswapd was desparately trying to balance tiny zones when all bigger zones in the node had plenty of free memory. Arguably, the same should apply to compaction: if a significant part of the node is balanced enough to run compaction, do not get hung up on that tiny zone that might never get in shape. When the compaction logic in kswapd is reached, we know that at least 25% of the node's memory is balanced properly for compaction (see zone_balanced and pgdat_balanced). Remove the individual zone checks that restart the kswapd cycle. Otherwise, we may observe more endless looping in kswapd where the compaction code loops back to reclaim because of a single zone and reclaim does nothing because the node is considered balanced overall. See for example https://bugzilla.redhat.com/show_bug.cgi?id=866988 Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-and-tested-by: Thorsten Leemhuis <fedora@leemhuis.info> Reported-by: Jiri Slaby <jslaby@suse.cz> Tested-by: John Ellson <john.ellson@comcast.net> Tested-by: Zdenek Kabelac <zkabelac@redhat.com> Tested-by: Bruno Wolff III <bruno@wolff.to> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-06 11:29:57 -08:00
Mel Gorman	60177d31d2	mm: compaction: validate pfn range passed to isolate_freepages_block Commit `0bf380bc70` ("mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration") added a check for pfn_valid() when isolating pages for migration as the scanner does not necessarily start pageblock-aligned. Since commit `c89511ab2f` ("mm: compaction: Restart compaction from near where it left off"), the free scanner has the same problem. This patch makes sure that the pfn range passed to isolate_freepages_block() is within the same block so that pfn_valid() checks are unnecessary. In answer to Henrik's wondering why others have not reported this: reproducing this requires a large enough hole with the right aligment to have compaction walk into a PFN range with no memmap. Size and alignment depends in the memory model - 4M for FLATMEM and 128M for SPARSEMEM on x86. It needs a "lucky" machine. Reported-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-06 11:17:33 -08:00
Guennadi Liakhovetski	91ab252ac5	mmc: sh-mmcif: avoid oops on spurious interrupts (second try) On some systems, e.g., kzm9g, MMCIF interfaces can produce spurious interrupts without any active request. To prevent the Oops, that results in such cases, don't dereference the mmc request pointer until we make sure, that we are indeed processing such a request. Reported-by: Tetsuyuki Kobayashi <koba@kmckk.co.jp> Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Tested-by: Tetsuyuki Kobayashi <koba@kmckk.co.jp> Cc: stable@vger.kernel.org Signed-off-by: Chris Ball <cjb@laptop.org>	2012-12-06 13:54:35 -05:00

1 2 3 4 5 ...

336296 commits