redonkable/alistair23-linux

Author	SHA1	Message	Date
Michael Rubin	79da826aee	writeback: report dirty thresholds in /proc/vmstat The kernel already exposes the user desired thresholds in /proc/sys/vm with dirty_background_ratio and background_ratio. But the kernel may alter the number requested without giving the user any indication that is the case. Knowing the actual ratios the kernel is honoring can help app developers understand how their buffered IO will be sent to the disk. $ grep threshold /proc/vmstat nr_dirty_threshold 409111 nr_dirty_background_threshold 818223 Signed-off-by: Michael Rubin <mrubin@google.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:06 -07:00
Michael Rubin	ea941f0e2a	writeback: add nr_dirtied and nr_written to /proc/vmstat To help developers and applications gain visibility into writeback behaviour adding two entries to vm_stat_items and /proc/vmstat. This will allow us to track the "written" and "dirtied" counts. # grep nr_dirtied /proc/vmstat nr_dirtied 3747 # grep nr_written /proc/vmstat nr_written 3618 Signed-off-by: Michael Rubin <mrubin@google.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:06 -07:00
Michael Rubin	f629d1c9bd	mm: add account_page_writeback() To help developers and applications gain visibility into writeback behaviour this patch adds two counters to /proc/vmstat. # grep nr_dirtied /proc/vmstat nr_dirtied 3747 # grep nr_written /proc/vmstat nr_written 3618 These entries allow user apps to understand writeback behaviour over time and learn how it is impacting their performance. Currently there is no way to inspect dirty and writeback speed over time. It's not possible for nr_dirty/nr_writeback. These entries are necessary to give visibility into writeback behaviour. We have /proc/diskstats which lets us understand the io in the block layer. We have blktrace for more in depth understanding. We have e2fsprogs and debugsfs to give insight into the file systems behaviour, but we don't offer our users the ability understand what writeback is doing. There is no way to know how active it is over the whole system, if it's falling behind or to quantify it's efforts. With these values exported users can easily see how much data applications are sending through writeback and also at what rates writeback is processing this data. Comparing the rates of change between the two allow developers to see when writeback is not able to keep up with incoming traffic and the rate of dirty memory being sent to the IO back end. This allows folks to understand their io workloads and track kernel issues. Non kernel engineers at Google often use these counters to solve puzzling performance problems. Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written Patch #5 add writeback thresholds to /proc/vmstat Currently these values are in debugfs. But they should be promoted to /proc since they are useful for developers who are writing databases and file servers and are not debugging the kernel. The output is as below: # grep threshold /proc/vmstat nr_pages_dirty_threshold 409111 nr_pages_dirty_background_threshold 818223 This patch: This allows code outside of the mm core to safely manipulate page writeback state and not worry about the other accounting. Not using these routines means that some code will lose track of the accounting and we get bugs. Modify nilfs2 to use interface. Signed-off-by: Michael Rubin <mrubin@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Jiro SEKIBA <jir@unicus.jp> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:06 -07:00
Vasiliy Kulikov	0def08e3ac	mm/mempolicy.c: check return code of check_range Function check_range may return ERR_PTR(...). Check for it. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:06 -07:00
Minchan Kim	74e3f3c339	vmscan: prevent background aging of anon page in no swap system Ying Han reported that backing aging of anon pages in no swap system causes unnecessary TLB flush. When I sent a patch(`69c8548175`), I wanted this patch but Rik pointed out and allowed aging of anon pages to give a chance to promote from inactive to active LRU. It has a two problem. 1) non-swap system Never make sense to age anon pages. 2) swap configured but still doesn't swapon It doesn't make sense to age anon pages until swap-on time. But it's arguable. If we have aged anon pages by swapon, VM have moved anon pages from active to inactive. And in the time swapon by admin, the VM can't reclaim hot pages so we can protect hot pages swapout. But let's think about it. When does swap-on happen? It depends on admin. we can't expect it. Nonetheless, we have done aging of anon pages to protect hot pages swapout. It means we lost run time overhead when below high watermark but gain hot page swap-[in/out] overhead when VM decide swapout. Is it true? Let's think more detail. We don't promote anon pages in case of non-swap system. So even though VM does aging of anon pages, the pages would be in inactive LRU for a long time. It means many of pages in there would mark access bit again. So access bit hot/code separation would be pointless. This patch prevents unnecessary anon pages demotion in not-yet-swapon and non-configured swap system. Even, in non-configuared swap system inactive_anon_is_low can be compiled out. It could make side effect that hot anon pages could swap out when admin does swap on. But I think sooner or later it would be steady state. So it's not a big problem. We could lose someting but gain more thing(TLB flush and unnecessary function call to demote anon pages). Signed-off-by: Ying Han <yinghan@google.com> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:06 -07:00
KAMEZAWA Hiroyuki	49ac825587	memory hotplug: unify is_removable and offline detection code Now, sysfs interface of memory hotplug shows whether the section is removable or not. But it checks only migrateype of pages and doesn't check details of cluster of pages. Next, memory hotplug's set_migratetype_isolate() has the same kind of check, too. This patch adds the function __count_unmovable_pages() and makes above 2 checks to use the same logic. Then, is_removable and hotremove code uses the same logic. No changes in the hotremove logic itself. TODO: need to find a way to check RECLAMABLE. But, considering bit, calling shrink_slab() against a range before starting memory hotremove sounds better. If so, this patch's logic doesn't need to be changed. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reported-by: Michal Hocko <mhocko@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:06 -07:00
KAMEZAWA Hiroyuki	4b20477f58	memory hotplug: fix notifier's return value check Even if notifier cannot find any pages, it doesn't mean no pages are available...And, if there are no notifiers registered, this condition will be always true and memory hotplug will show -EBUSY. This is a bug but not critical. In most case, a pageblock which will be offlined is MIGRATE_MOVABLE This "notifier" is called only when the pageblock is _not_ MIGRATE_MOVABLE. But if not MIGRATE_MOVABLE, it's common case that memory hotplug will fail. So, no one notice this bug. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:06 -07:00
Minchan Kim	cf608ac19c	mm: compaction: fix COMPACTPAGEFAILED counting Presently update_nr_listpages() doesn't have a role. That's because lists passed is always empty just after calling migrate_pages. The migrate_pages cleans up page list which have failed to migrate before returning by `aaa994b3`. [PATCH] page migration: handle freeing of pages in migrate_pages() Do not leave pages on the lists passed to migrate_pages(). Seems that we will not need any postprocessing of pages. This will simplify the handling of pages by the callers of migrate_pages(). At that time, we thought we don't need any postprocessing of pages. But the situation is changed. The compaction need to know the number of failed to migrate for COMPACTPAGEFAILED stat This patch makes new rule for caller of migrate_pages to call putback_lru_pages. So caller need to clean up the lists so it has a chance to postprocess the pages. [suggested by Christoph Lameter] Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andi Kleen <andi@firstfloor.org> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:06 -07:00
Thadeu Lima de Souza Cascardo	e4455abb50	mm: only build per-node scan_unevictable functions when NUMA is enabled Non-NUMA systems do never create these files anyway, since they are only created by driver subsystem when NUMA is configured. [akpm@linux-foundation.org: cleanup] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:05 -07:00
Wu Fengguang	1b430beee5	writeback: remove nonblocking/encountered_congestion references This removes more dead code that was somehow missed by commit `0d99519efe` (writeback: remove unused nonblocking and congestion checks). There are no behavior change except for the removal of two entries from one of the ext4 tracing interface. The nonblocking checks in ->writepages are no longer used because the flusher now prefer to block on get_request_wait() than to skip inodes on IO congestion. The latter will lead to more seeky IO. The nonblocking checks in ->writepage are no longer used because it's redundant with the WB_SYNC_NONE check. We no long set ->nonblocking in VM page out and page migration, because a) it's effectively redundant with WB_SYNC_NONE in current code b) it's old semantic of "Don't get stuck on request queues" is mis-behavior: that would skip some dirty inodes on congestion and page out others, which is unfair in terms of LRU age. Inspired by Christoph Hellwig. Thanks! Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: David Howells <dhowells@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: Steve French <sfrench@samba.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:05 -07:00
David Rientjes	1e99bad0d9	oom: kill all threads sharing oom killed task's mm It's necessary to kill all threads that share an oom killed task's mm if the goal is to lead to future memory freeing. This patch reintroduces the code removed in `8c5cd6f3` (oom: oom_kill doesn't kill vfork parent (or child)) since it is obsoleted. It's now guaranteed that any task passed to oom_kill_task() does not share an mm with any thread that is unkillable. Thus, we're safe to issue a SIGKILL to any thread sharing the same mm. This is especially necessary to solve an mm->mmap_sem livelock issue whereas an oom killed thread must acquire the lock in the exit path while another thread is holding it in the page allocator while trying to allocate memory itself (and will preempt the oom killer since a task was already killed). Since tasks with pending fatal signals are now granted access to memory reserves, the thread holding the lock may quickly allocate and release the lock so that the oom killed task may exit. This mainly is for threads that are cloned with CLONE_VM but not CLONE_THREAD, so they are in a different thread group. Non-NPTL threads exist in the wild and this change is necessary to prevent the livelock in such cases. We care more about preventing the livelock than incurring the additional tasklist in the oom killer when a task has been killed. Systems that are sufficiently large to not want the tasklist scan in the oom killer in the first place already have the option of enabling /proc/sys/vm/oom_kill_allocating_task, which was designed specifically for that purpose. This code had existed in the oom killer for over eight years dating back to the 2.4 kernel. [akpm@linux-foundation.org: add nice comment] Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:05 -07:00
David Rientjes	e18641e19a	oom: avoid killing a task if a thread sharing its mm cannot be killed The oom killer's goal is to kill a memory-hogging task so that it may exit, free its memory, and allow the current context to allocate the memory that triggered it in the first place. Thus, killing a task is pointless if other threads sharing its mm cannot be killed because of its /proc/pid/oom_adj or /proc/pid/oom_score_adj value. This patch checks whether any other thread sharing p->mm has an oom_score_adj of OOM_SCORE_ADJ_MIN. If so, the thread cannot be killed and oom_badness(p) returns 0, meaning it's unkillable. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:05 -07:00
Mel Gorman	b7f50cfa36	mm, page-allocator: do not check the state of a non-existant buddy during free There is a bug in commit `6dda9d55` ("page allocator: reduce fragmentation in buddy allocator by adding buddies that are merging to the tail of the free lists") that means a buddy at order MAX_ORDER is checked for merging. A page of this order never exists so at times, an effectively random piece of memory is being checked. Alan Curry has reported that this is causing memory corruption in userspace data on a PPC32 platform (http://lkml.org/lkml/2010/10/9/32). It is not clear why this is happening. It could be a cache coherency problem where pages mapped in both user and kernel space are getting different cache lines due to the bad read from kernel space (http://lkml.org/lkml/2010/10/13/179). It could also be that there are some special registers being io-remapped at the end of the memmap array and that a read has special meaning on them. Compiler bugs have been ruled out because the assembly before and after the patch looks relatively harmless. This patch fixes the problem by ensuring we are not reading a possibly invalid location of memory. It's not clear why the read causes corruption but one way or the other it is a buggy read. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Corrado Zoccolo <czoccolo@gmail.com> Reported-by: Alan Curry <pacman@kosh.dhis.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:03 -07:00
KAMEZAWA Hiroyuki	f8f72ad539	mm: fix return value of scan_lru_pages in memory unplug scan_lru_pages returns pfn. So, it's type should be "unsigned long" not "int". Note: I guess this has been work until now because memory hotplug tester's machine has not very big memory.... physical address < 32bit << PAGE_SHIFT. Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:03 -07:00
Linus Torvalds	f1ebdd60cc	Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits) Add _addr_lsb field to ia64 siginfo Fix migration.c compilation on s390 HWPOISON: Remove retry loop for try_to_unmap HWPOISON: Turn addr_valid from bitfield into char HWPOISON: Disable DEBUG by default HWPOISON: Convert pr_debugs to pr_info HWPOISON: Improve comments in memory-failure.c x86: HWPOISON: Report correct address granuality for huge hwpoison faults Encode huge page size for VM_FAULT_HWPOISON errors Fix build error with !CONFIG_MIGRATION hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning Clean up __page_set_anon_rmap HWPOISON, hugetlb: fix unpoison for hugepage HWPOISON, hugetlb: soft offlining for hugepage HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED hugetlb: move refcounting in hugepage allocation inside hugetlb_lock HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page() hugetlb: hugepage migration core hugetlb: redefine hugepage copy functions hugetlb: add allocate function for hugepage migration ...	2010-10-26 10:13:10 -07:00
Nick Piggin	7ccf19a804	fs: inode split IO and LRU lists The use of the same inode list structure (inode->i_list) for two different list constructs with different lifecycles and purposes makes it impossible to separate the locking of the different operations. Therefore, to enable the separation of the locking of the writeback and reclaim lists, split the inode->i_list into two separate lists dedicated to their specific tracking functions. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:15 -04:00
Christoph Hellwig	85fe4025c6	fs: do not assign default i_ino in new_inode Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Al Viro	7de9c6ee3e	new helper: ihold() Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Al Viro	1d3382cbf0	new helper: inode_unhashed() note: for race-free uses you inode_lock held Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:24:15 -04:00
Martin Schwidefsky	e2b8d7af0e	[S390] add support for nonquiescing sske Improve performance of the sske operation by using the nonquiescing variant if the affected page has no mappings established. On machines with no support for the new sske variant the mask bit will be ignored. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-10-25 16:10:15 +02:00
Linus Torvalds	229aebb873	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Update broken web addresses in arch directory. Update broken web addresses in the kernel. Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget Revert "Fix typo: configuation => configuration" partially ida: document IDA_BITMAP_LONGS calculation ext2: fix a typo on comment in ext2/inode.c drivers/scsi: Remove unnecessary casts of private_data drivers/s390: Remove unnecessary casts of private_data net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data drivers/infiniband: Remove unnecessary casts of private_data drivers/gpu/drm: Remove unnecessary casts of private_data kernel/pm_qos_params.c: Remove unnecessary casts of private_data fs/ecryptfs: Remove unnecessary casts of private_data fs/seq_file.c: Remove unnecessary casts of private_data arm: uengine.c: remove C99 comments arm: scoop.c: remove C99 comments Fix typo configue => configure in comments Fix typo: configuation => configuration Fix typo interrest[ing\|ed] => interest[ing\|ed] Fix various typos of valid in comments ... Fix up trivial conflicts in: drivers/char/ipmi/ipmi_si_intf.c drivers/usb/gadget/rndis.c net/irda/irnet/irnet_ppp.c	2010-10-24 13:41:39 -07:00
Linus Torvalds	76c39e4fef	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: (27 commits) SLUB: Fix memory hotplug with !NUMA slub: Move functions to reduce #ifdefs slub: Enable sysfs support for !CONFIG_SLUB_DEBUG SLUB: Optimize slab_free() debug check slub: Move NUMA-related functions under CONFIG_NUMA slub: Add lock release annotation slub: Fix signedness warnings slub: extract common code to remove objects from partial list without locking SLUB: Pass active and inactive redzone flags instead of boolean to debug functions slub: reduce differences between SMP and NUMA Revert "Slub: UP bandaid" percpu: clear memory allocated with the km allocator percpu: use percpu allocator on UP too percpu: reduce PCPU_MIN_UNIT_SIZE to 32k vmalloc: pcpu_get/free_vm_areas() aren't needed on UP SLUB: Fix merged slab cache names Slub: UP bandaid slub: fix SLUB_RESILIENCY_TEST for dynamic kmalloc caches slub: Fix up missing kmalloc_cache -> kmem_cache_node case for memoryhotplug slub: Add dummy functions for the !SLUB_DEBUG case ...	2010-10-24 12:47:55 -07:00
Linus Torvalds	1765a1fe5d	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits) KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages KVM: Fix signature of kvm_iommu_map_pages stub KVM: MCE: Send SRAR SIGBUS directly KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED KVM: fix typo in copyright notice KVM: Disable interrupts around get_kernel_ns() KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address KVM: MMU: move access code parsing to FNAME(walk_addr) function KVM: MMU: audit: check whether have unsync sps after root sync KVM: MMU: audit: introduce audit_printk to cleanup audit code KVM: MMU: audit: unregister audit tracepoints before module unloaded KVM: MMU: audit: fix vcpu's spte walking KVM: MMU: set access bit for direct mapping KVM: MMU: cleanup for error mask set while walk guest page table KVM: MMU: update 'root_hpa' out of loop in PAE shadow path KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn() KVM: x86: Fix constant type in kvm_get_time_scale KVM: VMX: Add AX to list of registers clobbered by guest switch KVM guest: Move a printk that's using the clock before it's ready KVM: x86: TSC catchup mode ...	2010-10-24 12:47:25 -07:00
Pekka Enberg	6d4121f6c2	Merge branch 'master' into for-linus Conflicts: include/linux/percpu.h mm/percpu.c	2010-10-24 19:57:05 +03:00
Xiao Guangrong	45888a0c6e	export __get_user_pages_fast() function This function is used by KVM to pin process's page in the atomic context. Define the 'weak' function to avoid other architecture not support it Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Linus Torvalds	0fc0531e0a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: update comments to reflect that percpu allocations are always zero-filled percpu: Optimize __get_cpu_var() x86, percpu: Optimize this_cpu_ptr percpu: clear memory allocated with the km allocator percpu: fix build breakage on s390 and cleanup build configuration tests percpu: use percpu allocator on UP too percpu: reduce PCPU_MIN_UNIT_SIZE to 32k vmalloc: pcpu_get/free_vm_areas() aren't needed on UP Fixed up trivial conflicts in include/linux/percpu.h	2010-10-22 17:31:36 -07:00
Linus Torvalds	91b745016c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove in_workqueue_context() workqueue: Clarify that schedule_on_each_cpu is synchronous memory_hotplug: drop spurious calls to flush_scheduled_work() shpchp: update workqueue usage pciehp: update workqueue usage isdn/eicon: don't call flush_scheduled_work() from diva_os_remove_soft_isr() workqueue: add and use WQ_MEM_RECLAIM flag workqueue: fix HIGHPRI handling in keep_working() workqueue: add queue_work and activate_work trace points workqueue: prepare for more tracepoints workqueue: implement flush[_delayed]_work_sync() workqueue: factor out start_flush_work() workqueue: cleanup flush/cancel functions workqueue: implement alloc_ordered_workqueue() Fix up trivial conflict in fs/gfs2/main.c as per Tejun	2010-10-22 17:13:10 -07:00
Linus Torvalds	a2887097f2	Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits) xen-blkfront: disable barrier/flush write support Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c block: remove BLKDEV_IFL_WAIT aic7xxx_old: removed unused 'req' variable block: remove the BH_Eopnotsupp flag block: remove the BLKDEV_IFL_BARRIER flag block: remove the WRITE_BARRIER flag swap: do not send discards as barriers fat: do not send discards as barriers ext4: do not send discards as barriers jbd2: replace barriers with explicit flush / FUA usage jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier jbd: replace barriers with explicit flush / FUA usage nilfs2: replace barriers with explicit flush / FUA usage reiserfs: replace barriers with explicit flush / FUA usage gfs2: replace barriers with explicit flush / FUA usage btrfs: replace barriers with explicit flush / FUA usage xfs: replace barriers with explicit flush / FUA usage block: pass gfp_mask and flags to sb_issue_discard dm: convey that all flushes are processed as empty ...	2010-10-22 17:07:18 -07:00
Andi Kleen	46e387bbd8	Merge branch 'hwpoison-hugepages' into hwpoison Conflicts: mm/memory-failure.c	2010-10-22 17:40:48 +02:00
Andi Kleen	e9d08567ef	Merge branch 'hwpoison-cleanups' into hwpoison	2010-10-22 17:40:11 +02:00
Linus Torvalds	3044100e58	Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits) x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S xen: Cope with unmapped pages when initializing kernel pagetable memblock, bootmem: Round pfn properly for memory and reserved regions memblock: Annotate memblock functions with __init_memblock memblock: Allow memblock_init to be called early memblock/arm: Fix memblock_region_is_memory() typo x86, memblock: Remove __memblock_x86_find_in_range_size() memblock: Fix wraparound in find_region() x86-32, memblock: Make add_highpages honor early reserved ranges x86, memblock: Fix crashkernel allocation arm, memblock: Fix the sparsemem build memblock: Fix section mismatch warnings powerpc, memblock: Fix memblock API change fallout memblock, microblaze: Fix memblock API change fallout x86: Remove old bootmem code x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve x86: Remove not used early_res code x86, memblock: Replace e820_/_early string with memblock_ x86: Use memblock to replace early_res x86, memblock: Use memblock_debug to control debug message print out ... Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile	2010-10-21 18:52:11 -07:00
Linus Torvalds	c3b86a2942	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, percpu: Correct the ordering of the percpu readmostly section x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 \|\| HIGHMEM64G x86: Spread tlb flush vector between nodes percpu: Introduce a read-mostly percpu API x86, mm: Fix incorrect data type in vmalloc_sync_all() x86, mm: Hold mm->page_table_lock while doing vmalloc_sync x86, mm: Fix bogus whitespace in sync_global_pgds() x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation x86, mm: Add RESERVE_BRK_ARRAY() helper mm, x86: Saving vmcore with non-lazy freeing of vmas x86, kdump: Change copy_oldmem_page() to use cached addressing x86, mm: fix uninitialized addr in kernel_physical_mapping_init() x86, kmemcheck: Remove double test x86, mm: Make spurious_fault check explicitly check the PRESENT bit x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions x86, mm: Avoid unnecessary TLB flush	2010-10-21 13:47:29 -07:00
Tejun Heo	10ccd84695	memory_hotplug: drop spurious calls to flush_scheduled_work() lru_add_drain_all() uses schedule_on_each_cpu() which is synchronous. There is no reason to call flush_scheduled_work() after lru_add_drain_all(). Drop the spurious calls. This is to prepare for the deprecation and removal of flush_scheduled_work(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mel@csn.ul.ie>	2010-10-19 11:08:41 +02:00
Jens Axboe	fa251f8990	Merge branch 'v2.6.36-rc8' into for-2.6.37/barrier Conflicts: block/blk-core.c drivers/block/loop.c mm/swapfile.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-19 09:13:04 +02:00
H. Peter Anvin	8e4029ee35	Merge branch 'x86/urgent' into core/memblock Reason for merge: Forward-port urgent change to arch/x86/mm/srat_64.c to the memblock tree. Resolved Conflicts: arch/x86/mm/srat_64.c Originally-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 17:05:11 -07:00
Yinghai Lu	cd79481d27	memblock: Annotate memblock functions with __init_memblock Stephen found WARNING: mm/built-in.o(.text+0x25ab8): Section mismatch in reference from the function memblock_find_base() to the function .init.text:memblock_find_region() The function memblock_find_base() references the function __init memblock_find_region(). This is often because memblock_find_base lacks a __init annotation or the annotation of memblock_find_region is wrong. So let memblock_find_region() to use __init_memblock instead of __init directly. Also fix one function that did not have __init* to be __init_memblock. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB366B1.40405@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 16:00:52 -07:00
Jeremy Fitzhardinge	236260b90d	memblock: Allow memblock_init to be called early The Xen setup code needs to call memblock_x86_reserve_range() very early, so allow it to initialize the memblock subsystem before doing so. The second memblock_init() is ignored. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <4CACFDAD.3090900@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 15:59:01 -07:00
Andi Kleen	3ef8fd7f72	Fix migration.c compilation on s390 31bit s390 doesn't have huge pages and failed with: > mm/migrate.c: In function 'remove_migration_pte': > mm/migrate.c:143:3: error: implicit declaration of function 'pte_mkhuge' > mm/migrate.c:143:7: error: incompatible types when assigning to type 'pte_t' from type 'int' Put that code into a ifdef. Reported by Heiko Carstens Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-11 16:57:39 +02:00
Andi Kleen	a08c80ebb6	HWPOISON: Remove retry loop for try_to_unmap We don't reply in other temporary failure cases and there were no reports of replies happening. I think the original reason it was added was also just an early bug, not an observation of the race. So remove the loop for now, but keep a warning message. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:33:01 +02:00
Andi Kleen	9033ae1640	HWPOISON: Turn addr_valid from bitfield into char The addr_valid flag is the only flag in "to_kill" and it's slightly more efficient to have it as char instead of a bitfield. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:33:01 +02:00
Andi Kleen	898e70d1e5	HWPOISON: Disable DEBUG by default Now that only a few obscure messages are left as pr_debug disable outputting of pr_debug in memory-failure.c by default. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:33:00 +02:00
Andi Kleen	fb46e73520	HWPOISON: Convert pr_debugs to pr_info Convert a lot of pr_debugs in memory-failure.c that are generally useful to pr_info. It's reasonable to print at least one message why offlining succeeded or failed by default. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:33:00 +02:00
Andi Kleen	1c80b990a3	HWPOISON: Improve comments in memory-failure.c Clean up and improve the overview comment in memory-failure.c Tidy some grammar issues in other comments. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:33:00 +02:00
Andi Kleen	aa50d3a7aa	Encode huge page size for VM_FAULT_HWPOISON errors This fixes a problem introduced with the hugetlb hwpoison handling The user space SIGBUS signalling wants to know the size of the hugepage that caused a HWPOISON fault. Unfortunately the architecture page fault handlers do not have easy access to the struct page. Pass the information out in the fault error code instead. I added a separate VM_FAULT_HWPOISON_LARGE bit for this case and encode the hpage index in some free upper bits of the fault code. The small page hwpoison keeps stays with the VM_FAULT_HWPOISON name to minimize changes. Also add code to hugetlb.h to convert that index into a page shift. Will be used in a further patch. Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: fengguang.wu@intel.com Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:46 +02:00
Andi Kleen	d5bd910696	hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning Fixes warning reported by Stephen Rothwell mm/hugetlb.c:2950: warning: 'is_hugepage_on_freelist' defined but not used for the !CONFIG_MEMORY_FAILURE case. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:46 +02:00
Andi Kleen	4e1c19750a	Clean up __page_set_anon_rmap Linus asked for a cleanup of __page_set_anon_rmap to make it look more like the cleaner huge pages version. Factor out the duplicated PageAnon check into a single check at the beginning of the function. Remove obsolete comments and rewrite them into standard English. No functional changes. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:46 +02:00
Naoya Horiguchi	6a90181c7b	HWPOISON, hugetlb: fix unpoison for hugepage Currently unpoisoning hugepages doesn't work correctly because clearing PG_HWPoison is done outside if (TestClearPageHWPoison). This patch fixes it. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:45 +02:00
Naoya Horiguchi	d950b95882	HWPOISON, hugetlb: soft offlining for hugepage This patch extends soft offlining framework to support hugepage. When memory corrected errors occur repeatedly on a hugepage, we can choose to stop using it by migrating data onto another hugepage and disabling the original (maybe half-broken) one. ChangeLog since v4: - branch soft_offline_page() for hugepage ChangeLog since v3: - remove comment about "ToDo: hugepage soft-offline" ChangeLog since v2: - move refcount handling into isolate_lru_page() ChangeLog since v1: - add double check in isolating hwpoisoned hugepage - define free/non-free checker for hugepage - postpone calling put_page() for hugepage in soft_offline_page() Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:45 +02:00
Naoya Horiguchi	8c6c2ecb44	HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED Currently error recovery for free hugepage works only for MF_COUNT_INCREASED. This patch enables !MF_COUNT_INCREASED case. Free hugepages can be handled directly by alloc_huge_page() and dequeue_hwpoisoned_huge_page(), and both of them are protected by hugetlb_lock, so there is no race between them. Note that this patch defines the refcount of HWPoisoned hugepage dequeued from freelist is 1, deviated from present 0, thereby we can avoid race between unpoison and memory failure on free hugepage. This is reasonable because unlikely to free buddy pages, free hugepage is governed by hugetlbfs even after error handling finishes. And it also makes unpoison code added in the later patch cleaner. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:45 +02:00
Naoya Horiguchi	a9869b837c	hugetlb: move refcounting in hugepage allocation inside hugetlb_lock Currently alloc_huge_page() raises page refcount outside hugetlb_lock. but it causes race when dequeue_hwpoison_huge_page() runs concurrently with alloc_huge_page(). To avoid it, this patch moves set_page_refcounted() in hugetlb_lock. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:45 +02:00
Naoya Horiguchi	6de2b1aab9	HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page() This check is necessary to avoid race between dequeue and allocation, which can cause a free hugepage to be dequeued twice and get kernel unstable. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:45 +02:00
Naoya Horiguchi	290408d4a2	hugetlb: hugepage migration core This patch extends page migration code to support hugepage migration. One of the potential users of this feature is soft offlining which is triggered by memory corrected errors (added by the next patch.) Todo: - there are other users of page migration such as memory policy, memory hotplug and memocy compaction. They are not ready for hugepage support for now. ChangeLog since v4: - define migrate_huge_pages() - remove changes on isolation/putback_lru_page() ChangeLog since v2: - refactor isolate/putback_lru_page() to handle hugepage - add comment about race on unmap_and_move_huge_page() ChangeLog since v1: - divide migration code path for hugepage - define routine checking migration swap entry for hugetlb - replace "goto" with "if/else" in remove_migration_pte() Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:45 +02:00
Naoya Horiguchi	0ebabb416f	hugetlb: redefine hugepage copy functions This patch modifies hugepage copy functions to have only destination and source hugepages as arguments for later use. The old ones are renamed from copy_{gigantic,huge}_page() to copy_user_{gigantic,huge}_page(). This naming convention is consistent with that between copy_highpage() and copy_user_highpage(). ChangeLog since v4: - add blank line between local declaration and code - remove unnecessary might_sleep() ChangeLog since v2: - change copy_huge_page() from macro to inline dummy function to avoid compile warning when !CONFIG_HUGETLB_PAGE. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:44 +02:00
Naoya Horiguchi	bf50bab2b3	hugetlb: add allocate function for hugepage migration We can't use existing hugepage allocation functions to allocate hugepage for page migration, because page migration can happen asynchronously with the running processes and page migration users should call the allocation function with physical addresses (not virtual addresses) as arguments. ChangeLog since v3: - unify alloc_buddy_huge_page() and alloc_buddy_huge_page_node() ChangeLog since v2: - remove unnecessary get/put_mems_allowed() (thanks to David Rientjes) ChangeLog since v1: - add comment on top of alloc_huge_page_no_vma() Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:44 +02:00
Naoya Horiguchi	998b4382c1	hugetlb: fix metadata corruption in hugetlb_fault() Since the PageHWPoison() check is for avoiding hwpoisoned page remained in pagecache mapping to the process, it should be done in "found in pagecache" branch, not in the common path. Otherwise, metadata corruption occurs if memory failure happens between alloc_huge_page() and lock_page() because page fault fails with metadata changes remained (such as refcount, mapcount, etc.) This patch moves the check to "found in pagecache" branch and fix the problem. ChangeLog since v2: - remove retry check in "new allocation" path. - make description more detailed - change patch name from "HWPOISON, hugetlb: move PG_HWPoison bit check" Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:44 +02:00
Ingo Molnar	153db80f8c	Merge commit 'v2.6.36-rc7' into core/memblock Merge reason: Update from -rc3 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 09:15:00 +02:00
Linus Torvalds	6b0cd00bc3	Merge branch 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: HWPOISON: Stop shrinking at right page count HWPOISON: Report correct address granuality for AO huge page errors HWPOISON: Copy si_addr_lsb to user page-types.c: fix name of unpoison interface	2010-10-07 13:59:32 -07:00
Kirill A. Shutemov	ad4ca5f4b7	memcg: fix thresholds with use_hierarchy == 1 We need to check parent's thresholds if parent has use_hierarchy == 1 to be sure that parent's threshold events will be triggered even if parent itself is not active (no MEM_CGROUP_EVENTS). Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-07 13:31:21 -07:00
Robin Holt	f241e6607b	mm: alloc_large_system_hash() printk overflow on 16TB boot During boot of a 16TB system, the following is printed: Dentry cache hash table entries: -2147483648 (order: 22, 17179869184 bytes) Signed-off-by: Robin Holt <holt@sgi.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-07 13:31:21 -07:00
Andi Kleen	47f43e7efa	HWPOISON: Stop shrinking at right page count When we call the slab shrinker to free a page we need to stop at page count one because the caller always holds a single reference, not zero. This avoids useless looping over slab shrinkers and freeing too much memory. Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-07 09:47:10 +02:00
Andi Kleen	0d9ee6a2d4	HWPOISON: Report correct address granuality for AO huge page errors The SIGBUS user space signalling is supposed to report the address granuality of a corruption. Pass this information correctly for huge pages by querying the hpage order. Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-07 09:45:44 +02:00
Pekka Enberg	92a5bbc11f	SLUB: Fix memory hotplug with !NUMA This patch fixes the following build breakage when memory hotplug is enabled on UMA configurations: /home/test/linux-2.6/mm/slub.c: In function 'kmem_cache_init': /home/test/linux-2.6/mm/slub.c:3031:2: error: 'slab_memory_callback' undeclared (first use in this function) /home/test/linux-2.6/mm/slub.c:3031:2: note: each undeclared identifier is reported only once for each function it appears in make[2]: * [mm/slub.o] Error 1 make[1]: * [mm] Error 2 make: *** [sub-make] Error 2 Reported-by: Zimny Lech <napohybelskurwysynom2010@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-06 21:16:42 +03:00
Christoph Lameter	a5a84755c5	slub: Move functions to reduce #ifdefs There is a lot of #ifdef/#endifs that can be avoided if functions would be in different places. Move them around and reduce #ifdef. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-06 16:54:37 +03:00
Christoph Lameter	ab4d5ed5ee	slub: Enable sysfs support for !CONFIG_SLUB_DEBUG Currently disabling CONFIG_SLUB_DEBUG also disabled SYSFS support meaning that the slabs cannot be tuned without DEBUG. Make SYSFS support independent of CONFIG_SLUB_DEBUG Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-06 16:54:36 +03:00
Pekka Enberg	15b7c51420	SLUB: Optimize slab_free() debug check This patch optimizes slab_free() debug check to use "c->node != NUMA_NO_NODE" instead of "c->node >= 0" because the former generates smaller code on x86-64: Before: 4736: 48 39 70 08 cmp %rsi,0x8(%rax) 473a: 75 26 jne 4762 <kfree+0xa2> 473c: 44 8b 48 10 mov 0x10(%rax),%r9d 4740: 45 85 c9 test %r9d,%r9d 4743: 78 1d js 4762 <kfree+0xa2> After: 4736: 48 39 70 08 cmp %rsi,0x8(%rax) 473a: 75 23 jne 475f <kfree+0x9f> 473c: 83 78 10 ff cmpl $0xffffffffffffffff,0x10(%rax) 4740: 74 1d je 475f <kfree+0x9f> This patch also cleans up __slab_alloc() to use NUMA_NO_NODE instead of "-1" for enabling debugging for a per-CPU cache. Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-06 16:52:26 +03:00
Yinghai Lu	f1af98c762	memblock: Fix wraparound in find_region() When trying to find huge range for crashkernel, get [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at arch/x86/mm/memblock.c:248 memblock_x86_reserve_range+0x40/0x7a() [ 0.000000] Hardware name: Sun Fire x4800 [ 0.000000] memblock_x86_reserve_range: wrong range [0xffffffff37000000, 0x137000000) [ 0.000000] Modules linked in: [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.36-rc5-tip-yh-01876-g1cac214-dirty #59 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff82816f7e>] ? memblock_x86_reserve_range+0x40/0x7a [ 0.000000] [<ffffffff81078c2d>] warn_slowpath_common+0x85/0x9e [ 0.000000] [<ffffffff81078d38>] warn_slowpath_fmt+0x6e/0x70 [ 0.000000] [<ffffffff8281e77c>] ? memblock_find_region+0x40/0x78 [ 0.000000] [<ffffffff8281eb1f>] ? memblock_find_base+0x9a/0xb9 [ 0.000000] [<ffffffff82816f7e>] memblock_x86_reserve_range+0x40/0x7a [ 0.000000] [<ffffffff8280452c>] setup_arch+0x99d/0xb2a [ 0.000000] [<ffffffff810a3e02>] ? trace_hardirqs_off+0xd/0xf [ 0.000000] [<ffffffff81cec7d8>] ? _raw_spin_unlock_irqrestore+0x3d/0x4c [ 0.000000] [<ffffffff827ffcec>] start_kernel+0xde/0x3f1 [ 0.000000] [<ffffffff827ff2d4>] x86_64_start_reservations+0xa0/0xa4 [ 0.000000] [<ffffffff827ff3de>] x86_64_start_kernel+0x106/0x10d [ 0.000000] ---[ end trace a7919e7f17c0a725 ]--- [ 0.000000] Reserving 8192MB of memory at 17592186041200MB for crashkernel (System RAM: 526336MB) This is caused by a wraparound in the test due to size > end; explicitly check for this condition and fail. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CAA4DD3.1080401@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 21:45:35 -07:00
Hugh Dickins	4e31635c36	ksm: fix bad user data when swapping Building under memory pressure, with KSM on 2.6.36-rc5, collapsed with an internal compiler error: typically indicating an error in swapping. Perhaps there's a timing issue which makes it now more likely, perhaps it's just a long time since I tried for so long: this bug goes back to KSM swapping in 2.6.33. Notice how reuse_swap_page() allows an exclusive page to be reused, but only does SetPageDirty if it can delete it from swap cache right then - if it's currently under Writeback, it has to be left in cache and we don't SetPageDirty, but the page can be reused. Fine, the dirty bit will get set in the pte; but notice how zap_pte_range() does not bother to transfer pte_dirty to page_dirty when unmapping a PageAnon. If KSM chooses to share such a page, it will look like a clean copy of swapcache, and not be written out to swap when its memory is needed; then stale data read back from swap when it's needed again. We could fix this in reuse_swap_page() (or even refuse to reuse a page under writeback), but it's more honest to fix my oversight in KSM's write_protect_page(). Several days of testing on three machines confirms that this fixes the issue they showed. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-04 11:09:53 -07:00
Hugh Dickins	4829b906cc	ksm: fix page_address_in_vma anon_vma oops 2.6.36-rc1 commit `21d0d443cd` "rmap: resurrect page_address_in_vma anon_vma check" was right to resurrect that check; but now that it's comparing anon_vma->roots instead of just anon_vmas, there's a danger of oopsing on a NULL anon_vma. In most cases no NULL anon_vma ever gets here; but it turns out that occasionally KSM, when enabled on a forked or forking process, will itself call page_address_in_vma() on a "half-KSM" page left over from an earlier failed attempt to merge - whose page_anon_vma() is NULL. It's my bug that those should be getting here at all: I thought they were already dealt with, this oops proves me wrong, I'll fix it in the next release - such pages are effectively pinned until their process exits, since rmap cannot find their ptes (though swapoff can). For now just work around it by making page_address_in_vma() safe (and add a comment on why that check is wanted anyway). A similar check in __page_check_anon_rmap() is safe because do_page_add_anon_rmap() already excluded KSM pages. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-04 11:09:53 -07:00
Namhyung Kim	5d1f57e4d3	slub: Move NUMA-related functions under CONFIG_NUMA Make kmalloc_cache_alloc_node_notrace(), kmalloc_large_node() and __kmalloc_node_track_caller() to be compiled only when CONFIG_NUMA is selected. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:47:53 +03:00
Namhyung Kim	3478973ded	slub: Add lock release annotation The unfreeze_slab() releases page's PG_locked bit but was missing proper annotation. The deactivate_slab() needs to be marked also since it calls unfreeze_slab() without grabbing the lock. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:47:53 +03:00
Namhyung Kim	a5dd5c117c	slub: Fix signedness warnings The bit-ops routines require its arg to be a pointer to unsigned long. This leads sparse to complain about different signedness as follows: mm/slub.c:2425:49: warning: incorrect type in argument 2 (different signedness) mm/slub.c:2425:49: expected unsigned long volatile addr mm/slub.c:2425:49: got long map Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:47:52 +03:00
Christoph Lameter	62e346a830	slub: extract common code to remove objects from partial list without locking There are a couple of places where repeat the same statements when removing a page from the partial list. Consolidate that into __remove_partial(). Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:44:10 +03:00
Christoph Lameter	f7cb193362	SLUB: Pass active and inactive redzone flags instead of boolean to debug functions Pass the actual values used for inactive and active redzoning to the functions that check the objects. Avoids a lot of the ? : things to lookup the values in the functions. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:44:10 +03:00
Christoph Lameter	7340cc8414	slub: reduce differences between SMP and NUMA Reduce the #ifdefs and simplify bootstrap by making SMP and NUMA as much alike as possible. This means that there will be an additional indirection to get to the kmem_cache_node field under SMP. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:44:10 +03:00
Pekka Enberg	ed59ecbf89	Revert "Slub: UP bandaid" This reverts commit 5249d039500f05a5ab379286b1d23ab9b04d3f2c. It's not needed after commit `bbddff0545` ("percpu: use percpu allocator on UP too").	2010-10-02 10:28:55 +03:00
Tejun Heo	ed6c1115c8	percpu: clear memory allocated with the km allocator Percpu allocator should clear memory before returning it but the km allocator forgot to do it. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infradead.org>	2010-10-02 10:28:42 +03:00
Tejun Heo	9b8327bb24	percpu: use percpu allocator on UP too On UP, percpu allocations were redirected to kmalloc. This has the following problems. * For certain amount of allocations (determined by PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), percpu allocator can be used before the usual kernel memory allocator is brought online. On SMP, this is used to initialize the kernel memory allocator. * percpu allocator honors alignment upto PAGE_SIZE but kmalloc() doesn't. For example, workqueue makes use of larger alignments for cpu_workqueues. Currently, users of percpu allocators need to handle UP differently, which is somewhat fragile and ugly. Other than small amount of memory, there isn't much to lose by enabling percpu allocator on UP. It can simply use kernel memory based chunk allocation which was added for SMP archs w/o MMUs. This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and makes UP build use percpu-km. As percpu addresses and kernel addresses are always identity mapped and static percpu variables don't need any special treatment, nothing is arch dependent and mm/percpu.c implements generic setup_per_cpu_areas() for UP. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi>	2010-10-02 10:26:05 +03:00
Tejun Heo	0bc1406241	vmalloc: pcpu_get/free_vm_areas() aren't needed on UP These functions are used only by percpu memory allocator on SMP. Don't build them on UP. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Nick Piggin <npiggin@kernel.dk>	2010-10-02 10:25:25 +03:00
Pekka Enberg	84c1cf6246	SLUB: Fix merged slab cache names As explained by Linus "I'm Proud to be an American" Torvalds: Looking at the merging code, I actually think it's totally buggy. If you have something like this: - load module A: create slab cache A - load module B: create slab cache B that can merge with A - unload module A - "cat /proc/slabinfo": BOOM. Oops. exactly because the name is not handled correctly, and you'll have module B holding open a slab cache that has a name pointer that points to module A that no longer exists. This patch fixes the problem by using kstrdup() to allocate dynamic memory for ->name of "struct kmem_cache" as suggested by Christoph Lameter. Acked-by: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org> Conflicts: mm/slub.c	2010-10-02 10:24:29 +03:00
Christoph Lameter	db210e70e5	Slub: UP bandaid Since the percpu allocator does not provide early allocation in UP mode (only in SMP configurations) use __get_free_page() to improvise a compound page allocation that can be later freed via kfree(). Compound pages will be released when the cpu caches are resized. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:29 +03:00
David Rientjes	a016471a16	slub: fix SLUB_RESILIENCY_TEST for dynamic kmalloc caches Now that the kmalloc_caches array is dynamically allocated at boot, SLUB_RESILIENCY_TEST needs to be fixed to pass the correct type. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:29 +03:00
Christoph Lameter	8de66a0c02	slub: Fix up missing kmalloc_cache -> kmem_cache_node case for memoryhotplug Memory hotplug allocates and frees per node structures. Use the correct name. Acked-by: David Rientjes <rientjes@google.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:28 +03:00
Christoph Lameter	7d550c56a2	slub: Add dummy functions for the !SLUB_DEBUG case On Wed, 25 Aug 2010, Randy Dunlap wrote: > mm/slub.c:1732: error: implicit declaration of function 'slab_pre_alloc_hook' > mm/slub.c:1751: error: implicit declaration of function 'slab_post_alloc_hook' > mm/slub.c:1881: error: implicit declaration of function 'slab_free_hook' > mm/slub.c:1886: error: implicit declaration of function 'slab_free_hook_irq' Empty functions are missing if the runtime debuggability option is compiled out. Provide the fall back functions to empty hooks if SLUB_DEBUG is not set. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:28 +03:00
David Rientjes	8df275af8d	slob: fix gfp flags for order-0 page allocations kmalloc_node() may allocate higher order slob pages, but the __GFP_COMP bit is only passed to the page allocator and not represented in the tracepoint event. The bit should be passed to trace_kmalloc_node() as well. Acked-by: Matt Mackall <mpm@selenic.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:28 +03:00
Christoph Lameter	c1d508365e	slub: Move gfpflag masking out of the hotpath Move the gfpflags masking into the hooks for checkers and into the slowpaths. gfpflag masking requires access to a global variable and thus adds an additional cacheline reference to the hotpaths. If no hooks are active then the gfpflag masking will result in code that the compiler can toss out. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:27 +03:00
Christoph Lameter	c016b0bdee	slub: Extract hooks for memory checkers from hotpaths Extract the code that memory checkers and other verification tools use from the hotpaths. Makes it easier to add new ones and reduces the disturbances of the hotpaths. Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:27 +03:00
Christoph Lameter	51df114281	slub: Dynamically size kmalloc cache allocations kmalloc caches are statically defined and may take up a lot of space just because the sizes of the node array has to be dimensioned for the largest node count supported. This patch makes the size of the kmem_cache structure dynamic throughout by creating a kmem_cache slab cache for the kmem_cache objects. The bootstrap occurs by allocating the initial one or two kmem_cache objects from the page allocator. C2->C3 - Fix various issues indicated by David - Make create kmalloc_cache return a kmem_cache * pointer. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:27 +03:00
Christoph Lameter	6c182dc0de	slub: Remove static kmem_cache_cpu array for boot The percpu allocator can now handle allocations during early boot. So drop the static kmem_cache_cpu array. Cc: Tejun Heo <tj@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:26 +03:00
Christoph Lameter	55136592fe	slub: Remove dynamic dma slab allocation Remove the dynamic dma slab allocation since this causes too many issues with nested locks etc etc. The change avoids passing gfpflags into many functions. V3->V4: - Create dma caches in kmem_cache_init() instead of kmem_cache_init_late(). Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:26 +03:00
Christoph Lameter	1537066c69	slub: Force no inlining of debug functions Compiler folds the debgging functions into the critical paths. Avoid that by adding noinline to the functions that check for problems. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2010-10-02 10:24:26 +03:00
Larry Woodman	5ec1055aa5	Avoid pgoff overflow in remap_file_pages Thomas Pollet noticed that the remap_file_pages() system call in fremap.c has a potential overflow in the first part of the if statement below, which could cause it to process bogus input parameters. Specifically the pgoff + size parameters could be wrap thereby preventing the system call from failing when it should. Reported-by: Thomas Pollet <thomas.pollet@gmail.com> Signed-off-by: Larry Woodman <lwoodman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-25 09:34:58 -07:00
Linus Torvalds	e92b05dec8	fremap: get rid of broken 'end' variable Thomas Pollet points out that the 'end' variable is broken. It was computed based on start/size before they were page-aligned, and as such doesn't actually match any of the other actions we take. The overflow test on end was also redundant, since we had already tested it with the properly aligned version. So just get rid of it entirely. The one remaining use for that broken variable can just use 'start+size' like all the other cases already did. Reported-by: Thomas Pollet <thomas.pollet@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-24 14:13:57 -07:00
Naoya Horiguchi	a850ea3037	hugetlb, rmap: add BUG_ON(!PageLocked) in hugetlb_add_anon_rmap() Confirming page lock is held in hugetlb_add_anon_rmap() may be useful to detect possible future problems. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-23 17:29:19 -07:00
Naoya Horiguchi	56c9cfb13c	hugetlb, rmap: fix confusing page locking in hugetlb_cow() The "if (!trylock_page)" block in the avoidcopy path of hugetlb_cow() looks confusing and is buggy. Originally this trylock_page() was intended to make sure that old_page is locked even when old_page != pagecache_page, because then only pagecache_page is locked. This patch fixes it by moving page locking into hugetlb_fault(). Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-23 17:29:18 -07:00
Naoya Horiguchi	cd67f0d2a9	hugetlb, rmap: use hugepage_add_new_anon_rmap() in hugetlb_cow() Obviously, setting anon_vma for COWed hugepage should be done by hugepage_add_new_anon_rmap() to scan vmas faster. This patch fixes it. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-23 17:29:18 -07:00
Naoya Horiguchi	433abed6c6	hugetlb, rmap: always use anon_vma root pointer This patch applies Andrea's fix given by the following patch into hugepage rmapping code: commit `288468c334` Author: Andrea Arcangeli <aarcange@redhat.com> Date: Mon Aug 9 17:19:09 2010 -0700 This patch uses anon_vma->root and avoids unnecessary overwriting when anon_vma is already set up. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-23 17:29:18 -07:00
Linus Torvalds	57aebd7739	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: fix pcpu_last_unit_cpu	2010-09-23 08:06:55 -07:00
Andrea Arcangeli	2aeadc30de	mmap: call unlink_anon_vmas() in __split_vma() in case of error If __split_vma fails because of an out of memory condition the anon_vma_chain isn't teardown and freed potentially leading to rmap walks accessing freed vma information plus there's a memleak. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Johannes Weiner <jweiner@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:40 -07:00
David Rientjes	e85bfd3aa7	oom: filter unkillable tasks from tasklist dump /proc/sys/vm/oom_dump_tasks is enabled by default, so it's necessary to limit as much information as possible that it should emit. The tasklist dump should be filtered to only those tasks that are eligible for oom kill. This is already done for memcg ooms, but this patch extends it to both cpuset and mempolicy ooms as well as init. In addition to suppressing irrelevant information, this also reduces confusion since users currently don't know which tasks in the tasklist aren't eligible for kill (such as those attached to cpusets or bound to mempolicies with a disjoint set of mems or nodes, respectively) since that information is not shown. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:39 -07:00
Minchan Kim	d1908362ae	vmscan: check all_unreclaimable in direct reclaim path M. Vefa Bicakci reported 2.6.35 kernel hang up when hibernation on his 32bit 3GB mem machine. (https://bugzilla.kernel.org/show_bug.cgi?id=16771). Also he bisected the regression to commit `bb21c7ce18` Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Date: Fri Jun 4 14:15:05 2010 -0700 vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim failure At first impression, this seemed very strange because the above commit only chenged function return value and hibernate_preallocate_memory() ignore return value of shrink_all_memory(). But it's related. Now, page allocation from hibernation code may enter infinite loop if the system has highmem. The reasons are that vmscan don't care enough OOM case when oom_killer_disabled. The problem sequence is following as. 1. hibernation 2. oom_disable 3. alloc_pages 4. do_try_to_free_pages if (scanning_global_lru(sc) && !all_unreclaimable) return 1; If kswapd is not freozen, it would set zone->all_unreclaimable to 1 and then shrink_zones maybe return true(ie, all_unreclaimable is true). So at last, alloc_pages could go to _nopage_. If it is, it should have no problem. This patch adds all_unreclaimable check to protect in direct reclaim path, too. It can care of hibernation OOM case and help bailout all_unreclaimable case slightly. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reported-by: M. Vefa Bicakci <bicave@superonline.com> Reported-by: <caiqian@redhat.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: <caiqian@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:39 -07:00

1 2 3 4 5 ...

4532 commits