thp: fix MADV_DONTNEED vs. numa balancing race
commit ced108037c upstream.
In the prot_numa case, we are under down_read(mmap_sem). It's critical
not to clear the pmd intermittently, to avoid a race with MADV_DONTNEED,
which is also under down_read(mmap_sem):
	CPU0:				CPU1:
				change_huge_pmd(prot_numa=1)
				 pmdp_huge_get_and_clear_notify()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
				 set_pmd_at();
				 // pmd is re-established
The race makes MADV_DONTNEED miss the huge pmd and not clear it, which
may break userspace.
Found by code analysis; never seen triggered.
Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.9]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
steinar/wifi_calib_4_9_kernel
parent 7bdd685cef
commit c2edc33d4a
@@ -1531,7 +1531,39 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	if (prot_numa && pmd_protnone(*pmd))
 		goto unlock;
 
-	entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+	/*
+	 * In case prot_numa, we are under down_read(mmap_sem). It's critical
+	 * to not clear pmd intermittently to avoid race with MADV_DONTNEED
+	 * which is also under down_read(mmap_sem):
+	 *
+	 *	CPU0:				CPU1:
+	 *				change_huge_pmd(prot_numa=1)
+	 *				 pmdp_huge_get_and_clear_notify()
+	 * madvise_dontneed()
+	 *  zap_pmd_range()
+	 *   pmd_trans_huge(*pmd) == 0 (without ptl)
+	 *   // skip the pmd
+	 *				 set_pmd_at();
+	 *				 // pmd is re-established
+	 *
+	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
+	 * which may break userspace.
+	 *
+	 * pmdp_invalidate() is required to make sure we don't miss
+	 * dirty/young flags set by hardware.
+	 */
+	entry = *pmd;
+	pmdp_invalidate(vma, addr, pmd);
+
+	/*
+	 * Recover dirty/young flags. It relies on pmdp_invalidate to not
+	 * corrupt them.
+	 */
+	if (pmd_dirty(*pmd))
+		entry = pmd_mkdirty(entry);
+	if (pmd_young(*pmd))
+		entry = pmd_mkyoung(entry);
+
 	entry = pmd_modify(entry, newprot);
 	if (preserve_write)
 		entry = pmd_mkwrite(entry);