alistair23-linux

redonkable

Author	SHA1	Message	Date
Roberto Sassu	f387759c2d	ima: Call ima_calc_boot_aggregate() in ima_eventdigest_init() commit `6cc7c266e5` upstream. If the template field 'd' is chosen and the digest to be added to the measurement entry was not calculated with SHA1 or MD5, it is recalculated with SHA1, by using the passed file descriptor. However, this cannot be done for boot_aggregate, because there is no file descriptor. This patch adds a call to ima_calc_boot_aggregate() in ima_eventdigest_init(), so that the digest can be recalculated also for the boot_aggregate entry. Cc: stable@vger.kernel.org # 3.13.x Fixes: `3ce1217d6c` ("ima: define template fields library and new helpers") Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:15 +02:00
Roberto Sassu	64712383a1	ima: Directly assign the ima_default_policy pointer to ima_rules commit `067a436b1b` upstream. This patch prevents the following oops: [ 10.771813] BUG: kernel NULL pointer dereference, address: 0000000000000 [...] [ 10.779790] RIP: 0010:ima_match_policy+0xf7/0xb80 [...] [ 10.798576] Call Trace: [ 10.798993] ? ima_lsm_policy_change+0x2b0/0x2b0 [ 10.799753] ? inode_init_owner+0x1a0/0x1a0 [ 10.800484] ? _raw_spin_lock+0x7a/0xd0 [ 10.801592] ima_must_appraise.part.0+0xb6/0xf0 [ 10.802313] ? ima_fix_xattr.isra.0+0xd0/0xd0 [ 10.803167] ima_must_appraise+0x4f/0x70 [ 10.804004] ima_post_path_mknod+0x2e/0x80 [ 10.804800] do_mknodat+0x396/0x3c0 It occurs when there is a failure during IMA initialization, and ima_init_policy() is not called. IMA hooks still call ima_match_policy() but ima_rules is NULL. This patch prevents the crash by directly assigning the ima_default_policy pointer to ima_rules when ima_rules is defined. This wouldn't alter the existing behavior, as ima_rules is always set at the end of ima_init_policy(). Cc: stable@vger.kernel.org # 3.7.x Fixes: `07f6a79415` ("ima: add appraise action keywords and default rules") Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:15 +02:00
Roberto Sassu	4ce29d9b19	ima: Evaluate error in init_ima() commit `e144d6b265` upstream. Evaluate error in init_ima() before register_blocking_lsm_notifier() and return if not zero. Cc: stable@vger.kernel.org # 5.3.x Fixes: `b169424551` ("ima: use the lsm policy update notifier") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:14 +02:00
Roberto Sassu	5f7272bd22	ima: Switch to ima_hash_algo for boot aggregate commit `6f1a1d103b` upstream. boot_aggregate is the first entry of IMA measurement list. Its purpose is to link pre-boot measurements to IMA measurements. As IMA was designed to work with a TPM 1.2, the SHA1 PCR bank was always selected even if a TPM 2.0 with support for stronger hash algorithms is available. This patch first tries to find a PCR bank with the IMA default hash algorithm. If it does not find it, it selects the SHA256 PCR bank for TPM 2.0 and SHA1 for TPM 1.2. Ultimately, it selects SHA1 also for TPM 2.0 if the SHA256 PCR bank is not found. If none of the PCR banks above can be found, boot_aggregate file digest is filled with zeros, as for TPM bypass, making it impossible to perform a remote attestation of the system. Cc: stable@vger.kernel.org # 5.1.x Fixes: `879b589210` ("tpm: retrieve digest size of unknown algorithms with PCR read") Reported-by: Jerry Snitselaar <jsnitsel@redhat.com> Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:14 +02:00
Krzysztof Struczynski	0698eacdfc	ima: Fix ima digest hash table key calculation commit `1129d31b55` upstream. Function hash_long() accepts unsigned long, while currently only one byte is passed from ima_hash_key(), which calculates a key for ima_htable. Given that hashing the digest does not give clear benefits compared to using the digest itself, remove hash_long() and return the modulus calculated on the first two bytes of the digest with the number of slots. Also reduce the depth of the hash table by doubling the number of slots. Cc: stable@vger.kernel.org Fixes: `3323eec921` ("integrity: IMA as an integrity service provider") Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com> Acked-by: David.Laight@aculab.com (big endian system concerns) Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:14 +02:00
Pavel Tatashin	13ae9eaae0	mm: call cond_resched() from deferred_init_memmap() commit `da97f2d56b` upstream. Now that deferred pages are initialized with interrupts enabled we can replace touch_nmi_watchdog() with cond_resched(), as it was before `3a2d7fa8a3`. For now, we cannot do the same in deferred_grow_zone() as it is still initializes pages with interrupts disabled. This change fixes RCU problem described in https://lkml.kernel.org/r/20200401104156.11564-2-david@redhat.com [ 60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 60.475000] rcu: 1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000 [ 60.475000] rcu: (detected by 0, t=60002 jiffies, g=-1199, q=1) [ 60.475000] Sending NMI from CPU 0 to CPUs 1: [ 1.760091] NMI backtrace for cpu 1 [ 1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1 [ 1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014 [ 1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f [ 1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab <b8> 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41 [ 1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006 [ 1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000 [ 1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340 [ 1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002 [ 1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002 [ 1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c [ 1.760091] FS: 0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000 [ 1.760091] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0 [ 1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1.760091] Call Trace: [ 1.760091] deferred_init_pages+0x8f/0xbf [ 1.760091] deferred_init_memmap+0x184/0x29d [ 1.760091] ? deferred_free_pages.isra.97+0xba/0xba [ 1.760091] kthread+0x112/0x130 [ 1.760091] ? kthread_flush_work_fn+0x10/0x10 [ 1.760091] ret_from_fork+0x35/0x40 [ 89.123011] node 0 initialised, 1055935372 pages in 88650ms Fixes: `3a2d7fa8a3` ("mm: disable interrupts while initializing deferred pages") Reported-by: Yiqian Wei <yiwei@redhat.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: David Hildenbrand <david@redhat.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: James Morris <jmorris@namei.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [4.17+] Link: http://lkml.kernel.org/r/20200403140952.17177-4-pasha.tatashin@soleen.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:14 +02:00
Daniel Jordan	5386d93bc5	mm/pagealloc.c: call touch_nmi_watchdog() on max order boundaries in deferred init commit `117003c327` upstream. Patch series "initialize deferred pages with interrupts enabled", v4. Keep interrupts enabled during deferred page initialization in order to make code more modular and allow jiffies to update. Original approach, and discussion can be found here: http://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com This patch (of 3): deferred_init_memmap() disables interrupts the entire time, so it calls touch_nmi_watchdog() periodically to avoid soft lockup splats. Soon it will run with interrupts enabled, at which point cond_resched() should be used instead. deferred_grow_zone() makes the same watchdog calls through code shared with deferred init but will continue to run with interrupts disabled, so it can't call cond_resched(). Pull the watchdog calls up to these two places to allow the first to be changed later, independently of the second. The frequency reduces from twice per pageblock (init and free) to once per max order block. Fixes: `3a2d7fa8a3` ("mm: disable interrupts while initializing deferred pages") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: James Morris <jmorris@namei.org> Cc: Sasha Levin <sashal@kernel.org> Cc: Yiqian Wei <yiwei@redhat.com> Cc: <stable@vger.kernel.org> [4.17+] Link: http://lkml.kernel.org/r/20200403140952.17177-2-pasha.tatashin@soleen.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:14 +02:00
Pavel Tatashin	c388f173ed	mm: initialize deferred pages with interrupts enabled commit `3d060856ad` upstream. Initializing struct pages is a long task and keeping interrupts disabled for the duration of this operation introduces a number of problems. 1. jiffies are not updated for long period of time, and thus incorrect time is reported. See proposed solution and discussion here: lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com 2. It prevents farther improving deferred page initialization by allowing intra-node multi-threading. We are keeping interrupts disabled to solve a rather theoretical problem that was never observed in real world (See `3a2d7fa8a3`). Let's keep interrupts enabled. In case we ever encounter a scenario where an interrupt thread wants to allocate large amount of memory this early in boot we can deal with that by growing zone (see deferred_grow_zone()) by the needed amount before starting deferred_init_memmap() threads. Before: [ 1.232459] node 0 initialised, 12058412 pages in 1ms After: [ 1.632580] node 0 initialised, 12051227 pages in 436ms Fixes: `3a2d7fa8a3` ("mm: disable interrupts while initializing deferred pages") Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: James Morris <jmorris@namei.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Yiqian Wei <yiwei@redhat.com> Cc: <stable@vger.kernel.org> [4.17+] Link: http://lkml.kernel.org/r/20200403140952.17177-3-pasha.tatashin@soleen.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:14 +02:00
Andrea Arcangeli	a88d8aaf9b	mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked() commit `c444eb564f` upstream. Write protect anon page faults require an accurate mapcount to decide if to break the COW or not. This is implemented in the THP path with reuse_swap_page() -> page_trans_huge_map_swapcount()/page_trans_huge_mapcount(). If the COW triggers while the other processes sharing the page are under a huge pmd split, to do an accurate reading, we must ensure the mapcount isn't computed while it's being transferred from the head page to the tail pages. reuse_swap_cache() already runs serialized by the page lock, so it's enough to add the page lock around __split_huge_pmd_locked too, in order to add the missing serialization. Note: the commit in "Fixes" is just to facilitate the backporting, because the code before such commit didn't try to do an accurate THP mapcount calculation and it instead used the page_count() to decide if to COW or not. Both the page_count and the pin_count are THP-wide refcounts, so they're inaccurate if used in reuse_swap_page(). Reverting such commit (besides the unrelated fix to the local anon_vma assignment) would have also opened the window for memory corruption side effects to certain workloads as documented in such commit header. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Jann Horn <jannh@google.com> Reported-by: Jann Horn <jannh@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Fixes: `6d0a07edd1` ("mm: thp: calculate the mapcount correctly for THP pages during WP faults") Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:14 +02:00
Christophe Leroy	e418045e25	powerpc/mm: Fix conditions to perform MMU specific management by blocks on PPC32. commit `4e3319c23a` upstream. Setting init mem to NX shall depend on sinittext being mapped by block, not on stext being mapped by block. Setting text and rodata to RO shall depend on stext being mapped by block, not on sinittext being mapped by block. Fixes: `63b2bc6195` ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7d565fb8f51b18a3d98445a830b2f6548cb2da2a.1589866984.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:13 +02:00
Filipe Manana	0ccfd7a531	btrfs: fix space_info bytes_may_use underflow during space cache writeout commit `2166e5edce` upstream. We always preallocate a data extent for writing a free space cache, which causes writeback to always try the nocow path first, since the free space inode has the prealloc bit set in its flags. However if the block group that contains the data extent for the space cache has been turned to RO mode due to a running scrub or balance for example, we have to fallback to the cow path. In that case once a new data extent is allocated we end up calling btrfs_add_reserved_bytes(), which decrements the counter named bytes_may_use from the data space_info object with the expection that this counter was previously incremented with the same amount (the size of the data extent). However when we started writeout of the space cache at cache_save_setup(), we incremented the value of the bytes_may_use counter through a call to btrfs_check_data_free_space() and then decremented it through a call to btrfs_prealloc_file_range_trans() immediately after. So when starting the writeback if we fallback to cow mode we have to increment the counter bytes_may_use of the data space_info again to compensate for the extent allocation done by the cow path. When this issue happens we are incorrectly decrementing the bytes_may_use counter and when its current value is smaller then the amount we try to subtract we end up with the following warning: ------------[ cut here ]------------ WARNING: CPU: 3 PID: 657 at fs/btrfs/space-info.h:115 btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] Modules linked in: btrfs blake2b_generic xor raid6_pq libcrc32c (...) CPU: 3 PID: 657 Comm: kworker/u8:7 Tainted: G W 5.6.0-rc7-btrfs-next-58 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: writeback wb_workfn (flush-btrfs-1591) RIP: 0010:btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] Code: ff ff 48 (...) RSP: 0000:ffffa41608f13660 EFLAGS: 00010287 RAX: 0000000000001000 RBX: ffff9615b93ae400 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9615b96ab410 RBP: fffffffffffee000 R08: 0000000000000001 R09: 0000000000000000 R10: ffff961585e62a40 R11: 0000000000000000 R12: ffff9615b96ab400 R13: ffff9615a1a2a000 R14: 0000000000012000 R15: ffff9615b93ae400 FS: 0000000000000000(0000) GS:ffff9615bb200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055cbbc2ae178 CR3: 0000000115794006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: find_free_extent+0x4a0/0x16c0 [btrfs] btrfs_reserve_extent+0x91/0x180 [btrfs] cow_file_range+0x12d/0x490 [btrfs] btrfs_run_delalloc_range+0x9f/0x6d0 [btrfs] ? find_lock_delalloc_range+0x221/0x250 [btrfs] writepage_delalloc+0xe8/0x150 [btrfs] __extent_writepage+0xe8/0x4c0 [btrfs] extent_write_cache_pages+0x237/0x530 [btrfs] extent_writepages+0x44/0xa0 [btrfs] do_writepages+0x23/0x80 __writeback_single_inode+0x59/0x700 writeback_sb_inodes+0x267/0x5f0 __writeback_inodes_wb+0x87/0xe0 wb_writeback+0x382/0x590 ? wb_workfn+0x4a2/0x6c0 wb_workfn+0x4a2/0x6c0 process_one_work+0x26d/0x6a0 worker_thread+0x4f/0x3e0 ? process_one_work+0x6a0/0x6a0 kthread+0x103/0x140 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x3a/0x50 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 softirqs last enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace bd7c03622e0b0a52 ]--- ------------[ cut here ]------------ So fix this by incrementing the bytes_may_use counter of the data space_info when we fallback to the cow path. If the cow path is successful the counter is decremented after extent allocation (by btrfs_add_reserved_bytes()), if it fails it ends up being decremented as well when clearing the delalloc range (extent_clear_unlock_delalloc()). This could be triggered sporadically by the test case btrfs/061 from fstests. Fixes: `82d5902d9c` ("Btrfs: Support reading/writing on disk free ino cache") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:13 +02:00
Filipe Manana	248cdf7288	btrfs: fix space_info bytes_may_use underflow after nocow buffered write commit `467dc47ea9` upstream. When doing a buffered write we always try to reserve data space for it, even when the file has the NOCOW bit set or the write falls into a file range covered by a prealloc extent. This is done both because it is expensive to check if we can do a nocow write (checking if an extent is shared through reflinks or if there's a hole in the range for example), and because when writeback starts we might actually need to fallback to COW mode (for example the block group containing the target extents was turned into RO mode due to a scrub or balance). When we are unable to reserve data space we check if we can do a nocow write, and if we can, we proceed with dirtying the pages and setting up the range for delalloc. In this case the bytes_may_use counter of the data space_info object is not incremented, unlike in the case where we are able to reserve data space (done through btrfs_check_data_free_space() which calls btrfs_alloc_data_chunk_ondemand()). Later when running delalloc we attempt to start writeback in nocow mode but we might revert back to cow mode, for example because in the meanwhile a block group was turned into RO mode by a scrub or relocation. The cow path after successfully allocating an extent ends up calling btrfs_add_reserved_bytes(), which expects the bytes_may_use counter of the data space_info object to have been incremented before - but we did not do it when the buffered write started, since there was not enough available data space. So btrfs_add_reserved_bytes() ends up decrementing the bytes_may_use counter anyway, and when the counter's current value is smaller then the size of the allocated extent we get a stack trace like the following: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 20138 at fs/btrfs/space-info.h:115 btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] Modules linked in: btrfs blake2b_generic xor raid6_pq libcrc32c (...) CPU: 0 PID: 20138 Comm: kworker/u8:15 Not tainted 5.6.0-rc7-btrfs-next-58 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: writeback wb_workfn (flush-btrfs-1754) RIP: 0010:btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] Code: ff ff 48 (...) RSP: 0018:ffffbda18a4b3568 EFLAGS: 00010287 RAX: 0000000000000000 RBX: ffff9ca076f5d800 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9ca068470410 RBP: fffffffffffff000 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9ca079d58040 R11: 0000000000000000 R12: ffff9ca068470400 R13: ffff9ca0408b2000 R14: 0000000000001000 R15: ffff9ca076f5d800 FS: 0000000000000000(0000) GS:ffff9ca07a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005605dbfe7048 CR3: 0000000138570006 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: find_free_extent+0x4a0/0x16c0 [btrfs] btrfs_reserve_extent+0x91/0x180 [btrfs] cow_file_range+0x12d/0x490 [btrfs] run_delalloc_nocow+0x341/0xa40 [btrfs] btrfs_run_delalloc_range+0x1ea/0x6d0 [btrfs] ? find_lock_delalloc_range+0x221/0x250 [btrfs] writepage_delalloc+0xe8/0x150 [btrfs] __extent_writepage+0xe8/0x4c0 [btrfs] extent_write_cache_pages+0x237/0x530 [btrfs] ? btrfs_wq_submit_bio+0x9f/0xc0 [btrfs] extent_writepages+0x44/0xa0 [btrfs] do_writepages+0x23/0x80 __writeback_single_inode+0x59/0x700 writeback_sb_inodes+0x267/0x5f0 __writeback_inodes_wb+0x87/0xe0 wb_writeback+0x382/0x590 ? wb_workfn+0x4a2/0x6c0 wb_workfn+0x4a2/0x6c0 process_one_work+0x26d/0x6a0 worker_thread+0x4f/0x3e0 ? process_one_work+0x6a0/0x6a0 kthread+0x103/0x140 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x3a/0x50 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff94ebdedf>] copy_process+0x74f/0x2020 softirqs last enabled at (0): [<ffffffff94ebdedf>] copy_process+0x74f/0x2020 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace f9f6ef8ec4cd8ec9 ]--- So to fix this, when falling back into cow mode check if space was not reserved, by testing for the bit EXTENT_NORESERVE in the respective file range, and if not, increment the bytes_may_use counter for the data space_info object. Also clear the EXTENT_NORESERVE bit from the range, so that if the cow path fails it decrements the bytes_may_use counter when clearing the delalloc range (through the btrfs_clear_delalloc_extent() callback). Fixes: `7ee9e4405f` ("Btrfs: check if we can nocow if we don't have data space") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:13 +02:00
Filipe Manana	8076bdd4fe	btrfs: fix wrong file range cleanup after an error filling dealloc range commit `e2c8e92d11` upstream. If an error happens while running dellaloc in COW mode for a range, we can end up calling extent_clear_unlock_delalloc() for a range that goes beyond our range's end offset by 1 byte, which affects 1 extra page. This results in clearing bits and doing page operations (such as a page unlock) outside our target range. Fix that by calling extent_clear_unlock_delalloc() with an inclusive end offset, instead of an exclusive end offset, at cow_file_range(). Fixes: `a315e68f6e` ("Btrfs: fix invalid attempt to free reserved space on failure to cow range") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:13 +02:00
Omar Sandoval	c2c69ecb60	btrfs: fix error handling when submitting direct I/O bio commit `6d3113a193` upstream. In btrfs_submit_direct_hook(), if a direct I/O write doesn't span a RAID stripe or chunk, we submit orig_bio without cloning it. In this case, we don't increment pending_bios. Then, if btrfs_submit_dio_bio() fails, we decrement pending_bios to -1, and we never complete orig_bio. Fix it by initializing pending_bios to 1 instead of incrementing later. Fixing this exposes another bug: we put orig_bio prematurely and then put it again from end_io. Fix it by not putting orig_bio. After this change, pending_bios is really more of a reference count, but I'll leave that cleanup separate to keep the fix small. Fixes: `e65e153554` ("btrfs: fix panic caused by direct IO") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:13 +02:00
Josef Bacik	05c5e98bf4	btrfs: force chunk allocation if our global rsv is larger than metadata commit `9c343784c4` upstream. Nikolay noticed a bunch of test failures with my global rsv steal patches. At first he thought they were introduced by them, but they've been failing for a while with 64k nodes. The problem is with 64k nodes we have a global reserve that calculates out to 13MiB on a freshly made file system, which only has 8MiB of metadata space. Because of changes I previously made we no longer account for the global reserve in the overcommit logic, which means we correctly allow overcommit to happen even though we are already overcommitted. However in some corner cases, for example btrfs/170, we will allocate the entire file system up with data chunks before we have enough space pressure to allocate a metadata chunk. Then once the fs is full we ENOSPC out because we cannot overcommit and the global reserve is taking up all of the available space. The most ideal way to deal with this is to change our space reservation stuff to take into account the height of the tree's that we're modifying, so that our global reserve calculation does not end up so obscenely large. However that is a huge undertaking. Instead fix this by forcing a chunk allocation if the global reserve is larger than the total metadata space. This gives us essentially the same behavior that happened before, we get a chunk allocated and these tests can pass. This is meant to be a stop-gap measure until we can tackle the "tree height only" project. Fixes: `0096420adb` ("btrfs: do not account global reserve in can_overcommit") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:13 +02:00
Marcos Paulo de Souza	f63545770f	btrfs: send: emit file capabilities after chown commit `89efda52e6` upstream. Whenever a chown is executed, all capabilities of the file being touched are lost. When doing incremental send with a file with capabilities, there is a situation where the capability can be lost on the receiving side. The sequence of actions bellow shows the problem: $ mount /dev/sda fs1 $ mount /dev/sdb fs2 $ touch fs1/foo.bar $ setcap cap_sys_nice+ep fs1/foo.bar $ btrfs subvolume snapshot -r fs1 fs1/snap_init $ btrfs send fs1/snap_init \| btrfs receive fs2 $ chgrp adm fs1/foo.bar $ setcap cap_sys_nice+ep fs1/foo.bar $ btrfs subvolume snapshot -r fs1 fs1/snap_complete $ btrfs subvolume snapshot -r fs1 fs1/snap_incremental $ btrfs send fs1/snap_complete \| btrfs receive fs2 $ btrfs send -p fs1/snap_init fs1/snap_incremental \| btrfs receive fs2 At this point, only a chown was emitted by "btrfs send" since only the group was changed. This makes the cap_sys_nice capability to be dropped from fs2/snap_incremental/foo.bar To fix that, only emit capabilities after chown is emitted. The current code first checks for xattrs that are new/changed, emits them, and later emit the chown. Now, __process_new_xattr skips capabilities, letting only finish_inode_if_needed to emit them, if they exist, for the inode being processed. This behavior was being worked around in "btrfs receive" side by caching the capability and only applying it after chown. Now, xattrs are only emmited _after_ chown, making that workaround not needed anymore. Link: https://github.com/kdave/btrfs-progs/issues/202 CC: stable@vger.kernel.org # 4.4+ Suggested-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:12 +02:00
Anand Jain	20f260ed53	btrfs: include non-missing as a qualifier for the latest_bdev commit `998a067196` upstream. btrfs_free_extra_devids() updates fs_devices::latest_bdev to point to the bdev with greatest device::generation number. For a typical-missing device the generation number is zero so fs_devices::latest_bdev will never point to it. But if the missing device is due to alienation [1], then device::generation is not zero and if it is greater or equal to the rest of device generations in the list, then fs_devices::latest_bdev ends up pointing to the missing device and reports the error like [2]. [1] We maintain devices of a fsid (as in fs_device::fsid) in the fs_devices::devices list, a device is considered as an alien device if its fsid does not match with the fs_device::fsid Consider a working filesystem with raid1: $ mkfs.btrfs -f -d raid1 -m raid1 /dev/sda /dev/sdb $ mount /dev/sda /mnt-raid1 $ umount /mnt-raid1 While mnt-raid1 was unmounted the user force-adds one of its devices to another btrfs filesystem: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt-single $ btrfs dev add -f /dev/sda /mnt-single Now the original mnt-raid1 fails to mount in degraded mode, because fs_devices::latest_bdev is pointing to the alien device. $ mount -o degraded /dev/sdb /mnt-raid1 [2] mount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg \| tail or so. kernel: BTRFS warning (device sdb): devid 1 uuid 072a0192-675b-4d5a-8640-a5cf2b2c704d is missing kernel: BTRFS error (device sdb): failed to read devices kernel: BTRFS error (device sdb): open_ctree failed Fix the root cause by checking if the device is not missing before it can be considered for the fs_devices::latest_bdev. CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:12 +02:00
Anand Jain	fd9720b8e9	btrfs: free alien device after device add commit `7f551d9690` upstream. When an old device has new fsid through 'btrfs device add -f <dev>' our fs_devices list has an alien device in one of the fs_devices lists. By having an alien device in fs_devices, we have two issues so far 1. missing device does not not show as missing in the userland 2. degraded mount will fail Both issues are caused by the fact that there's an alien device in the fs_devices list. (Alien means that it does not belong to the filesystem, identified by fsid, or does not contain btrfs filesystem at all, eg. due to overwrite). A device can be scanned/added through the control device ioctls SCAN_DEV, DEVICES_READY or by ADD_DEV. And device coming through the control device is checked against the all other devices in the lists, but this was not the case for ADD_DEV. This patch fixes both issues above by removing the alien device. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-22 09:31:12 +02:00
Daniel Axtens	b008ae4cc7	string.h: fix incompatibility between FORTIFY_SOURCE and KASAN [ Upstream commit `47227d27e2` ] The memcmp KASAN self-test fails on a kernel with both KASAN and FORTIFY_SOURCE. When FORTIFY_SOURCE is on, a number of functions are replaced with fortified versions, which attempt to check the sizes of the operands. However, these functions often directly invoke __builtin_foo() once they have performed the fortify check. Using __builtins may bypass KASAN checks if the compiler decides to inline it's own implementation as sequence of instructions, rather than emit a function call that goes out to a KASAN-instrumented implementation. Why is only memcmp affected? ============================ Of the string and string-like functions that kasan_test tests, only memcmp is replaced by an inline sequence of instructions in my testing on x86 with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2). I believe this is due to compiler heuristics. For example, if I annotate kmalloc calls with the alloc_size annotation (and disable some fortify compile-time checking!), the compiler will replace every memset except the one in kmalloc_uaf_memset with inline instructions. (I have some WIP patches to add this annotation.) Does this affect other functions in string.h? ============================================= Yes. Anything that uses __builtin_* rather than __real_* could be affected. This looks like: - strncpy - strcat - strlen - strlcpy maybe, under some circumstances? - strncat under some circumstances - memset - memcpy - memmove - memcmp (as noted) - memchr - strcpy Whether a function call is emitted always depends on the compiler. Most bugs should get caught by FORTIFY_SOURCE, but the missed memcmp test shows that this is not always the case. Isn't FORTIFY_SOURCE disabled with KASAN? ========================================- The string headers on all arches supporting KASAN disable fortify with kasan, but only when address sanitisation is _also_ disabled. For example from x86: #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__) /* * For files that are not instrumented (e.g. mm/slub.c) we * should use not instrumented version of mem* functions. / #define memcpy(dst, src, len) __memcpy(dst, src, len) #define memmove(dst, src, len) __memmove(dst, src, len) #define memset(s, c, n) __memset(s, c, n) #ifndef __NO_FORTIFY #define __NO_FORTIFY / FORTIFY_SOURCE uses __builtin_memcpy, etc. / #endif #endif This comes from commit `6974f0c455` ("include/linux/string.h: add the option of fortified string.h functions"), and doesn't work when KASAN is enabled and the file is supposed to be sanitised - as with test_kasan.c I'm pretty sure this is not wrong, but not as expansive it should be: we shouldn't use __builtin_memcpy etc in files where we don't have instrumentation - it could devolve into a function call to memcpy, which will be instrumented. Rather, we should use __memcpy which by convention is not instrumented. * we also shouldn't be using __builtin_memcpy when we have a KASAN instrumented file, because it could be replaced with inline asm that will not be instrumented. What is correct behaviour? ========================== Firstly, there is some overlap between fortification and KASAN: both provide some level of _runtime_ checking. Only fortify provides compile-time checking. KASAN and fortify can pick up different things at runtime: - Some fortify functions, notably the string functions, could easily be modified to consider sub-object sizes (e.g. members within a struct), and I have some WIP patches to do this. KASAN cannot detect these because it cannot insert poision between members of a struct. - KASAN can detect many over-reads/over-writes when the sizes of both operands are unknown, which fortify cannot. So there are a couple of options: 1) Flip the test: disable fortify in santised files and enable it in unsanitised files. This at least stops us missing KASAN checking, but we lose the fortify checking. 2) Make the fortify code always call out to real versions. Do this only for KASAN, for fear of losing the inlining opportunities we get from __builtin_*. (We can't use kasan_check_{read,write}: because the fortify functions are _extern inline_, you can't include _static_ inline functions without a compiler warning. kasan_check_{read,write} are static inline so we can't use them even when they would otherwise be suitable.) Take approach 2 and call out to real versions when KASAN is enabled. Use __underlying_foo to distinguish from __real_foo: __real_foo always refers to the kernel's implementation of foo, __underlying_foo could be either the kernel implementation or the __builtin_foo implementation. This is sometimes enough to make the memcmp test succeed with FORTIFY_SOURCE enabled. It is at least enough to get the function call into the module. One more fix is needed to make it reliable: see the next patch. Fixes: `6974f0c455` ("include/linux/string.h: add the option of fortified string.h functions") Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: David Gow <davidgow@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Link: http://lkml.kernel.org/r/20200423154503.5103-3-dja@axtens.net Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:12 +02:00
Daniel Axtens	d6c2b4d246	kasan: stop tests being eliminated as dead code with FORTIFY_SOURCE [ Upstream commit `adb72ae191` ] Patch series "Fix some incompatibilites between KASAN and FORTIFY_SOURCE", v4. 3 KASAN self-tests fail on a kernel with both KASAN and FORTIFY_SOURCE: memchr, memcmp and strlen. When FORTIFY_SOURCE is on, a number of functions are replaced with fortified versions, which attempt to check the sizes of the operands. However, these functions often directly invoke __builtin_foo() once they have performed the fortify check. The compiler can detect that the results of these functions are not used, and knows that they have no other side effects, and so can eliminate them as dead code. Why are only memchr, memcmp and strlen affected? ================================================ Of string and string-like functions, kasan_test tests: * strchr -> not affected, no fortified version * strrchr -> likewise * strcmp -> likewise * strncmp -> likewise * strnlen -> not affected, the fortify source implementation calls the underlying strnlen implementation which is instrumented, not a builtin * strlen -> affected, the fortify souce implementation calls a __builtin version which the compiler can determine is dead. * memchr -> likewise * memcmp -> likewise * memset -> not affected, the compiler knows that memset writes to its first argument and therefore is not dead. Why does this not affect the functions normally? ================================================ In string.h, these functions are not marked as __pure, so the compiler cannot know that they do not have side effects. If relevant functions are marked as __pure in string.h, we see the following warnings and the functions are elided: lib/test_kasan.c: In function `kasan_memchr': lib/test_kasan.c:606:2: warning: statement with no effect [-Wunused-value] memchr(ptr, '1', size + 1); ^~~~~~~~~~~~~~~~~~~~~~~~~~ lib/test_kasan.c: In function `kasan_memcmp': lib/test_kasan.c:622:2: warning: statement with no effect [-Wunused-value] memcmp(ptr, arr, size+1); ^~~~~~~~~~~~~~~~~~~~~~~~ lib/test_kasan.c: In function `kasan_strings': lib/test_kasan.c:645:2: warning: statement with no effect [-Wunused-value] strchr(ptr, '1'); ^~~~~~~~~~~~~~~~ ... This annotation would make sense to add and could be added at any point, so the behaviour of test_kasan.c should change. The fix ======= Make all the functions that are pure write their results to a global, which makes them live. The strlen and memchr tests now pass. The memcmp test still fails to trigger, which is addressed in the next patch. [dja@axtens.net: drop patch 3] Link: http://lkml.kernel.org/r/20200424145521.8203-2-dja@axtens.net Fixes: `0c96350a2d` ("lib/test_kasan.c: add tests for several string/memory API functions") Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: David Gow <davidgow@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Link: http://lkml.kernel.org/r/20200423154503.5103-1-dja@axtens.net Link: http://lkml.kernel.org/r/20200423154503.5103-2-dja@axtens.net Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:12 +02:00
Jakub Sitnicki	c48a842d8c	selftests/bpf, flow_dissector: Close TAP device FD after the test [ Upstream commit `b8215dce7d` ] test_flow_dissector leaves a TAP device after it's finished, potentially interfering with other tests that will run after it. Fix it by closing the TAP descriptor on cleanup. Fixes: `0905beec9f` ("selftests/bpf: run flow dissector tests in skb-less mode") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-11-jakub@cloudflare.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:12 +02:00
John Fastabend	e7b1564a24	bpf: Fix running sk_skb program types with ktls [ Upstream commit `e91de6afa8` ] KTLS uses a stream parser to collect TLS messages and send them to the upper layer tls receive handler. This ensures the tls receiver has a full TLS header to parse when it is run. However, when a socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS is enabled we end up with two stream parsers running on the same socket. The result is both try to run on the same socket. First the KTLS stream parser runs and calls read_sock() which will tcp_read_sock which in turn calls tcp_rcv_skb(). This dequeues the skb from the sk_receive_queue. When this is done KTLS code then data_ready() callback which because we stacked KTLS on top of the bpf stream verdict program has been replaced with sk_psock_start_strp(). This will in turn kick the stream parser again and eventually do the same thing KTLS did above calling into tcp_rcv_skb() and dequeuing a skb from the sk_receive_queue. At this point the data stream is broke. Part of the stream was handled by the KTLS side some other bytes may have been handled by the BPF side. Generally this results in either missing data or more likely a "Bad Message" complaint from the kTLS receive handler as the BPF program steals some bytes meant to be in a TLS header and/or the TLS header length is no longer correct. We've already broke the idealized model where we can stack ULPs in any order with generic callbacks on the TX side to handle this. So in this patch we do the same thing but for RX side. We add a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict program is running and add a tls_sw_has_ctx_rx() helper so BPF side can learn there is a TLS ULP on the socket. Then on BPF side we omit calling our stream parser to avoid breaking the data stream for the KTLS receiver. Then on the KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS receiver is done with the packet but before it posts the msg to userspace. This gives us symmetry between the TX and RX halfs and IMO makes it usable again. On the TX side we process packets in this order BPF -> TLS -> TCP and on the receive side in the reverse order TCP -> TLS -> BPF. Discovered while testing OpenSSL 3.0 Alpha2.0 release. Fixes: `d829e9c411` ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:12 +02:00
John Fastabend	d9cd7b8394	bpf: Refactor sockmap redirect code so its easy to reuse [ Upstream commit `ca2f5f21db` ] We will need this block of code called from tls context shortly lets refactor the redirect logic so its easy to use. This also cleans up the switch stmt so we have fewer fallthrough cases. No logic changes are intended. Fixes: `d829e9c411` ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/159079360110.5745.7024009076049029819.stgit@john-Precision-5820-Tower Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:11 +02:00
Anton Protopopov	215a256bc8	bpf: Fix map permissions check [ Upstream commit `1ea0f9120c` ] The map_lookup_and_delete_elem() function should check for both FMODE_CAN_WRITE and FMODE_CAN_READ permissions because it returns a map element to user space. Fixes: `bd513cd08f` ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200527185700.14658-5-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:11 +02:00
Eelco Chaudron	0d55b7032a	libbpf: Fix perf_buffer__free() API for sparse allocs [ Upstream commit `601b05ca6e` ] In case the cpu_bufs are sparsely allocated they are not all free'ed. These changes will fix this. Fixes: `fb84b82246` ("libbpf: add perf buffer API") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159056888305.330763.9684536967379110349.stgit@ebuild Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:11 +02:00
Chris Chiu	98545815cf	platform/x86: asus_wmi: Reserve more space for struct bias_args [ Upstream commit `7b91f1565f` ] On the ASUS laptop UX325JA/UX425JA, most of the media keys are not working due to the ASUS WMI driver fails to be loaded. The ACPI error as follows leads to the failure of asus_wmi_evaluate_method. ACPI BIOS Error (bug): AE_AML_BUFFER_LIMIT, Field [IIA3] at bit offset/length 96/32 exceeds size of target Buffer (96 bits) (20200326/dsopcode-203) No Local Variables are initialized for Method [WMNB] ACPI Error: Aborting method \_SB.ATKD.WMNB due to previous error (AE_AML_BUFFER_LIMIT) (20200326/psparse-531) The DSDT for the WMNB part shows that 5 DWORD required for local variables and the 3rd variable IIA3 hit the buffer limit. Method (WMNB, 3, Serialized) { .. CreateDWordField (Arg2, Zero, IIA0) CreateDWordField (Arg2, 0x04, IIA1) CreateDWordField (Arg2, 0x08, IIA2) CreateDWordField (Arg2, 0x0C, IIA3) CreateDWordField (Arg2, 0x10, IIA4) Local0 = (Arg1 & 0xFFFFFFFF) If ((Local0 == 0x54494E49)) .. } The limitation is determined by the input acpi_buffer size passed to the wmi_evaluate_method. Since the struct bios_args is the data structure used as input buffer by default for all ASUS WMI calls, the size needs to be expanded to fix the problem. Signed-off-by: Chris Chiu <chiu@endlessm.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:11 +02:00
Hans de Goede	4383a5dfbd	platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type [ Upstream commit `cfae58ed68` ] The HP Stream x360 11-p000nd no longer report SW_TABLET_MODE state / events with recent kernels. This model reports a chassis-type of 10 / "Notebook" which is not on the recently introduced chassis-type whitelist Commit `de9647efea` ("platform/x86: intel-vbtn: Only activate tablet mode switch on 2-in-1's") added a chassis-type whitelist and only listed 31 / "Convertible" as being capable of generating valid SW_TABLET_MOD events. Commit `1fac39fd03` ("platform/x86: intel-vbtn: Also handle tablet-mode switch on "Detachable" and "Portable" chassis-types") extended the whitelist with chassis-types 8 / "Portable" and 32 / "Detachable". And now we need to exten the whitelist again with 10 / "Notebook"... The issue original fixed by the whitelist is really a ACPI DSDT bug on the Dell XPS 9360 where it has a VGBS which reports it is in tablet mode even though it is not a 2-in-1 at all, but a regular laptop. So since this is a workaround for a DSDT issue on that specific model, instead of extending the whitelist over and over again, lets switch to a blacklist and only blacklist the chassis-type of the model for which the chassis-type check was added. Note this also fixes the current version of the code no longer checking if dmi_get_system_info(DMI_CHASSIS_TYPE) returns NULL. Fixes: `1fac39fd03` ("platform/x86: intel-vbtn: Also handle tablet-mode switch on "Detachable" and "Portable" chassis-types") Cc: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mario Limonciello <Mario.limonciello@dell.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:11 +02:00
Nickolai Kozachenko	5f3cba4bc2	platform/x86: intel-hid: Add a quirk to support HP Spectre X2 (2015) [ Upstream commit `8fe63eb757` ] HEBC method reports capabilities of 5 button array but HP Spectre X2 (2015) does not have this control method (the same was for Wacom MobileStudio Pro). Expand previous DMI quirk by Alex Hung to also enable 5 button array for this system. Signed-off-by: Nickolai Kozachenko <daemongloom@gmail.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:11 +02:00
Andy Shevchenko	176396ad05	platform/x86: hp-wmi: Convert simple_strtoul() to kstrtou32() [ Upstream commit `5cdc45ed39` ] First of all, unsigned long can overflow u32 value on 64-bit machine. Second, simple_strtoul() doesn't check for overflow in the input. Convert simple_strtoul() to kstrtou32() to eliminate above issues. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:11 +02:00
Qiushi Wu	b77412359c	cpuidle: Fix three reference count leaks [ Upstream commit `c343bf1ba5` ] kobject_init_and_add() takes reference even when it fails. If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Previous commit "b8eb718348b8" fixed a similar problem. Signed-off-by: Qiushi Wu <wu000273@umn.edu> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:10 +02:00
Serge Semin	cf33598698	spi: dw: Return any value retrieved from the dma_transfer callback [ Upstream commit `f0410bbf7d` ] DW APB SSI DMA-part of the driver may need to perform the requested SPI-transfer synchronously. In that case the dma_transfer() callback will return 0 as a marker of the SPI transfer being finished so the SPI core doesn't need to wait and may proceed with the SPI message trasnfers pumping procedure. This will be needed to fix the problem when DMA transactions are finished, but there is still data left in the SPI Tx/Rx FIFOs being sent/received. But for now make dma_transfer to return 1 as the normal dw_spi_transfer_one() method. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Cc: Georgy Vlasov <Georgy.Vlasov@baikalelectronics.ru> Cc: Ramil Zaripov <Ramil.Zaripov@baikalelectronics.ru> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: linux-mips@vger.kernel.org Cc: devicetree@vger.kernel.org Link: https://lore.kernel.org/r/20200529131205.31838-3-Sergey.Semin@baikalelectronics.ru Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:10 +02:00
Haibo Chen	2c95fc879a	mmc: sdhci-esdhc-imx: fix the mask for tuning start point [ Upstream commit `1194be8c94` ] According the RM, the bit[6~0] of register ESDHC_TUNING_CTRL is TUNING_START_TAP, bit[7] of this register is to disable the command CRC check for standard tuning. So fix it here. Fixes: `d87fc96636` ("mmc: sdhci-esdhc-imx: support setting tuning start point") Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Link: https://lore.kernel.org/r/1590488522-9292-1-git-send-email-haibo.chen@nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:10 +02:00
Sharon	7fe3a1c298	iwlwifi: mvm: fix aux station leak [ Upstream commit `f327236df2` ] When mvm is initialized we alloc aux station with aux queue. We later free the station memory when driver is stopped, but we never free the queue's memory, which casues a leak. Add a proper de-initialization of the station. Signed-off-by: Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20200529092401.0121c5be55e9.Id7516fbb3482131d0c9dfb51ff20b226617ddb49@changeid Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:10 +02:00
Xie XiuQi	ffa118a164	ixgbe: fix signed-integer-overflow warning [ Upstream commit `3b70683fc4` ] ubsan report this warning, fix it by adding a unsigned suffix. UBSAN: signed-integer-overflow in drivers/net/ethernet/intel/ixgbe/ixgbe_common.c:2246:26 65535 * 65537 cannot be represented in type 'int' CPU: 21 PID: 7 Comm: kworker/u256:0 Not tainted 5.7.0-rc3-debug+ #39 Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 03/27/2020 Workqueue: ixgbe ixgbe_service_task [ixgbe] Call trace: dump_backtrace+0x0/0x3f0 show_stack+0x28/0x38 dump_stack+0x154/0x1e4 ubsan_epilogue+0x18/0x60 handle_overflow+0xf8/0x148 __ubsan_handle_mul_overflow+0x34/0x48 ixgbe_fc_enable_generic+0x4d0/0x590 [ixgbe] ixgbe_service_task+0xc20/0x1f78 [ixgbe] process_one_work+0x8f0/0xf18 worker_thread+0x430/0x6d0 kthread+0x218/0x238 ret_from_fork+0x10/0x18 Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:10 +02:00
Jacob Keller	99ea968e37	ice: fix potential double free in probe unrolling [ Upstream commit `bc3a024101` ] If ice_init_interrupt_scheme fails, ice_probe will jump to clearing up the interrupts. This can lead to some static analysis tools such as the compiler sanitizers complaining about double free problems. Since ice_init_interrupt_scheme already unrolls internally on failure, there is no need to call ice_clear_interrupt_scheme when it fails. Add a new unroll label and use that instead. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:10 +02:00
Ulf Hansson	62b2fbb9c4	mmc: via-sdmmc: Respect the cmd->busy_timeout from the mmc core [ Upstream commit `966244ccd2` ] Using a fixed 1s timeout for all commands (and data transfers) is a bit problematic. For some commands it means waiting longer than needed for the timer to expire, which may not a big issue, but still. For other commands, like for an erase (CMD38) that uses a R1B response, may require longer timeouts than 1s. In these cases, we may end up treating the command as it failed, while it just needed some more time to complete successfully. Fix the problem by respecting the cmd->busy_timeout, which is provided by the mmc core. Cc: Bruce Chang <brucechang@via.com.tw> Cc: Harald Welte <HaraldWelte@viatech.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20200414161413.3036-17-ulf.hansson@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:09 +02:00
Ulf Hansson	3d6143663f	staging: greybus: sdio: Respect the cmd->busy_timeout from the mmc core [ Upstream commit `a389087ee9` ] Using a fixed 1s timeout for all commands is a bit problematic. For some commands it means waiting longer than needed for the timeout to expire, which may not a big issue, but still. For other commands, like for an erase (CMD38) that uses a R1B response, may require longer timeouts than 1s. In these cases, we may end up treating the command as it failed, while it just needed some more time to complete successfully. Fix the problem by respecting the cmd->busy_timeout, which is provided by the mmc core. Cc: Rui Miguel Silva <rmfrfs@gmail.com> Cc: Johan Hovold <johan@kernel.org> Cc: Alex Elder <elder@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: greybus-dev@lists.linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rui Miguel Silva <rmfrfs@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20200414161413.3036-20-ulf.hansson@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:09 +02:00
Veerabhadrarao Badiganti	8a7c5b83f8	mmc: sdhci-msm: Set SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 quirk [ Upstream commit `d863cb03fb` ] sdhci-msm can support auto cmd12. So enable SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 quirk. Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1587363626-20413-3-git-send-email-vbadigan@codeaurora.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:09 +02:00
Coly Li	62e7e4f597	bcache: fix refcount underflow in bcache_device_free() [ Upstream commit `86da9f7367` ] The problematic code piece in bcache_device_free() is, 785 static void bcache_device_free(struct bcache_device d) 786 { 787 struct gendisk disk = d->disk; [snipped] 799 if (disk) { 800 if (disk->flags & GENHD_FL_UP) 801 del_gendisk(disk); 802 803 if (disk->queue) 804 blk_cleanup_queue(disk->queue); 805 806 ida_simple_remove(&bcache_device_idx, 807 first_minor_to_idx(disk->first_minor)); 808 put_disk(disk); 809 } [snipped] 816 } At line 808, put_disk(disk) may encounter kobject refcount of 'disk' being underflow. Here is how to reproduce the issue, - Attche the backing device to a cache device and do random write to make the cache being dirty. - Stop the bcache device while the cache device has dirty data of the backing device. - Only register the backing device back, NOT register cache device. - The bcache device node /dev/bcache0 won't show up, because backing device waits for the cache device shows up for the missing dirty data. - Now echo 1 into /sys/fs/bcache/pendings_cleanup, to stop the pending backing device. - After the pending backing device stopped, use 'dmesg' to check kernel message, a use-after-free warning from KASA reported the refcount of kobject linked to the 'disk' is underflow. The dropping refcount at line 808 in the above code piece is added by add_disk(d->disk) in bch_cached_dev_run(). But in the above condition the cache device is not registered, bch_cached_dev_run() has no chance to be called and the refcount is not added. The put_disk() for a non- added refcount of gendisk kobject triggers a underflow warning. This patch checks whether GENHD_FL_UP is set in disk->flags, if it is not set then the bcache device was not added, don't call put_disk() and the the underflow issue can be avoided. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:09 +02:00
YuanJunQing	d55960f7f6	MIPS: Fix IRQ tracing when call handle_fpe() and handle_msa_fpe() [ Upstream commit `31e1b3efa8` ] Register "a1" is unsaved in this function, when CONFIG_TRACE_IRQFLAGS is enabled, the TRACE_IRQS_OFF macro will call trace_hardirqs_off(), and this may change register "a1". The changed register "a1" as argument will be send to do_fpe() and do_msa_fpe(). Signed-off-by: YuanJunQing <yuanjunqing66@163.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:09 +02:00
Jiaxun Yang	3f6482c0a4	PCI: Don't disable decoding when mmio_always_on is set [ Upstream commit `b6caa1d8c8` ] Don't disable MEM/IO decoding when a device have both non_compliant_bars and mmio_always_on. That would allow us quirk devices with junk in BARs but can't disable their decoding. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Acked-by: Bjorn Helgaas <helgaas@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:09 +02:00
Alexander Sverdlin	fa99a4b3fb	macvlan: Skip loopback packets in RX handler [ Upstream commit `81f3dc9349` ] Ignore loopback-originatig packets soon enough and don't try to process L2 header where it doesn't exist. The very similar br_handle_frame() in bridge code performs exactly the same check. This is an example of such ICMPv6 packet: skb len=96 headroom=40 headlen=96 tailroom=56 mac=(40,0) net=(40,40) trans=80 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0xae2e9a2f ip_summed=1 complete_sw=0 valid=0 level=0) hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=5 iif=24 dev name=etha01.212 feat=0x0x0000000040005000 skb headroom: 00000000: 00 7c 86 52 84 88 ff ff 00 00 00 00 00 00 08 00 skb headroom: 00000010: 45 00 00 9e 5d 5c 40 00 40 11 33 33 00 00 00 01 skb headroom: 00000020: 02 40 43 80 00 00 86 dd skb linear: 00000000: 60 09 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00 skb linear: 00000010: 00 40 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00 skb linear: 00000020: 00 00 00 00 00 00 00 01 86 00 61 00 40 00 00 2d skb linear: 00000030: 00 00 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c skb linear: 00000040: 00 00 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81 skb linear: 00000050: 00 00 00 00 00 00 00 00 01 01 02 40 43 80 00 00 skb tailroom: 00000000: ... skb tailroom: 00000010: ... skb tailroom: 00000020: ... skb tailroom: 00000030: ... Call Trace, how it happens exactly: ... macvlan_handle_frame+0x321/0x425 [macvlan] ? macvlan_forward_source+0x110/0x110 [macvlan] __netif_receive_skb_core+0x545/0xda0 ? enqueue_task_fair+0xe5/0x8e0 ? __netif_receive_skb_one_core+0x36/0x70 __netif_receive_skb_one_core+0x36/0x70 process_backlog+0x97/0x140 net_rx_action+0x1eb/0x350 ? __hrtimer_run_queues+0x136/0x2e0 __do_softirq+0xe3/0x383 do_softirq_own_stack+0x2a/0x40 </IRQ> do_softirq.part.4+0x4e/0x50 netif_rx_ni+0x60/0xd0 dev_loopback_xmit+0x83/0xf0 ip6_finish_output2+0x575/0x590 [ipv6] ? ip6_cork_release.isra.1+0x64/0x90 [ipv6] ? __ip6_make_skb+0x38d/0x680 [ipv6] ? ip6_output+0x6c/0x140 [ipv6] ip6_output+0x6c/0x140 [ipv6] ip6_send_skb+0x1e/0x60 [ipv6] rawv6_sendmsg+0xc4b/0xe10 [ipv6] ? proc_put_long+0xd0/0xd0 ? rw_copy_check_uvector+0x4e/0x110 ? sock_sendmsg+0x36/0x40 sock_sendmsg+0x36/0x40 ___sys_sendmsg+0x2b6/0x2d0 ? proc_dointvec+0x23/0x30 ? addrconf_sysctl_forward+0x8d/0x250 [ipv6] ? dev_forward_change+0x130/0x130 [ipv6] ? _raw_spin_unlock+0x12/0x30 ? proc_sys_call_handler.isra.14+0x9f/0x110 ? __call_rcu+0x213/0x510 ? get_max_files+0x10/0x10 ? trace_hardirqs_on+0x2c/0xe0 ? __sys_sendmsg+0x63/0xa0 __sys_sendmsg+0x63/0xa0 do_syscall_64+0x6c/0x1e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:09 +02:00
Qu Wenruo	c6f1f12a8c	btrfs: qgroup: mark qgroup inconsistent if we're inherting snapshot to a new qgroup [ Upstream commit `cbab8ade58` ] [BUG] For the following operation, qgroup is guaranteed to be screwed up due to snapshot adding to a new qgroup: # mkfs.btrfs -f $dev # mount $dev $mnt # btrfs qgroup en $mnt # btrfs subv create $mnt/src # xfs_io -f -c "pwrite 0 1m" $mnt/src/file # sync # btrfs qgroup create 1/0 $mnt/src # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot # btrfs qgroup show -prce $mnt/src qgroupid rfer excl max_rfer max_excl parent child -------- ---- ---- -------- -------- ------ ----- 0/5 16.00KiB 16.00KiB none none --- --- 0/257 1.02MiB 16.00KiB none none --- --- 0/258 1.02MiB 16.00KiB none none 1/0 --- 1/0 0.00B 0.00B none none --- 0/258 ^^^^^^^^^^^^^^^^^^^^ [CAUSE] The problem is in btrfs_qgroup_inherit(), we don't have good enough check to determine if the new relation would break the existing accounting. Unlike btrfs_add_qgroup_relation(), which has proper check to determine if we can do quick update without a rescan, in btrfs_qgroup_inherit() we can even assign a snapshot to multiple qgroups. [FIX] Fix it by manually marking qgroup inconsistent for snapshot inheritance. For subvolume creation, since all its extents are exclusively owned, we don't need to rescan. In theory, we should call relation check like quick_update_accounting() when doing qgroup inheritance and inform user about qgroup accounting inconsistency. But we don't have good mechanism to relay that back to the user in the snapshot creation context, thus we can only silently mark the qgroup inconsistent. Anyway, user shouldn't use qgroup inheritance during snapshot creation, and should add qgroup relationship after snapshot creation by 'btrfs qgroup assign', which has a much better UI to inform user about qgroup inconsistent and kick in rescan automatically. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:09 +02:00
Josef Bacik	1e42a1857b	btrfs: improve global reserve stealing logic [ Upstream commit `7f9fe61440` ] For unlink transactions and block group removal btrfs_start_transaction_fallback_global_rsv will first try to start an ordinary transaction and if it fails it will fall back to reserving the required amount by stealing from the global reserve. This is problematic because of all the same reasons we had with previous iterations of the ENOSPC handling, thundering herd. We get a bunch of failures all at once, everybody tries to allocate from the global reserve, some win and some lose, we get an ENSOPC. Fix this behavior by introducing BTRFS_RESERVE_FLUSH_ALL_STEAL. It's used to mark unlink reservation. To fix this we need to integrate this logic into the normal ENOSPC infrastructure. We still go through all of the normal flushing work, and at the moment we begin to fail all the tickets we try to satisfy any tickets that are allowed to steal by stealing from the global reserve. If this works we start the flushing system over again just like we would with a normal ticket satisfaction. This serializes our global reserve stealing, so we don't have the thundering herd problem. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:08 +02:00
Finn Thain	590aad8835	m68k: mac: Don't call via_flush_cache() on Mac IIfx [ Upstream commit `bcc44f6b74` ] There is no VIA2 chip on the Mac IIfx, so don't call via_flush_cache(). This avoids a boot crash which appeared in v5.4. printk: console [ttyS0] enabled printk: bootconsole [debug0] disabled printk: bootconsole [debug0] disabled Calibrating delay loop... 9.61 BogoMIPS (lpj=48064) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear) Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear) devtmpfs: initialized random: get_random_u32 called from bucket_table_alloc.isra.27+0x68/0x194 with crng_init=0 clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns futex hash table entries: 256 (order: -1, 3072 bytes, linear) NET: Registered protocol family 16 Data read fault at 0x00000000 in Super Data (pc=0x8a6a) BAD KERNEL BUSERR Oops: 00000000 Modules linked in: PC: [<00008a6a>] via_flush_cache+0x12/0x2c SR: 2700 SP: 01c1fe3c a2: 01c24000 d0: 00001119 d1: 0000000c d2: 00012000 d3: 0000000f d4: 01c06840 d5: 00033b92 a0: 00000000 a1: 00000000 Process swapper (pid: 1, task=01c24000) Frame format=B ssw=0755 isc=0200 isb=fff7 daddr=00000000 dobuf=01c1fed0 baddr=00008a6e dibuf=0000004e ver=f Stack from 01c1fec4: 01c1fed0 00007d7e 00010080 01c1fedc 0000792e 00000001 01c1fef4 00006b40 01c80000 00040000 00000006 00000003 01c1ff1c 004a545e 004ff200 00040000 00000000 00000003 01c06840 00033b92 004a5410 004b6c88 01c1ff84 000021e2 00000073 00000003 01c06840 00033b92 0038507a 004bb094 004b6ca8 004b6c88 004b6ca4 004b6c88 000021ae 00020002 00000000 01c0685d 00000000 01c1ffb4 0049f938 00409c85 01c06840 0045bd40 00000073 00000002 00000002 00000000 Call Trace: [<00007d7e>] mac_cache_card_flush+0x12/0x1c [<00010080>] fix_dnrm+0x2/0x18 [<0000792e>] cache_push+0x46/0x5a [<00006b40>] arch_dma_prep_coherent+0x60/0x6e [<00040000>] switched_to_dl+0x76/0xd0 [<004a545e>] dma_atomic_pool_init+0x4e/0x188 [<00040000>] switched_to_dl+0x76/0xd0 [<00033b92>] parse_args+0x0/0x370 [<004a5410>] dma_atomic_pool_init+0x0/0x188 [<000021e2>] do_one_initcall+0x34/0x1be [<00033b92>] parse_args+0x0/0x370 [<0038507a>] strcpy+0x0/0x1e [<000021ae>] do_one_initcall+0x0/0x1be [<00020002>] do_proc_dointvec_conv+0x54/0x74 [<0049f938>] kernel_init_freeable+0x126/0x190 [<0049f94c>] kernel_init_freeable+0x13a/0x190 [<004a5410>] dma_atomic_pool_init+0x0/0x188 [<00041798>] complete+0x0/0x3c [<000b9b0c>] kfree+0x0/0x20a [<0038df98>] schedule+0x0/0xd0 [<0038d604>] kernel_init+0x0/0xda [<0038d610>] kernel_init+0xc/0xda [<0038d604>] kernel_init+0x0/0xda [<00002d38>] ret_from_kernel_thread+0xc/0x14 Code: 0000 2079 0048 10da 2279 0048 10c8 d3c8 <1011> 0200 fff7 1280 d1f9 0048 10c8 1010 0000 0008 1080 4e5e 4e75 4e56 0000 2039 Disabling lock debugging due to kernel taint Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Thanks to Stan Johnson for capturing the console log and running git bisect. Git bisect said commit `8e3a68fb55` ("dma-mapping: make dma_atomic_pool_init self-contained") is the first "bad" commit. I don't know why. Perhaps mach_l2_flush first became reachable with that commit. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Cc: Joshua Thompson <funaho@jurai.org> Link: https://lore.kernel.org/r/b8bbeef197d6b3898e82ed0d231ad08f575a4b34.1589949122.git.fthain@telegraphics.com.au Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:08 +02:00
Kaige Li	ce066ce05e	MIPS: tools: Fix resource leak in elf-entry.c [ Upstream commit `f33a0b9410` ] There is a file descriptor resource leak in elf-entry.c, fix this by adding fclose() before return and die. Signed-off-by: Kaige Li <likaige@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:08 +02:00
Arvind Sankar	87ef5086a3	x86/mm: Stop printing BRK addresses [ Upstream commit `67d631b7c0` ] This currently leaks kernel physical addresses into userspace. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Link: https://lkml.kernel.org/r/20200229231120.1147527-1-nivedita@alum.mit.edu Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:08 +02:00
Alan Maguire	41b44325c9	selftests/bpf: CONFIG_IPV6_SEG6_BPF required for test_seg6_loop.o [ Upstream commit `3c8e8cf4b1` ] test_seg6_loop.o uses the helper bpf_lwt_seg6_adjust_srh(); it will not be present if CONFIG_IPV6_SEG6_BPF is not specified. Fixes: `b061017f8b` ("selftests/bpf: add realistic loop tests") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1590147389-26482-2-git-send-email-alan.maguire@oracle.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:08 +02:00
Felix Kuehling	5b8d09eeb4	drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode [ Upstream commit `90ca78deb0` ] This fixes an intermittent bug where a root PD clear operation still in progress could overwrite a PDE update done by the CPU, resulting in a VM fault. Fixes: `108b4d928c` ("drm/amd/amdgpu: Update VM function pointer") Reported-by: Jay Cornwall <Jay.Cornwall@amd.com> Tested-by: Jay Cornwall <Jay.Cornwall@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:08 +02:00
chen gong	b06a7dc9e5	drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven [ Upstream commit `cbd2d08c74` ] [Problem description] 1. Boot up picasso platform, launches desktop, Don't do anything (APU enter into "gfxoff" state) 2. Remote login to platform using SSH, then type the command line: sudo su -c "echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level" sudo su -c "echo 2 > /sys/class/drm/card0/device/pp_dpm_sclk" (fix SCLK to 1400MHz) 3. Move the mouse around in Window 4. Phenomenon : The screen frozen Tester will switch sclk level during glmark2 run time. APU will enter "gfxoff" state intermittently during glmark2 run time. The system got hanged if fix GFXCLK to 1400MHz when APU is in "gfxoff" state. [Debug] 1. Fix SCLK to X MHz 1400: screen frozen, screen black, then OS will reboot. 1300: screen frozen. 1200: screen frozen, screen black. 1100: screen frozen, screen black, then OS will reboot. 1000: screen frozen, screen black. 900: screen frozen, screen black, then OS will reboot. 800: Situation Nomal, issue disappear. 700: Situation Nomal, issue disappear. 2. SBIOS setting: AMD CBS --> SMU Debug Options -->SMU Debug --> "GFX DLDO Psm Margin Control": 50 : Situation Nomal, issue disappear. 45 : Situation Nomal, issue disappear. 40 : Situation Nomal, issue disappear. 35 : Situation Nomal, issue disappear. 30 : screen black. 25 : screen frozen, then blurred screen. 20 : screen frozen. 15 : screen black. 10 : screen frozen. 5 : screen frozen, then blurred screen. 3. Disable GFXOFF feature Situation Nomal, issue disappear. [Why] Through a period of time debugging with Sys Eng team and SMU team, Sys Eng team said this is voltage/frequency marginal issue not a F/W or H/W bug. This experiment proves that default targetPsm [for f=1400MHz] is not sufficient when GFXOFF is enabled on Picasso. SMU team think it is an odd test conditions to force sclk="1400MHz" when GPU is in "gfxoff" state，then wake up the GFX. SCLK should be in the "lowest frequency" when gfxoff. [How] Disable gfxoff when setting manual mode. Enable gfxoff when setting other mode(exiting manual mode) again. By the way, from the user point of view, now that user switch to manual mode and force SCLK Frequency, he don't want SCLK be controlled by workload.It becomes meaningless to "switch to manual mode" if APU enter "gfxoff" due to lack of workload at this point. Tips: Same issue observed on Raven. Signed-off-by: chen gong <curry.gong@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-06-22 09:31:08 +02:00

1 2 3 4 5 ...

879293 Commits (f387759c2d6732024c351eaccdc29533a5ddc13f) All Branches Search

879293 Commits (f387759c2d6732024c351eaccdc29533a5ddc13f)

All Branches