From 5a39fb2f22fd3d0a01b65ad76cb95ce17ed5727f Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Mon, 19 Oct 2020 21:38:25 +0100 Subject: [PATCH 001/151] drm/i915/gem: Flush coherency domains on first set-domain-ioctl MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 59dd13ad310793757e34afa489dd6fc8544fc3da ] Avoid skipping what appears to be a no-op set-domain-ioctl if the cache coherency state is inconsistent with our target domain. This also has the utility of using the population of the pages to validate the backing store. The danger in skipping the first set-domain is leaving the cache inconsistent and submitting stale data, or worse leaving the clean data in the cache and not flushing it to the GPU. The impact should be small as it requires a no-op set-domain as the very first ioctl in a particular sequence not found in typical userspace. Reported-by: Zbigniew Kempczyński Fixes: 754a25442705 ("drm/i915: Skip object locking around a no-op set-domain ioctl") Testcase: igt/gem_mmap_offset/blt-coherency Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Cc: Matthew Auld Cc: Zbigniew Kempczyński Cc: # v5.2+ Reviewed-by: Matthew Auld Link: https://patchwork.freedesktop.org/patch/msgid/20201019203825.10966-1-chris@chris-wilson.co.uk (cherry picked from commit 44c2200afcd59f441b43f27829b4003397cc495d) Signed-off-by: Rodrigo Vivi Signed-off-by: Sasha Levin --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 28 ++++++++++------------ 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 9c58e8fac1d9..a4b48c9abeac 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -605,21 +605,6 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, if (!obj) return -ENOENT; - /* - * Already in the desired write domain? Nothing for us to do! - * - * We apply a little bit of cunning here to catch a broader set of - * no-ops. If obj->write_domain is set, we must be in the same - * obj->read_domains, and only that domain. Therefore, if that - * obj->write_domain matches the request read_domains, we are - * already in the same read/write domain and can skip the operation, - * without having to further check the requested write_domain. - */ - if (READ_ONCE(obj->write_domain) == read_domains) { - err = 0; - goto out; - } - /* * Try to flush the object off the GPU without holding the lock. * We will repeat the flush holding the lock in the normal manner @@ -657,6 +642,19 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, if (err) goto out; + /* + * Already in the desired write domain? Nothing for us to do! + * + * We apply a little bit of cunning here to catch a broader set of + * no-ops. If obj->write_domain is set, we must be in the same + * obj->read_domains, and only that domain. Therefore, if that + * obj->write_domain matches the request read_domains, we are + * already in the same read/write domain and can skip the operation, + * without having to further check the requested write_domain. 
+ */ + if (READ_ONCE(obj->write_domain) == read_domains) + goto out_unpin; + err = i915_gem_object_lock_interruptible(obj); if (err) goto out_unpin; From 160777b19b86bf28a1eec7cbfb4b5f7cece76b5e Mon Sep 17 00:00:00 2001 From: Zeng Tao Date: Tue, 1 Sep 2020 17:30:13 +0800 Subject: [PATCH 002/151] time: Prevent undefined behaviour in timespec64_to_ns() [ Upstream commit cb47755725da7b90fecbb2aa82ac3b24a7adb89b ] UBSAN reports: Undefined behaviour in ./include/linux/time64.h:127:27 signed integer overflow: 17179869187 * 1000000000 cannot be represented in type 'long long int' Call Trace: timespec64_to_ns include/linux/time64.h:127 [inline] set_cpu_itimer+0x65c/0x880 kernel/time/itimer.c:180 do_setitimer+0x8e/0x740 kernel/time/itimer.c:245 __x64_sys_setitimer+0x14c/0x2c0 kernel/time/itimer.c:336 do_syscall_64+0xa1/0x540 arch/x86/entry/common.c:295 Commit bd40a175769d ("y2038: itimer: change implementation to timespec64") replaced the original conversion which handled time clamping correctly with timespec64_to_ns() which has no overflow protection. Fix it in timespec64_to_ns() as this is not necessarily limited to the usage in itimers. [ tglx: Added comment and adjusted the fixes tag ] Fixes: 361a3bf00582 ("time64: Add time64.h header and define struct timespec64") Signed-off-by: Zeng Tao Signed-off-by: Thomas Gleixner Reviewed-by: Arnd Bergmann Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1598952616-6416-1-git-send-email-prime.zeng@hisilicon.com Signed-off-by: Sasha Levin --- include/linux/time64.h | 4 ++++ kernel/time/itimer.c | 4 ---- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/time64.h b/include/linux/time64.h index 19125489ae94..5eab3f263518 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -132,6 +132,10 @@ static inline bool timespec64_valid_settod(const struct timespec64 *ts) */ static inline s64 timespec64_to_ns(const struct timespec64 *ts) { + /* Prevent multiplication overflow */ + if ((unsigned long long)ts->tv_sec >= KTIME_SEC_MAX) + return KTIME_MAX; + return ((s64) ts->tv_sec * NSEC_PER_SEC) + ts->tv_nsec; } diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c index 77f1e5635cc1..62dc9757118c 100644 --- a/kernel/time/itimer.c +++ b/kernel/time/itimer.c @@ -147,10 +147,6 @@ static void set_cpu_itimer(struct task_struct *tsk, unsigned int clock_id, u64 oval, nval, ointerval, ninterval; struct cpu_itimer *it = &tsk->signal->it[clock_id]; - /* - * Use the to_ktime conversion because that clamps the maximum - * value to KTIME_MAX and avoid multiplication overflows. - */ nval = ktime_to_ns(timeval_to_ktime(value->it_value)); ninterval = ktime_to_ns(timeval_to_ktime(value->it_interval)); From 95fda70d39555594d31311f365f99581883c2a67 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Wed, 28 Oct 2020 15:24:34 +0800 Subject: [PATCH 003/151] nbd: don't update block size after device is started [ Upstream commit b40813ddcd6bf9f01d020804e4cb8febc480b9e4 ] Mounted NBD device can be resized, one use case is rbd-nbd. Fix the issue by setting up default block size, then not touch it in nbd_size_update() any more. This kind of usage is aligned with loop which has same use case too. 
Cc: stable@vger.kernel.org Fixes: c8a83a6b54d0 ("nbd: Use set_blocksize() to set device blocksize") Reported-by: lining Signed-off-by: Ming Lei Cc: Josef Bacik Cc: Jan Kara Tested-by: lining Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin --- drivers/block/nbd.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 742f8160b6e2..62a873718b5b 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -296,7 +296,7 @@ static void nbd_size_clear(struct nbd_device *nbd) } } -static void nbd_size_update(struct nbd_device *nbd) +static void nbd_size_update(struct nbd_device *nbd, bool start) { struct nbd_config *config = nbd->config; struct block_device *bdev = bdget_disk(nbd->disk, 0); @@ -312,7 +312,8 @@ static void nbd_size_update(struct nbd_device *nbd) if (bdev) { if (bdev->bd_disk) { bd_set_size(bdev, config->bytesize); - set_blocksize(bdev, config->blksize); + if (start) + set_blocksize(bdev, config->blksize); } else bdev->bd_invalidated = 1; bdput(bdev); @@ -327,7 +328,7 @@ static void nbd_size_set(struct nbd_device *nbd, loff_t blocksize, config->blksize = blocksize; config->bytesize = blocksize * nr_blocks; if (nbd->task_recv != NULL) - nbd_size_update(nbd); + nbd_size_update(nbd, false); } static void nbd_complete_rq(struct request *req) @@ -1293,7 +1294,7 @@ static int nbd_start_device(struct nbd_device *nbd) args->index = i; queue_work(nbd->recv_workq, &args->work); } - nbd_size_update(nbd); + nbd_size_update(nbd, true); return error; } From 9dfbc2f82ac82eb7398ae2d0c125a52e525edbc6 Mon Sep 17 00:00:00 2001 From: Santosh Shukla Date: Mon, 26 Oct 2020 16:54:07 +0530 Subject: [PATCH 004/151] KVM: arm64: Force PTE mapping on fault resulting in a device mapping [ Upstream commit 91a2c34b7d6fadc9c5d9433c620ea4c32ee7cae8 ] VFIO allows a device driver to resolve a fault by mapping a MMIO range. This can be subsequently result in user_mem_abort() to try and compute a huge mapping based on the MMIO pfn, which is a sure recipe for things to go wrong. Instead, force a PTE mapping when the pfn faulted in has a device mapping. Fixes: 6d674e28f642 ("KVM: arm/arm64: Properly handle faulting of device mappings") Suggested-by: Marc Zyngier Signed-off-by: Santosh Shukla [maz: rewritten commit message] Signed-off-by: Marc Zyngier Reviewed-by: Gavin Shan Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1603711447-11998-2-git-send-email-sashukla@nvidia.com Signed-off-by: Sasha Levin --- virt/kvm/arm/mmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 8700402f3000..03a586ab6d27 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1756,6 +1756,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (kvm_is_device_pfn(pfn)) { mem_type = PAGE_S2_DEVICE; flags |= KVM_S2PTE_FLAG_IS_IOMAP; + force_pte = true; } else if (logging_active) { /* * Faults on pages in a memslot with logging enabled From 504cfb5e3bcae73f2fe7d62cbb457a4eb6d5c2a5 Mon Sep 17 00:00:00 2001 From: Ansuel Smith Date: Tue, 1 Sep 2020 14:49:54 +0200 Subject: [PATCH 005/151] PCI: qcom: Make sure PCIe is reset before init for rev 2.1.0 [ Upstream commit d3d4d028afb785e52c55024d779089654f8302e7 ] Qsdk U-Boot can incorrectly leave the PCIe interface in an undefined state if bootm command is used instead of bootipq. This is caused by the not deinit of PCIe when bootm is called. Reset the PCIe before init anyway to fix this U-Boot bug. 
Link: https://lore.kernel.org/r/20200901124955.137-1-ansuelsmth@gmail.com Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver") Signed-off-by: Ansuel Smith Signed-off-by: Lorenzo Pieralisi Reviewed-by: Bjorn Andersson Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Sasha Levin --- drivers/pci/controller/dwc/pcie-qcom.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c index 374db5d59cf8..14196c0287a2 100644 --- a/drivers/pci/controller/dwc/pcie-qcom.c +++ b/drivers/pci/controller/dwc/pcie-qcom.c @@ -303,6 +303,9 @@ static void qcom_pcie_deinit_2_1_0(struct qcom_pcie *pcie) clk_disable_unprepare(res->core_clk); clk_disable_unprepare(res->aux_clk); clk_disable_unprepare(res->ref_clk); + + writel(1, pcie->parf + PCIE20_PARF_PHY_CTRL); + regulator_bulk_disable(ARRAY_SIZE(res->supplies), res->supplies); } @@ -315,6 +318,16 @@ static int qcom_pcie_init_2_1_0(struct qcom_pcie *pcie) u32 val; int ret; + /* reset the PCIe interface as uboot can leave it undefined state */ + reset_control_assert(res->pci_reset); + reset_control_assert(res->axi_reset); + reset_control_assert(res->ahb_reset); + reset_control_assert(res->por_reset); + reset_control_assert(res->ext_reset); + reset_control_assert(res->phy_reset); + + writel(1, pcie->parf + PCIE20_PARF_PHY_CTRL); + ret = regulator_bulk_enable(ARRAY_SIZE(res->supplies), res->supplies); if (ret < 0) { dev_err(dev, "cannot enable regulators\n"); From ab031673e2abcbc298b73ae69f28c60dd11ae4a9 Mon Sep 17 00:00:00 2001 From: Thinh Nguyen Date: Tue, 31 Mar 2020 01:40:42 -0700 Subject: [PATCH 006/151] usb: dwc3: gadget: Continue to process pending requests [ Upstream commit d9feef974e0d8cb6842533c92476a1b32a41ba31 ] If there are still pending requests because no TRB was available, prepare more when started requests are completed. Introduce dwc3_gadget_ep_should_continue() to check for incomplete and pending requests to resume updating new TRBs to the controller's TRB cache. Signed-off-by: Thinh Nguyen Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin --- drivers/usb/dwc3/gadget.c | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 1d65de84464d..b6a454211329 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2644,10 +2644,8 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep, req->request.actual = req->request.length - req->remaining; - if (!dwc3_gadget_ep_request_completed(req)) { - __dwc3_gadget_kick_transfer(dep); + if (!dwc3_gadget_ep_request_completed(req)) goto out; - } dwc3_gadget_giveback(dep, req, status); @@ -2671,6 +2669,24 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep, } } +static bool dwc3_gadget_ep_should_continue(struct dwc3_ep *dep) +{ + struct dwc3_request *req; + + if (!list_empty(&dep->pending_list)) + return true; + + /* + * We only need to check the first entry of the started list. We can + * assume the completed requests are removed from the started list. 
+ */ + req = next_request(&dep->started_list); + if (!req) + return false; + + return !dwc3_gadget_ep_request_completed(req); +} + static void dwc3_gadget_endpoint_frame_from_event(struct dwc3_ep *dep, const struct dwc3_event_depevt *event) { @@ -2700,6 +2716,8 @@ static void dwc3_gadget_endpoint_transfer_in_progress(struct dwc3_ep *dep, if (stop) dwc3_stop_active_transfer(dep, true, true); + else if (dwc3_gadget_ep_should_continue(dep)) + __dwc3_gadget_kick_transfer(dep); /* * WORKAROUND: This is the 2nd half of U1/U2 -> U0 workaround. From e24516cf62f9c2489ee5bc13dce8ecae603919e2 Mon Sep 17 00:00:00 2001 From: Thinh Nguyen Date: Thu, 24 Sep 2020 01:21:24 -0700 Subject: [PATCH 007/151] usb: dwc3: gadget: Reclaim extra TRBs after request completion [ Upstream commit 690e5c2dc29f8891fcfd30da67e0d5837c2c9df5 ] An SG request may be partially completed (due to no available TRBs). Don't reclaim extra TRBs and clear the needs_extra_trb flag until the request is fully completed. Otherwise, the driver will reclaim the wrong TRB. Cc: stable@vger.kernel.org Fixes: 1f512119a08c ("usb: dwc3: gadget: add remaining sg entries to ring") Signed-off-by: Thinh Nguyen Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin --- drivers/usb/dwc3/gadget.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index b6a454211329..9269cda4c183 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2627,6 +2627,11 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep, ret = dwc3_gadget_ep_reclaim_trb_linear(dep, req, event, status); + req->request.actual = req->request.length - req->remaining; + + if (!dwc3_gadget_ep_request_completed(req)) + goto out; + if (req->needs_extra_trb) { unsigned int maxp = usb_endpoint_maxp(dep->endpoint.desc); @@ -2642,11 +2647,6 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep, req->needs_extra_trb = false; } - req->request.actual = req->request.length - req->remaining; - - if (!dwc3_gadget_ep_request_completed(req)) - goto out; - dwc3_gadget_giveback(dep, req, status); out: From c58fa93b1409185bb6fbe9ba4ce8873c8508c1f3 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Tue, 28 Jul 2020 09:42:49 +0800 Subject: [PATCH 008/151] btrfs: tracepoints: output proper root owner for trace_find_free_extent() The current trace event always output result like this: find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA) find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA) find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA) find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA) find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA) find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA) T's saying we're allocating data extent for EXTENT tree, which is not even possible. It's because we always use EXTENT tree as the owner for trace_find_free_extent() without using the @root from btrfs_reserve_extent(). 
This patch will change the parameter to use proper @root for trace_find_free_extent(): Now it looks much better: find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA) find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=1(DATA) find_free_extent: root=5(FS_TREE) len=4096 empty_size=0 flags=1(DATA) find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA) find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) find_free_extent: root=7(CSUM_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) find_free_extent: root=1(ROOT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) Reported-by: Hans van Kranenburg CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 7 ++++--- include/trace/events/btrfs.h | 10 ++++++---- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 388449101705..c6d9e8c07c23 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3800,11 +3800,12 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info, * |- Push harder to find free extents * |- If not found, re-iterate all block groups */ -static noinline int find_free_extent(struct btrfs_fs_info *fs_info, +static noinline int find_free_extent(struct btrfs_root *root, u64 ram_bytes, u64 num_bytes, u64 empty_size, u64 hint_byte, struct btrfs_key *ins, u64 flags, int delalloc) { + struct btrfs_fs_info *fs_info = root->fs_info; int ret = 0; int cache_block_group_error = 0; struct btrfs_free_cluster *last_ptr = NULL; @@ -3833,7 +3834,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info, ins->objectid = 0; ins->offset = 0; - trace_find_free_extent(fs_info, num_bytes, empty_size, flags); + trace_find_free_extent(root, num_bytes, empty_size, flags); space_info = btrfs_find_space_info(fs_info, flags); if (!space_info) { @@ -4141,7 +4142,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, flags = get_alloc_profile_by_root(root, is_data); again: WARN_ON(num_bytes < fs_info->sectorsize); - ret = find_free_extent(fs_info, ram_bytes, num_bytes, empty_size, + ret = find_free_extent(root, ram_bytes, num_bytes, empty_size, hint_byte, ins, flags, delalloc); if (!ret && !is_data) { btrfs_dec_block_group_reservations(fs_info, ins->objectid); diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 75ae1899452b..94a3adb65b8a 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -1159,25 +1159,27 @@ DEFINE_EVENT(btrfs__reserved_extent, btrfs_reserved_extent_free, TRACE_EVENT(find_free_extent, - TP_PROTO(const struct btrfs_fs_info *fs_info, u64 num_bytes, + TP_PROTO(const struct btrfs_root *root, u64 num_bytes, u64 empty_size, u64 data), - TP_ARGS(fs_info, num_bytes, empty_size, data), + TP_ARGS(root, num_bytes, empty_size, data), TP_STRUCT__entry_btrfs( + __field( u64, root_objectid ) __field( u64, num_bytes ) __field( u64, empty_size ) __field( u64, data ) ), - TP_fast_assign_btrfs(fs_info, + TP_fast_assign_btrfs(root->fs_info, + __entry->root_objectid = root->root_key.objectid; __entry->num_bytes = num_bytes; __entry->empty_size = empty_size; __entry->data = data; ), TP_printk_btrfs("root=%llu(%s) len=%llu empty_size=%llu flags=%llu(%s)", - 
show_root_type(BTRFS_EXTENT_TREE_OBJECTID), + show_root_type(__entry->root_objectid), __entry->num_bytes, __entry->empty_size, __entry->data, __print_flags((unsigned long)__entry->data, "|", BTRFS_GROUP_FLAGS)) From 0ee771e96954fa7ea59466b35a765a7965a66753 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Tue, 1 Sep 2020 08:09:01 -0400 Subject: [PATCH 009/151] btrfs: sysfs: init devices outside of the chunk_mutex [ Upstream commit ca10845a56856fff4de3804c85e6424d0f6d0cde ] While running btrfs/061, btrfs/073, btrfs/078, or btrfs/178 we hit the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.9.0-rc3+ #4 Not tainted ------------------------------------------------------ kswapd0/100 is trying to acquire lock: ffff96ecc22ef4a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330 but task is already holding lock: ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0x65/0x80 slab_pre_alloc_hook.constprop.0+0x20/0x200 kmem_cache_alloc+0x37/0x270 alloc_inode+0x82/0xb0 iget_locked+0x10d/0x2c0 kernfs_get_inode+0x1b/0x130 kernfs_get_tree+0x136/0x240 sysfs_get_tree+0x16/0x40 vfs_get_tree+0x28/0xc0 path_mount+0x434/0xc00 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (kernfs_mutex){+.+.}-{3:3}: __mutex_lock+0x7e/0x7e0 kernfs_add_one+0x23/0x150 kernfs_create_link+0x63/0xa0 sysfs_do_create_link_sd+0x5e/0xd0 btrfs_sysfs_add_devices_dir+0x81/0x130 btrfs_init_new_device+0x67f/0x1250 btrfs_ioctl+0x1ef/0x2e20 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}: __mutex_lock+0x7e/0x7e0 btrfs_chunk_alloc+0x125/0x3a0 find_free_extent+0xdf6/0x1210 btrfs_reserve_extent+0xb3/0x1b0 btrfs_alloc_tree_block+0xb0/0x310 alloc_tree_block_no_bg_flush+0x4a/0x60 __btrfs_cow_block+0x11a/0x530 btrfs_cow_block+0x104/0x220 btrfs_search_slot+0x52e/0x9d0 btrfs_insert_empty_items+0x64/0xb0 btrfs_insert_delayed_items+0x90/0x4f0 btrfs_commit_inode_delayed_items+0x93/0x140 btrfs_log_inode+0x5de/0x2020 btrfs_log_inode_parent+0x429/0xc90 btrfs_log_new_name+0x95/0x9b btrfs_rename2+0xbb9/0x1800 vfs_rename+0x64f/0x9f0 do_renameat2+0x320/0x4e0 __x64_sys_rename+0x1f/0x30 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&delayed_node->mutex){+.+.}-{3:3}: __lock_acquire+0x119c/0x1fc0 lock_acquire+0xa7/0x3d0 __mutex_lock+0x7e/0x7e0 __btrfs_release_delayed_node.part.0+0x3f/0x330 btrfs_evict_inode+0x24c/0x500 evict+0xcf/0x1f0 dispose_list+0x48/0x70 prune_icache_sb+0x44/0x50 super_cache_scan+0x161/0x1e0 do_shrink_slab+0x178/0x3c0 shrink_slab+0x17c/0x290 shrink_node+0x2b2/0x6d0 balance_pgdat+0x30a/0x670 kswapd+0x213/0x4c0 kthread+0x138/0x160 ret_from_fork+0x1f/0x30 other info that might help us debug this: Chain exists of: &delayed_node->mutex --> kernfs_mutex --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(kernfs_mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/100: #0: ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 #1: ffffffff8dd65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290 #2: ffff96ed2ade30e0 (&type->s_umount_key#36){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0 stack 
backtrace: CPU: 0 PID: 100 Comm: kswapd0 Not tainted 5.9.0-rc3+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x8b/0xb8 check_noncircular+0x12d/0x150 __lock_acquire+0x119c/0x1fc0 lock_acquire+0xa7/0x3d0 ? __btrfs_release_delayed_node.part.0+0x3f/0x330 __mutex_lock+0x7e/0x7e0 ? __btrfs_release_delayed_node.part.0+0x3f/0x330 ? __btrfs_release_delayed_node.part.0+0x3f/0x330 ? lock_acquire+0xa7/0x3d0 ? find_held_lock+0x2b/0x80 __btrfs_release_delayed_node.part.0+0x3f/0x330 btrfs_evict_inode+0x24c/0x500 evict+0xcf/0x1f0 dispose_list+0x48/0x70 prune_icache_sb+0x44/0x50 super_cache_scan+0x161/0x1e0 do_shrink_slab+0x178/0x3c0 shrink_slab+0x17c/0x290 shrink_node+0x2b2/0x6d0 balance_pgdat+0x30a/0x670 kswapd+0x213/0x4c0 ? _raw_spin_unlock_irqrestore+0x41/0x50 ? add_wait_queue_exclusive+0x70/0x70 ? balance_pgdat+0x670/0x670 kthread+0x138/0x160 ? kthread_create_worker_on_cpu+0x40/0x40 ret_from_fork+0x1f/0x30 This happens because we are holding the chunk_mutex at the time of adding in a new device. However we only need to hold the device_list_mutex, as we're going to iterate over the fs_devices devices. Move the sysfs init stuff outside of the chunk_mutex to get rid of this lockdep splat. CC: stable@vger.kernel.org # 4.4.x: f3cd2c58110dad14e: btrfs: sysfs, rename device_link add/remove functions CC: stable@vger.kernel.org # 4.4.x Reported-by: David Sterba Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Sasha Levin --- fs/btrfs/volumes.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 58910a0a3e4a..00e7816dc9a1 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2728,9 +2728,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path btrfs_set_super_num_devices(fs_info->super_copy, orig_super_num_devices + 1); - /* add sysfs device entry */ - btrfs_sysfs_add_device_link(fs_devices, device); - /* * we've got more storage, clear any full flags on the space * infos @@ -2738,6 +2735,10 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path btrfs_clear_space_info_full(fs_info); mutex_unlock(&fs_info->chunk_mutex); + + /* Add sysfs device entry */ + btrfs_sysfs_add_device_link(fs_devices, device); + mutex_unlock(&fs_devices->device_list_mutex); if (seeding_dev) { From a8ec66026dd816e49bf84b1101941d747d498dad Mon Sep 17 00:00:00 2001 From: Johannes Thumshirn Date: Tue, 22 Sep 2020 17:27:29 +0900 Subject: [PATCH 010/151] btrfs: reschedule when cloning lots of extents [ Upstream commit 6b613cc97f0ace77f92f7bc112b8f6ad3f52baf8 ] We have several occurrences of a soft lockup from fstest's generic/175 testcase, which look more or less like this one: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [xfs_io:10030] Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 10030 Comm: xfs_io Tainted: G L 5.9.0-rc5+ #768 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 Call Trace: dump_stack+0x77/0xa0 panic+0xfa/0x2cb watchdog_timer_fn.cold+0x85/0xa5 ? lockup_detector_update_enable+0x50/0x50 __hrtimer_run_queues+0x99/0x4c0 ? 
recalibrate_cpu_khz+0x10/0x10 hrtimer_run_queues+0x9f/0xb0 update_process_times+0x28/0x80 tick_handle_periodic+0x1b/0x60 __sysvec_apic_timer_interrupt+0x76/0x210 asm_call_on_stack+0x12/0x20 sysvec_apic_timer_interrupt+0x7f/0x90 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0010:btrfs_tree_unlock+0x91/0x1a0 [btrfs] RSP: 0018:ffffc90007123a58 EFLAGS: 00000282 RAX: ffff8881cea2fbe0 RBX: ffff8881cea2fbe0 RCX: 0000000000000000 RDX: ffff8881d23fd200 RSI: ffffffff82045220 RDI: ffff8881cea2fba0 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000032 R10: 0000160000000000 R11: 0000000000001000 R12: 0000000000001000 R13: ffff8882357fd5b0 R14: ffff88816fa76e70 R15: ffff8881cea2fad0 ? btrfs_tree_unlock+0x15b/0x1a0 [btrfs] btrfs_release_path+0x67/0x80 [btrfs] btrfs_insert_replace_extent+0x177/0x2c0 [btrfs] btrfs_replace_file_extents+0x472/0x7c0 [btrfs] btrfs_clone+0x9ba/0xbd0 [btrfs] btrfs_clone_files.isra.0+0xeb/0x140 [btrfs] ? file_update_time+0xcd/0x120 btrfs_remap_file_range+0x322/0x3b0 [btrfs] do_clone_file_range+0xb7/0x1e0 vfs_clone_file_range+0x30/0xa0 ioctl_file_clone+0x8a/0xc0 do_vfs_ioctl+0x5b2/0x6f0 __x64_sys_ioctl+0x37/0xa0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f87977fc247 RSP: 002b:00007ffd51a2f6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87977fc247 RDX: 00007ffd51a2f710 RSI: 000000004020940d RDI: 0000000000000003 RBP: 0000000000000004 R08: 00007ffd51a79080 R09: 0000000000000000 R10: 00005621f11352f2 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000000 R14: 00005621f128b958 R15: 0000000080000000 Kernel Offset: disabled ---[ end Kernel panic - not syncing: softlockup: hung tasks ]--- All of these lockup reports have the call chain btrfs_clone_files() -> btrfs_clone() in common. btrfs_clone_files() calls btrfs_clone() with both source and destination extents locked and loops over the source extent to create the clones. Conditionally reschedule in the btrfs_clone() loop, to give some time back to other processes. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Sasha Levin --- fs/btrfs/ioctl.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 63394b450afc..3fd6c9aed719 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3752,6 +3752,8 @@ process_slot: ret = -EINTR; goto out; } + + cond_resched(); } ret = 0; From bb8c6bd53cc0196b0416bef704eb1b3d57295f6f Mon Sep 17 00:00:00 2001 From: Tomasz Figa Date: Wed, 14 Oct 2020 14:16:24 +0000 Subject: [PATCH 011/151] ASoC: Intel: kbl_rt5663_max98927: Fix kabylake_ssp_fixup function MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 9fe9efd6924c9a62ebb759025bb8927e398f51f7 ] This is a copy of commit 5c5f1baee85a ("ASoC: Intel: kbl_rt5663_rt5514_max98927: Fix kabylake_ssp_fixup function") applied to the kbl_rt5663_max98927 board file. Original explanation of the change: kabylake_ssp_fixup function uses snd_soc_dpcm to identify the codecs DAIs. The HW parameters are changed based on the codec DAI of the stream. The earlier approach to get snd_soc_dpcm was using container_of() macro on snd_pcm_hw_params. The structures have been modified over time and snd_soc_dpcm does not have snd_pcm_hw_params as a reference but as a copy. This causes the current driver to crash when used. 
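To illustrate the pitfall, here is a minimal userspace sketch (hypothetical struct names and a local container_of() definition, not the actual ASoC types): container_of() only recovers the parent object when the passed pointer really refers to a member embedded inside that parent. If the callback is handed a pointer to a separate copy of the member, the computed parent pointer is garbage, which matches the crash above.

  #include <stddef.h>
  #include <stdio.h>

  #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

  struct params { int rate; };

  struct dpcm {
          int id;
          struct params hw_params;        /* embedded member */
  };

  int main(void)
  {
          struct dpcm d = { .id = 42 };
          struct params copy = d.hw_params;       /* separate object, not embedded in d */

          /* Valid: the pointer really is &d.hw_params, so we get &d back. */
          struct dpcm *ok = container_of(&d.hw_params, struct dpcm, hw_params);

          /* Invalid: the pointer is a copy, so "bad" points at unrelated stack memory. */
          struct dpcm *bad = container_of(&copy, struct dpcm, hw_params);

          printf("ok->id = %d\n", ok->id);        /* prints 42 */
          /* Dereferencing bad->id here would be undefined behaviour. */
          (void)bad;
          return 0;
  }
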
This patch changes the way snd_soc_dpcm is extracted. snd_soc_pcm_runtime holds 2 dpcm instances (one for playback and one for capture). 2 codecs on the SSP are dmic (capture) and speakers (playback). Based on the stream direction, snd_soc_dpcm is extracted from snd_soc_pcm_runtime. Fixes a boot crash on a HP Chromebook x2: [ 16.582225] BUG: kernel NULL pointer dereference, address: 0000000000000050 [ 16.582231] #PF: supervisor read access in kernel mode [ 16.582233] #PF: error_code(0x0000) - not-present page [ 16.582234] PGD 0 P4D 0 [ 16.582238] Oops: 0000 [#1] PREEMPT SMP PTI [ 16.582241] CPU: 0 PID: 1980 Comm: cras Tainted: G C 5.4.58 #1 [ 16.582243] Hardware name: HP Soraka/Soraka, BIOS Google_Soraka.10431.75.0 08/30/2018 [ 16.582247] RIP: 0010:kabylake_ssp_fixup+0x19/0xbb [snd_soc_kbl_rt5663_max98927] [ 16.582250] Code: c6 6f c5 80 c0 44 89 f2 31 c0 e8 3e c9 4c d6 eb de 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 89 f3 48 8b 46 c8 48 8b 4e d0 <48> 8b 49 10 4c 8b 78 10 4c 8b 31 4c 89 f7 48 c7 c6 4b c2 80 c0 e8 [ 16.582252] RSP: 0000:ffffaf7e81e0b958 EFLAGS: 00010282 [ 16.582254] RAX: ffffffff96f13e0d RBX: ffffaf7e81e0ba00 RCX: 0000000000000040 [ 16.582256] RDX: ffffaf7e81e0ba00 RSI: ffffaf7e81e0ba00 RDI: ffffa3b208558028 [ 16.582258] RBP: ffffaf7e81e0b970 R08: ffffa3b203b54160 R09: ffffaf7e81e0ba00 [ 16.582259] R10: 0000000000000000 R11: ffffffffc080b345 R12: ffffa3b209fb6e00 [ 16.582261] R13: ffffa3b1b1a47838 R14: ffffa3b1e6197f28 R15: ffffaf7e81e0ba00 [ 16.582263] FS: 00007eb3f25aaf80(0000) GS:ffffa3b236a00000(0000) knlGS:0000000000000000 [ 16.582265] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.582267] CR2: 0000000000000050 CR3: 0000000246bc8006 CR4: 00000000003606f0 [ 16.582269] Call Trace: [ 16.582275] snd_soc_link_be_hw_params_fixup+0x21/0x68 [ 16.582278] snd_soc_dai_hw_params+0x25/0x94 [ 16.582282] soc_pcm_hw_params+0x2d8/0x583 [ 16.582288] dpcm_be_dai_hw_params+0x172/0x29e [ 16.582291] dpcm_fe_dai_hw_params+0x9f/0x12f [ 16.582295] snd_pcm_hw_params+0x137/0x41c [ 16.582298] snd_pcm_hw_params_user+0x3c/0x71 [ 16.582301] snd_pcm_common_ioctl+0x2c6/0x565 [ 16.582304] snd_pcm_ioctl+0x32/0x36 [ 16.582307] do_vfs_ioctl+0x506/0x783 [ 16.582311] ksys_ioctl+0x58/0x83 [ 16.582313] __x64_sys_ioctl+0x1a/0x1e [ 16.582316] do_syscall_64+0x54/0x7e [ 16.582319] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 16.582322] RIP: 0033:0x7eb3f1886157 [ 16.582324] Code: 8a 66 90 48 8b 05 11 dd 2b 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 dc 2b 00 f7 d8 64 89 01 48 [ 16.582326] RSP: 002b:00007ffff7559818 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 16.582329] RAX: ffffffffffffffda RBX: 00005acc9188b140 RCX: 00007eb3f1886157 [ 16.582330] RDX: 00007ffff7559940 RSI: 00000000c2604111 RDI: 000000000000001e [ 16.582332] RBP: 00007ffff7559840 R08: 0000000000000004 R09: 0000000000000000 [ 16.582333] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000bb80 [ 16.582335] R13: 00005acc91702e80 R14: 00007ffff7559940 R15: 00005acc91702e80 [ 16.582337] Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg uinput hid_google_hammer snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_soc_dmic snd_soc_skl_ssp_clk snd_soc_skl snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_hdac_hda snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core ipu3_cio2 ipu3_imgu(C) videobuf2_v4l2 videobuf2_common videobuf2_dma_sg videobuf2_memops snd_soc_rt5663 snd_soc_max98927 
snd_soc_rl6231 ov5670 ov13858 acpi_als v4l2_fwnode dw9714 fuse xt_MASQUERADE iio_trig_sysfs cros_ec_light_prox cros_ec_sensors cros_ec_sensors_core cros_ec_sensors_ring industrialio_triggered_buffer kfifo_buf industrialio cros_ec_sensorhub cdc_ether usbnet btusb btrtl btintel btbcm bluetooth ecdh_generic ecc lzo_rle lzo_compress iwlmvm zram iwl7000_mac80211 r8152 mii iwlwifi cfg80211 joydev [ 16.584243] gsmi: Log Shutdown Reason 0x03 [ 16.584246] CR2: 0000000000000050 [ 16.584248] ---[ end trace c8511d090c11edff ]--- Suggested-by: Łukasz Majczak Fixes: 2e5894d73789e ("ASoC: pcm: Add support for DAI multicodec") Signed-off-by: Tomasz Figa Acked-by: Pierre-Louis Bossart Link: https://lore.kernel.org/r/20201014141624.4143453-1-tfiga@chromium.org Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/intel/boards/kbl_rt5663_max98927.c | 39 ++++++++++++++++---- 1 file changed, 31 insertions(+), 8 deletions(-) diff --git a/sound/soc/intel/boards/kbl_rt5663_max98927.c b/sound/soc/intel/boards/kbl_rt5663_max98927.c index 7cefda341fbf..a540a2dad80c 100644 --- a/sound/soc/intel/boards/kbl_rt5663_max98927.c +++ b/sound/soc/intel/boards/kbl_rt5663_max98927.c @@ -401,17 +401,40 @@ static int kabylake_ssp_fixup(struct snd_soc_pcm_runtime *rtd, struct snd_interval *channels = hw_param_interval(params, SNDRV_PCM_HW_PARAM_CHANNELS); struct snd_mask *fmt = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT); - struct snd_soc_dpcm *dpcm = container_of( - params, struct snd_soc_dpcm, hw_params); - struct snd_soc_dai_link *fe_dai_link = dpcm->fe->dai_link; - struct snd_soc_dai_link *be_dai_link = dpcm->be->dai_link; + struct snd_soc_dpcm *dpcm, *rtd_dpcm = NULL; + + /* + * The following loop will be called only for playback stream + * In this platform, there is only one playback device on every SSP + */ + for_each_dpcm_fe(rtd, SNDRV_PCM_STREAM_PLAYBACK, dpcm) { + rtd_dpcm = dpcm; + break; + } + + /* + * This following loop will be called only for capture stream + * In this platform, there is only one capture device on every SSP + */ + for_each_dpcm_fe(rtd, SNDRV_PCM_STREAM_CAPTURE, dpcm) { + rtd_dpcm = dpcm; + break; + } + + if (!rtd_dpcm) + return -EINVAL; + + /* + * The above 2 loops are mutually exclusive based on the stream direction, + * thus rtd_dpcm variable will never be overwritten + */ /* * The ADSP will convert the FE rate to 48k, stereo, 24 bit */ - if (!strcmp(fe_dai_link->name, "Kbl Audio Port") || - !strcmp(fe_dai_link->name, "Kbl Audio Headset Playback") || - !strcmp(fe_dai_link->name, "Kbl Audio Capture Port")) { + if (!strcmp(rtd_dpcm->fe->dai_link->name, "Kbl Audio Port") || + !strcmp(rtd_dpcm->fe->dai_link->name, "Kbl Audio Headset Playback") || + !strcmp(rtd_dpcm->fe->dai_link->name, "Kbl Audio Capture Port")) { rate->min = rate->max = 48000; channels->min = channels->max = 2; snd_mask_none(fmt); @@ -421,7 +444,7 @@ static int kabylake_ssp_fixup(struct snd_soc_pcm_runtime *rtd, * The speaker on the SSP0 supports S16_LE and not S24_LE. * thus changing the mask here */ - if (!strcmp(be_dai_link->name, "SSP0-Codec")) + if (!strcmp(rtd_dpcm->be->dai_link->name, "SSP0-Codec")) snd_mask_set_format(fmt, SNDRV_PCM_FORMAT_S16_LE); return 0; From bb2b60242c8ef59310f27287b26e8f633c65918d Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 15 Oct 2020 21:41:44 +0100 Subject: [PATCH 012/151] genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY [ Upstream commit 151a535171be6ff824a0a3875553ea38570f4c05 ] kernel/irq/ipi.c otherwise fails to compile if nothing else selects it. 
Fixes: 379b656446a3 ("genirq: Add GENERIC_IRQ_IPI Kconfig symbol") Reported-by: Pavel Machek Tested-by: Pavel Machek Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201015101222.GA32747@amd Signed-off-by: Sasha Levin --- kernel/irq/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index f92d9a687372..4e11120265c7 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -81,6 +81,7 @@ config IRQ_FASTEOI_HIERARCHY_HANDLERS # Generic IRQ IPI support config GENERIC_IRQ_IPI bool + select IRQ_DOMAIN_HIERARCHY # Generic MSI interrupt support config GENERIC_MSI_IRQ From 4e438ca1b62959c283ac036edf852e6b1a9707a0 Mon Sep 17 00:00:00 2001 From: Olaf Hering Date: Thu, 8 Oct 2020 09:12:15 +0200 Subject: [PATCH 013/151] hv_balloon: disable warning when floor reached [ Upstream commit 2c3bd2a5c86fe744e8377733c5e511a5ca1e14f5 ] It is not an error if the host requests to balloon down, but the VM refuses to do so. Without this change a warning is logged in dmesg every five minutes. Fixes: b3bb97b8a49f3 ("Drivers: hv: balloon: Add logging for dynamic memory operations") Signed-off-by: Olaf Hering Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20201008071216.16554-1-olaf@aepfle.de Signed-off-by: Wei Liu Signed-off-by: Sasha Levin --- drivers/hv/hv_balloon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index 930674117533..bd4e72f6dfd4 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -1277,7 +1277,7 @@ static void balloon_up(struct work_struct *dummy) /* Refuse to balloon below the floor. */ if (avail_pages < num_pages || avail_pages - num_pages < floor) { - pr_warn("Balloon request will be partially fulfilled. %s\n", + pr_info("Balloon request will be partially fulfilled. %s\n", avail_pages < num_pages ? "Not enough memory." : "Balloon floor reached."); From 5cb904da85ed368a60c2b2a4077e6ee0cf4b41ed Mon Sep 17 00:00:00 2001 From: zhuoliang zhang Date: Fri, 23 Oct 2020 09:05:35 +0200 Subject: [PATCH 014/151] net: xfrm: fix a race condition during allocing spi [ Upstream commit a779d91314ca7208b7feb3ad817b62904397c56d ] we found that the following race condition exists in xfrm_alloc_userspi flow: user thread state_hash_work thread ---- ---- xfrm_alloc_userspi() __find_acq_core() /*alloc new xfrm_state:x*/ xfrm_state_alloc() /*schedule state_hash_work thread*/ xfrm_hash_grow_check() xfrm_hash_resize() xfrm_alloc_spi /*hold lock*/ x->id.spi = htonl(spi) spin_lock_bh(&net->xfrm.xfrm_state_lock) /*waiting lock release*/ xfrm_hash_transfer() spin_lock_bh(&net->xfrm.xfrm_state_lock) /*add x into hlist:net->xfrm.state_byspi*/ hlist_add_head_rcu(&x->byspi) spin_unlock_bh(&net->xfrm.xfrm_state_lock) /*add x into hlist:net->xfrm.state_byspi 2 times*/ hlist_add_head_rcu(&x->byspi) 1. a new state x is alloced in xfrm_state_alloc() and added into the bydst hlist in __find_acq_core() on the LHS; 2. on the RHS, state_hash_work thread travels the old bydst and tranfers every xfrm_state (include x) into the new bydst hlist and new byspi hlist; 3. user thread on the LHS gets the lock and adds x into the new byspi hlist again. So the same xfrm_state (x) is added into the same list_hash (net->xfrm.state_byspi) 2 times that makes the list_hash become an inifite loop. To fix the race, x->id.spi = htonl(spi) in the xfrm_alloc_spi() is moved to the back of spin_lock_bh, sothat state_hash_work thread no longer add x which id.spi is zero into the hash_list. 
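A minimal userspace sketch of why the duplicate insertion is fatal (a local re-implementation of the head insert, not the kernel hlist helpers): adding the same node to the head of one list twice makes the node point at itself, so any later traversal of that hash chain never terminates.

  #include <stdio.h>

  struct node { struct node *next; };

  static void add_head(struct node **first, struct node *n)
  {
          n->next = *first;
          *first = n;
  }

  int main(void)
  {
          struct node *head = NULL;
          struct node x = { NULL };

          add_head(&head, &x);    /* list: x -> NULL           */
          add_head(&head, &x);    /* now x.next == &x: a cycle */

          printf("x.next == &x ? %d\n", x.next == &x);    /* prints 1 */

          /* for (n = head; n; n = n->next) ... would never terminate */
          return 0;
  }
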
Fixes: f034b5d4efdf ("[XFRM]: Dynamic xfrm_state hash table sizing.") Signed-off-by: zhuoliang zhang Acked-by: Herbert Xu Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin --- net/xfrm/xfrm_state.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index aaea8cb7459d..61fd0569d393 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -2001,6 +2001,7 @@ int xfrm_alloc_spi(struct xfrm_state *x, u32 low, u32 high) int err = -ENOENT; __be32 minspi = htonl(low); __be32 maxspi = htonl(high); + __be32 newspi = 0; u32 mark = x->mark.v & x->mark.m; spin_lock_bh(&x->lock); @@ -2019,21 +2020,22 @@ int xfrm_alloc_spi(struct xfrm_state *x, u32 low, u32 high) xfrm_state_put(x0); goto unlock; } - x->id.spi = minspi; + newspi = minspi; } else { u32 spi = 0; for (h = 0; h < high-low+1; h++) { spi = low + prandom_u32()%(high-low+1); x0 = xfrm_state_lookup(net, mark, &x->id.daddr, htonl(spi), x->id.proto, x->props.family); if (x0 == NULL) { - x->id.spi = htonl(spi); + newspi = htonl(spi); break; } xfrm_state_put(x0); } } - if (x->id.spi) { + if (newspi) { spin_lock_bh(&net->xfrm.xfrm_state_lock); + x->id.spi = newspi; h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, x->props.family); hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h); spin_unlock_bh(&net->xfrm.xfrm_state_lock); From 933f911136e294e9d590516520bcf161912e82b0 Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Wed, 28 Oct 2020 15:43:40 +0000 Subject: [PATCH 015/151] ASoC: codecs: wcd9335: Set digital gain range correctly [ Upstream commit 6d6bc54ab4f2404d46078abc04bf4dee4db01def ] digital gain range is -84dB min to 40dB max, however this was not correctly specified in the range. Fix this by with correct range! Fixes: 8c4f021d806a ("ASoC: wcd9335: add basic controls") Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20201028154340.17090-2-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/codecs/wcd9335.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/soc/codecs/wcd9335.c b/sound/soc/codecs/wcd9335.c index f318403133e9..81906c25e4a8 100644 --- a/sound/soc/codecs/wcd9335.c +++ b/sound/soc/codecs/wcd9335.c @@ -618,7 +618,7 @@ static const char * const sb_tx8_mux_text[] = { "ZERO", "RX_MIX_TX8", "DEC8", "DEC8_192" }; -static const DECLARE_TLV_DB_SCALE(digital_gain, 0, 1, 0); +static const DECLARE_TLV_DB_SCALE(digital_gain, -8400, 100, -8400); static const DECLARE_TLV_DB_SCALE(line_gain, 0, 7, 1); static const DECLARE_TLV_DB_SCALE(analog_gain, 0, 25, 1); static const DECLARE_TLV_DB_SCALE(ear_pa_gain, 0, 150, 0); From 6234710dc634b0cb95372a8b556baac951fc91ca Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Mon, 26 Oct 2020 15:19:38 -0700 Subject: [PATCH 016/151] xfs: set xefi_discard when creating a deferred agfl free log intent item [ Upstream commit 2c334e12f957cd8c6bb66b4aa3f79848b7c33cab ] Make sure that we actually initialize xefi_discard when we're scheduling a deferred free of an AGFL block. This was (eventually) found by the UBSAN while I was banging on realtime rmap problems, but it exists in the upstream codebase. While we're at it, rearrange the structure to reduce the struct size from 64 to 56 bytes. Fixes: fcb762f5de2e ("xfs: add bmapi nodiscard flag") Signed-off-by: Darrick J. 
Wong Reviewed-by: Brian Foster Signed-off-by: Sasha Levin --- fs/xfs/libxfs/xfs_alloc.c | 1 + fs/xfs/libxfs/xfs_bmap.h | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 0a36f532cf86..436f686a9891 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2209,6 +2209,7 @@ xfs_defer_agfl_block( new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); new->xefi_blockcount = 1; new->xefi_oinfo = *oinfo; + new->xefi_skip_discard = false; trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1); diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index e2798c6f3a5f..093716a074fb 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -52,9 +52,9 @@ struct xfs_extent_free_item { xfs_fsblock_t xefi_startblock;/* starting fs block number */ xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ + bool xefi_skip_discard; struct list_head xefi_list; struct xfs_owner_info xefi_oinfo; /* extent owner */ - bool xefi_skip_discard; }; #define XFS_BMAP_MAX_NMAP 4 From 56907fa27b9496609cdc90485555a176a7d4c16b Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Thu, 29 Oct 2020 03:56:06 +0100 Subject: [PATCH 017/151] netfilter: use actual socket sk rather than skb sk when routing harder [ Upstream commit 46d6c5ae953cc0be38efd0e469284df7c4328cf8 ] If netfilter changes the packet mark when mangling, the packet is rerouted using the route_me_harder set of functions. Prior to this commit, there's one big difference between route_me_harder and the ordinary initial routing functions, described in the comment above __ip_queue_xmit(): /* Note: skb->sk can be different from sk, in case of tunnels */ int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl, That function goes on to correctly make use of sk->sk_bound_dev_if, rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a tunnel will receive a packet in ndo_start_xmit with an initial skb->sk. It will make some transformations to that packet, and then it will send the encapsulated packet out of a *new* socket. That new socket will basically always have a different sk_bound_dev_if (otherwise there'd be a routing loop). So for the purposes of routing the encapsulated packet, the routing information as it pertains to the socket should come from that socket's sk, rather than the packet's original skb->sk. For that reason __ip_queue_xmit() and related functions all do the right thing. One might argue that all tunnels should just call skb_orphan(skb) before transmitting the encapsulated packet into the new socket. But tunnels do *not* do this -- and this is wisely avoided in skb_scrub_packet() too -- because features like TSQ rely on skb->destructor() being called when that buffer space is truely available again. Calling skb_orphan(skb) too early would result in buffers filling up unnecessarily and accounting info being all wrong. Instead, additional routing must take into account the new sk, just as __ip_queue_xmit() notes. So, this commit addresses the problem by fishing the correct sk out of state->sk -- it's already set properly in the call to nf_hook() in __ip_local_out(), which receives the sk as part of its normal functionality. So we make sure to plumb state->sk through the various route_me_harder functions, and then make correct use of it following the example of __ip_queue_xmit(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason A. 
Donenfeld Reviewed-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- include/linux/netfilter_ipv4.h | 2 +- include/linux/netfilter_ipv6.h | 10 +++++----- net/ipv4/netfilter.c | 8 +++++--- net/ipv4/netfilter/iptable_mangle.c | 2 +- net/ipv4/netfilter/nf_reject_ipv4.c | 2 +- net/ipv6/netfilter.c | 6 +++--- net/ipv6/netfilter/ip6table_mangle.c | 2 +- net/netfilter/ipvs/ip_vs_core.c | 4 ++-- net/netfilter/nf_nat_proto.c | 4 ++-- net/netfilter/nf_synproxy_core.c | 2 +- net/netfilter/nft_chain_route.c | 4 ++-- net/netfilter/utils.c | 4 ++-- 12 files changed, 26 insertions(+), 24 deletions(-) diff --git a/include/linux/netfilter_ipv4.h b/include/linux/netfilter_ipv4.h index 082e2c41b7ff..5b70ca868bb1 100644 --- a/include/linux/netfilter_ipv4.h +++ b/include/linux/netfilter_ipv4.h @@ -16,7 +16,7 @@ struct ip_rt_info { u_int32_t mark; }; -int ip_route_me_harder(struct net *net, struct sk_buff *skb, unsigned addr_type); +int ip_route_me_harder(struct net *net, struct sock *sk, struct sk_buff *skb, unsigned addr_type); struct nf_queue_entry; diff --git a/include/linux/netfilter_ipv6.h b/include/linux/netfilter_ipv6.h index 9b67394471e1..48314ade1506 100644 --- a/include/linux/netfilter_ipv6.h +++ b/include/linux/netfilter_ipv6.h @@ -42,7 +42,7 @@ struct nf_ipv6_ops { #if IS_MODULE(CONFIG_IPV6) int (*chk_addr)(struct net *net, const struct in6_addr *addr, const struct net_device *dev, int strict); - int (*route_me_harder)(struct net *net, struct sk_buff *skb); + int (*route_me_harder)(struct net *net, struct sock *sk, struct sk_buff *skb); int (*dev_get_saddr)(struct net *net, const struct net_device *dev, const struct in6_addr *daddr, unsigned int srcprefs, struct in6_addr *saddr); @@ -143,9 +143,9 @@ static inline int nf_br_ip6_fragment(struct net *net, struct sock *sk, #endif } -int ip6_route_me_harder(struct net *net, struct sk_buff *skb); +int ip6_route_me_harder(struct net *net, struct sock *sk, struct sk_buff *skb); -static inline int nf_ip6_route_me_harder(struct net *net, struct sk_buff *skb) +static inline int nf_ip6_route_me_harder(struct net *net, struct sock *sk, struct sk_buff *skb) { #if IS_MODULE(CONFIG_IPV6) const struct nf_ipv6_ops *v6_ops = nf_get_ipv6_ops(); @@ -153,9 +153,9 @@ static inline int nf_ip6_route_me_harder(struct net *net, struct sk_buff *skb) if (!v6_ops) return -EHOSTUNREACH; - return v6_ops->route_me_harder(net, skb); + return v6_ops->route_me_harder(net, sk, skb); #elif IS_BUILTIN(CONFIG_IPV6) - return ip6_route_me_harder(net, skb); + return ip6_route_me_harder(net, sk, skb); #else return -EHOSTUNREACH; #endif diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c index a058213b77a7..7c841037c533 100644 --- a/net/ipv4/netfilter.c +++ b/net/ipv4/netfilter.c @@ -17,17 +17,19 @@ #include /* route_me_harder function, used by iptable_nat, iptable_mangle + ip_queue */ -int ip_route_me_harder(struct net *net, struct sk_buff *skb, unsigned int addr_type) +int ip_route_me_harder(struct net *net, struct sock *sk, struct sk_buff *skb, unsigned int addr_type) { const struct iphdr *iph = ip_hdr(skb); struct rtable *rt; struct flowi4 fl4 = {}; __be32 saddr = iph->saddr; - const struct sock *sk = skb_to_full_sk(skb); - __u8 flags = sk ? inet_sk_flowi_flags(sk) : 0; + __u8 flags; struct net_device *dev = skb_dst(skb)->dev; unsigned int hh_len; + sk = sk_to_full_sk(sk); + flags = sk ? 
inet_sk_flowi_flags(sk) : 0; + if (addr_type == RTN_UNSPEC) addr_type = inet_addr_type_dev_table(net, dev, saddr); if (addr_type == RTN_LOCAL || addr_type == RTN_UNICAST) diff --git a/net/ipv4/netfilter/iptable_mangle.c b/net/ipv4/netfilter/iptable_mangle.c index bb9266ea3785..ae45bcdd335e 100644 --- a/net/ipv4/netfilter/iptable_mangle.c +++ b/net/ipv4/netfilter/iptable_mangle.c @@ -62,7 +62,7 @@ ipt_mangle_out(struct sk_buff *skb, const struct nf_hook_state *state) iph->daddr != daddr || skb->mark != mark || iph->tos != tos) { - err = ip_route_me_harder(state->net, skb, RTN_UNSPEC); + err = ip_route_me_harder(state->net, state->sk, skb, RTN_UNSPEC); if (err < 0) ret = NF_DROP_ERR(err); } diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c index 2361fdac2c43..57817313a85c 100644 --- a/net/ipv4/netfilter/nf_reject_ipv4.c +++ b/net/ipv4/netfilter/nf_reject_ipv4.c @@ -127,7 +127,7 @@ void nf_send_reset(struct net *net, struct sk_buff *oldskb, int hook) ip4_dst_hoplimit(skb_dst(nskb))); nf_reject_ip_tcphdr_put(nskb, oldskb, oth); - if (ip_route_me_harder(net, nskb, RTN_UNSPEC)) + if (ip_route_me_harder(net, nskb->sk, nskb, RTN_UNSPEC)) goto free_nskb; niph = ip_hdr(nskb); diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c index 6d0e942d082d..ab9a279dd6d4 100644 --- a/net/ipv6/netfilter.c +++ b/net/ipv6/netfilter.c @@ -20,10 +20,10 @@ #include #include "../bridge/br_private.h" -int ip6_route_me_harder(struct net *net, struct sk_buff *skb) +int ip6_route_me_harder(struct net *net, struct sock *sk_partial, struct sk_buff *skb) { const struct ipv6hdr *iph = ipv6_hdr(skb); - struct sock *sk = sk_to_full_sk(skb->sk); + struct sock *sk = sk_to_full_sk(sk_partial); unsigned int hh_len; struct dst_entry *dst; int strict = (ipv6_addr_type(&iph->daddr) & @@ -84,7 +84,7 @@ static int nf_ip6_reroute(struct sk_buff *skb, if (!ipv6_addr_equal(&iph->daddr, &rt_info->daddr) || !ipv6_addr_equal(&iph->saddr, &rt_info->saddr) || skb->mark != rt_info->mark) - return ip6_route_me_harder(entry->state.net, skb); + return ip6_route_me_harder(entry->state.net, entry->state.sk, skb); } return 0; } diff --git a/net/ipv6/netfilter/ip6table_mangle.c b/net/ipv6/netfilter/ip6table_mangle.c index 070afb97fa2b..401e8dcb2c84 100644 --- a/net/ipv6/netfilter/ip6table_mangle.c +++ b/net/ipv6/netfilter/ip6table_mangle.c @@ -57,7 +57,7 @@ ip6t_mangle_out(struct sk_buff *skb, const struct nf_hook_state *state) skb->mark != mark || ipv6_hdr(skb)->hop_limit != hop_limit || flowlabel != *((u_int32_t *)ipv6_hdr(skb)))) { - err = ip6_route_me_harder(state->net, skb); + err = ip6_route_me_harder(state->net, state->sk, skb); if (err < 0) ret = NF_DROP_ERR(err); } diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index 64a05906cc0e..89aa1fc334b1 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -748,12 +748,12 @@ static int ip_vs_route_me_harder(struct netns_ipvs *ipvs, int af, struct dst_entry *dst = skb_dst(skb); if (dst->dev && !(dst->dev->flags & IFF_LOOPBACK) && - ip6_route_me_harder(ipvs->net, skb) != 0) + ip6_route_me_harder(ipvs->net, skb->sk, skb) != 0) return 1; } else #endif if (!(skb_rtable(skb)->rt_flags & RTCF_LOCAL) && - ip_route_me_harder(ipvs->net, skb, RTN_LOCAL) != 0) + ip_route_me_harder(ipvs->net, skb->sk, skb, RTN_LOCAL) != 0) return 1; return 0; diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c index 59151dc07fdc..e87b6bd6b3cd 100644 --- a/net/netfilter/nf_nat_proto.c +++ 
b/net/netfilter/nf_nat_proto.c @@ -715,7 +715,7 @@ nf_nat_ipv4_local_fn(void *priv, struct sk_buff *skb, if (ct->tuplehash[dir].tuple.dst.u3.ip != ct->tuplehash[!dir].tuple.src.u3.ip) { - err = ip_route_me_harder(state->net, skb, RTN_UNSPEC); + err = ip_route_me_harder(state->net, state->sk, skb, RTN_UNSPEC); if (err < 0) ret = NF_DROP_ERR(err); } @@ -953,7 +953,7 @@ nf_nat_ipv6_local_fn(void *priv, struct sk_buff *skb, if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3, &ct->tuplehash[!dir].tuple.src.u3)) { - err = nf_ip6_route_me_harder(state->net, skb); + err = nf_ip6_route_me_harder(state->net, state->sk, skb); if (err < 0) ret = NF_DROP_ERR(err); } diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c index b9cbe1e2453e..4bb4cfde28b4 100644 --- a/net/netfilter/nf_synproxy_core.c +++ b/net/netfilter/nf_synproxy_core.c @@ -446,7 +446,7 @@ synproxy_send_tcp(struct net *net, skb_dst_set_noref(nskb, skb_dst(skb)); nskb->protocol = htons(ETH_P_IP); - if (ip_route_me_harder(net, nskb, RTN_UNSPEC)) + if (ip_route_me_harder(net, nskb->sk, nskb, RTN_UNSPEC)) goto free_nskb; if (nfct) { diff --git a/net/netfilter/nft_chain_route.c b/net/netfilter/nft_chain_route.c index 8826bbe71136..edd02cda57fc 100644 --- a/net/netfilter/nft_chain_route.c +++ b/net/netfilter/nft_chain_route.c @@ -42,7 +42,7 @@ static unsigned int nf_route_table_hook4(void *priv, iph->daddr != daddr || skb->mark != mark || iph->tos != tos) { - err = ip_route_me_harder(state->net, skb, RTN_UNSPEC); + err = ip_route_me_harder(state->net, state->sk, skb, RTN_UNSPEC); if (err < 0) ret = NF_DROP_ERR(err); } @@ -92,7 +92,7 @@ static unsigned int nf_route_table_hook6(void *priv, skb->mark != mark || ipv6_hdr(skb)->hop_limit != hop_limit || flowlabel != *((u32 *)ipv6_hdr(skb)))) { - err = nf_ip6_route_me_harder(state->net, skb); + err = nf_ip6_route_me_harder(state->net, state->sk, skb); if (err < 0) ret = NF_DROP_ERR(err); } diff --git a/net/netfilter/utils.c b/net/netfilter/utils.c index 51b454d8fa9c..924195861faf 100644 --- a/net/netfilter/utils.c +++ b/net/netfilter/utils.c @@ -191,8 +191,8 @@ static int nf_ip_reroute(struct sk_buff *skb, const struct nf_queue_entry *entry skb->mark == rt_info->mark && iph->daddr == rt_info->daddr && iph->saddr == rt_info->saddr)) - return ip_route_me_harder(entry->state.net, skb, - RTN_UNSPEC); + return ip_route_me_harder(entry->state.net, entry->state.sk, + skb, RTN_UNSPEC); } #endif return 0; From ad017cf5dacea8f318151c31f6227264eca4746d Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Thu, 29 Oct 2020 13:50:03 +0100 Subject: [PATCH 018/151] netfilter: nf_tables: missing validation from the abort path [ Upstream commit c0391b6ab810381df632677a1dcbbbbd63d05b6d ] If userspace does not include the trailing end of batch message, then nfnetlink aborts the transaction. This allows to check that ruleset updates trigger no errors. After this patch, invoking this command from the prerouting chain: # nft -c add rule x y fib saddr . oif type local fails since oif is not supported there. This patch fixes the lack of rule validation from the abort/check path to catch configuration errors such as the one above. 
Fixes: a654de8fdc18 ("netfilter: nf_tables: fix chain dependency validation") Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- include/linux/netfilter/nfnetlink.h | 9 ++++++++- net/netfilter/nf_tables_api.c | 15 ++++++++++----- net/netfilter/nfnetlink.c | 22 ++++++++++++++++++---- 3 files changed, 36 insertions(+), 10 deletions(-) diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h index 89016d08f6a2..f6267e2883f2 100644 --- a/include/linux/netfilter/nfnetlink.h +++ b/include/linux/netfilter/nfnetlink.h @@ -24,6 +24,12 @@ struct nfnl_callback { const u_int16_t attr_count; /* number of nlattr's */ }; +enum nfnl_abort_action { + NFNL_ABORT_NONE = 0, + NFNL_ABORT_AUTOLOAD, + NFNL_ABORT_VALIDATE, +}; + struct nfnetlink_subsystem { const char *name; __u8 subsys_id; /* nfnetlink subsystem ID */ @@ -31,7 +37,8 @@ struct nfnetlink_subsystem { const struct nfnl_callback *cb; /* callback for individual types */ struct module *owner; int (*commit)(struct net *net, struct sk_buff *skb); - int (*abort)(struct net *net, struct sk_buff *skb, bool autoload); + int (*abort)(struct net *net, struct sk_buff *skb, + enum nfnl_abort_action action); void (*cleanup)(struct net *net); bool (*valid_genid)(struct net *net, u32 genid); }; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 5a77b7a17722..51391d5d2265 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7010,11 +7010,15 @@ static void nf_tables_abort_release(struct nft_trans *trans) kfree(trans); } -static int __nf_tables_abort(struct net *net, bool autoload) +static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) { struct nft_trans *trans, *next; struct nft_trans_elem *te; + if (action == NFNL_ABORT_VALIDATE && + nf_tables_validate(net) < 0) + return -EAGAIN; + list_for_each_entry_safe_reverse(trans, next, &net->nft.commit_list, list) { switch (trans->msg_type) { @@ -7132,7 +7136,7 @@ static int __nf_tables_abort(struct net *net, bool autoload) nf_tables_abort_release(trans); } - if (autoload) + if (action == NFNL_ABORT_AUTOLOAD) nf_tables_module_autoload(net); else nf_tables_module_autoload_cleanup(net); @@ -7145,9 +7149,10 @@ static void nf_tables_cleanup(struct net *net) nft_validate_state_update(net, NFT_VALIDATE_SKIP); } -static int nf_tables_abort(struct net *net, struct sk_buff *skb, bool autoload) +static int nf_tables_abort(struct net *net, struct sk_buff *skb, + enum nfnl_abort_action action) { - int ret = __nf_tables_abort(net, autoload); + int ret = __nf_tables_abort(net, action); mutex_unlock(&net->nft.commit_mutex); @@ -7754,7 +7759,7 @@ static void __net_exit nf_tables_exit_net(struct net *net) { mutex_lock(&net->nft.commit_mutex); if (!list_empty(&net->nft.commit_list)) - __nf_tables_abort(net, false); + __nf_tables_abort(net, NFNL_ABORT_NONE); __nft_release_tables(net); mutex_unlock(&net->nft.commit_mutex); WARN_ON_ONCE(!list_empty(&net->nft.tables)); diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index 6d03b0909621..81c86a156c6c 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -315,7 +315,7 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh, return netlink_ack(skb, nlh, -EINVAL, NULL); replay: status = 0; - +replay_abort: skb = netlink_skb_clone(oskb, GFP_KERNEL); if (!skb) return netlink_ack(oskb, nlh, -ENOMEM, NULL); @@ -481,7 +481,7 @@ ack: } done: if (status & NFNL_BATCH_REPLAY) { - ss->abort(net, oskb, true); + ss->abort(net, 
oskb, NFNL_ABORT_AUTOLOAD); nfnl_err_reset(&err_list); kfree_skb(skb); module_put(ss->owner); @@ -492,11 +492,25 @@ done: status |= NFNL_BATCH_REPLAY; goto done; } else if (err) { - ss->abort(net, oskb, false); + ss->abort(net, oskb, NFNL_ABORT_NONE); netlink_ack(oskb, nlmsg_hdr(oskb), err, NULL); } } else { - ss->abort(net, oskb, false); + enum nfnl_abort_action abort_action; + + if (status & NFNL_BATCH_FAILURE) + abort_action = NFNL_ABORT_NONE; + else + abort_action = NFNL_ABORT_VALIDATE; + + err = ss->abort(net, oskb, abort_action); + if (err == -EAGAIN) { + nfnl_err_reset(&err_list); + kfree_skb(skb); + module_put(ss->owner); + status |= NFNL_BATCH_FAILURE; + goto replay_abort; + } } if (ss->cleanup) ss->cleanup(net); From 1c8fe343a79d5c254e629f3f18e4aff37880adfe Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Thu, 29 Oct 2020 16:39:46 +0100 Subject: [PATCH 019/151] netfilter: ipset: Update byte and packet counters regardless of whether they match [ Upstream commit 7d10e62c2ff8e084c136c94d32d9a94de4d31248 ] In ip_set_match_extensions(), for sets with counters, we take care of updating counters themselves by calling ip_set_update_counter(), and of checking if the given comparison and values match, by calling ip_set_match_counter() if needed. However, if a given comparison on counters doesn't match the configured values, that doesn't mean the set entry itself isn't matching. This fix restores the behaviour we had before commit 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching"), without reintroducing the issue fixed there: back then, mtype_data_match() first updated counters in any case, and then took care of matching on counters. Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set, ip_set_update_counter() will anyway skip counter updates if desired. The issue observed is illustrated by this reproducer: ipset create c hash:ip counters ipset add c 192.0.2.1 iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP if we now send packets from 192.0.2.1, bytes and packets counters for the entry as shown by 'ipset list' are always zero, and, no matter how many bytes we send, the rule will never match, because counters themselves are not updated. 
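[ Editor's note: an editorial condensation of the reordering at the heart of this fix (see the ip_set_core.c hunk below); "counters_match" stands in for the two ip_set_match_counter() comparisons in the real code. ]

        if (SET_WITH_COUNTER(set)) {
                struct ip_set_counter *counter = ext_counter(data, set);

                /* Account the packet first; ip_set_update_counter() itself
                 * honours IPSET_FLAG_SKIP_COUNTER_UPDATE when requested.
                 */
                ip_set_update_counter(counter, ext, flags);

                /* Only then evaluate the byte/packet comparisons; a failed
                 * comparison no longer suppresses the counter update.
                 */
                if (flags & IPSET_FLAG_MATCH_COUNTERS && !counters_match)
                        return false;
        }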
Reported-by: Mithil Mhatre Fixes: 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching") Signed-off-by: Stefano Brivio Signed-off-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/ipset/ip_set_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index 133a3f1b6f56..3cc4daa856d6 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -485,13 +485,14 @@ ip_set_match_extensions(struct ip_set *set, const struct ip_set_ext *ext, if (SET_WITH_COUNTER(set)) { struct ip_set_counter *counter = ext_counter(data, set); + ip_set_update_counter(counter, ext, flags); + if (flags & IPSET_FLAG_MATCH_COUNTERS && !(ip_set_match_counter(ip_set_get_packets(counter), mext->packets, mext->packets_op) && ip_set_match_counter(ip_set_get_bytes(counter), mext->bytes, mext->bytes_op))) return false; - ip_set_update_counter(counter, ext, flags); } if (SET_WITH_SKBINFO(set)) ip_set_get_skbinfo(ext_skbinfo(data, set), From d261d0bd90660dd3706542d34bbf9bfe49d937ae Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Wed, 28 Oct 2020 11:27:17 -0400 Subject: [PATCH 020/151] powerpc/eeh_cache: Fix a possible debugfs deadlock [ Upstream commit fd552e0542b4532483289cce48fdbd27b692984b ] Lockdep complains that a possible deadlock below in eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled, but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ disabled. Let's just make eeh_addr_cache_show() acquire the lock with IRQ disabled as well. CPU0 CPU1 ---- ---- lock(&pci_io_addr_cache_root.piar_lock); local_irq_disable(); lock(&tp->lock); lock(&pci_io_addr_cache_root.piar_lock); lock(&tp->lock); *** DEADLOCK *** lock_acquire+0x140/0x5f0 _raw_spin_lock_irqsave+0x64/0xb0 eeh_addr_cache_insert_dev+0x48/0x390 eeh_probe_device+0xb8/0x1a0 pnv_pcibios_bus_add_device+0x3c/0x80 pcibios_bus_add_device+0x118/0x290 pci_bus_add_device+0x28/0xe0 pci_bus_add_devices+0x54/0xb0 pcibios_init+0xc4/0x124 do_one_initcall+0xac/0x528 kernel_init_freeable+0x35c/0x3fc kernel_init+0x24/0x148 ret_from_kernel_thread+0x5c/0x80 lock_acquire+0x140/0x5f0 _raw_spin_lock+0x4c/0x70 eeh_addr_cache_show+0x38/0x110 seq_read+0x1a0/0x660 vfs_read+0xc8/0x1f0 ksys_read+0x74/0x130 system_call_exception+0xf8/0x1d0 system_call_common+0xe8/0x218 Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache") Signed-off-by: Qian Cai Reviewed-by: Oliver O'Halloran Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201028152717.8967-1-cai@redhat.com Signed-off-by: Sasha Levin --- arch/powerpc/kernel/eeh_cache.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c index cf11277ebd02..000ebb5a6fb3 100644 --- a/arch/powerpc/kernel/eeh_cache.c +++ b/arch/powerpc/kernel/eeh_cache.c @@ -272,8 +272,9 @@ static int eeh_addr_cache_show(struct seq_file *s, void *v) { struct pci_io_addr_range *piar; struct rb_node *n; + unsigned long flags; - spin_lock(&pci_io_addr_cache_root.piar_lock); + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n)) { piar = rb_entry(n, struct pci_io_addr_range, rb_node); @@ -281,7 +282,7 @@ static int eeh_addr_cache_show(struct seq_file *s, void *v) (piar->flags & IORESOURCE_IO) ? 
"i/o" : "mem", &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev)); } - spin_unlock(&pci_io_addr_cache_root.piar_lock); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); return 0; } From b36f78fd48e9261503c2bf5a79b2a9127e1d1a57 Mon Sep 17 00:00:00 2001 From: Stanislav Ivanichkin Date: Tue, 27 Oct 2020 12:43:57 +0300 Subject: [PATCH 021/151] perf trace: Fix segfault when trying to trace events by cgroup [ Upstream commit a6293f36ac92ab513771a98efe486477be2f981f ] # ./perf trace -e sched:sched_switch -G test -a sleep 1 perf: Segmentation fault Obtained 11 stack frames. ./perf(sighandler_dump_stack+0x43) [0x55cfdc636db3] /lib/x86_64-linux-gnu/libc.so.6(+0x3efcf) [0x7fd23eecafcf] ./perf(parse_cgroups+0x36) [0x55cfdc673f36] ./perf(+0x3186ed) [0x55cfdc70d6ed] ./perf(parse_options_subcommand+0x629) [0x55cfdc70e999] ./perf(cmd_trace+0x9c2) [0x55cfdc5ad6d2] ./perf(+0x1e8ae0) [0x55cfdc5ddae0] ./perf(+0x1e8ded) [0x55cfdc5ddded] ./perf(main+0x370) [0x55cfdc556f00] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fd23eeadb96] ./perf(_start+0x29) [0x55cfdc557389] Segmentation fault # It happens because "struct trace" in option->value is passed to the parse_cgroups function instead of "struct evlist". Fixes: 9ea42ba4411ac ("perf trace: Support setting cgroups as targets") Signed-off-by: Stanislav Ivanichkin Tested-by: Arnaldo Carvalho de Melo Acked-by: Namhyung Kim Cc: Dmitry Monakhov Link: http://lore.kernel.org/lkml/20201027094357.94881-1-sivanichkin@yandex-team.ru Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin --- tools/perf/builtin-trace.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index bb5130d02155..a5201de1a191 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -3979,9 +3979,9 @@ do_concat: err = 0; if (lists[0]) { - struct option o = OPT_CALLBACK('e', "event", &trace->evlist, "event", - "event selector. use 'perf list' to list available events", - parse_events_option); + struct option o = { + .value = &trace->evlist, + }; err = parse_events_option(&o, lists[0], 0); } out: @@ -3995,9 +3995,12 @@ static int trace__parse_cgroups(const struct option *opt, const char *str, int u { struct trace *trace = opt->value; - if (!list_empty(&trace->evlist->core.entries)) - return parse_cgroups(opt, str, unset); - + if (!list_empty(&trace->evlist->core.entries)) { + struct option o = { + .value = &trace->evlist, + }; + return parse_cgroups(&o, str, unset); + } trace->cgroup = evlist__findnew_cgroup(trace->evlist, str); return 0; From 22901751d2697a1c824df22c4c4af8cb4ecd3f68 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Mon, 2 Nov 2020 00:31:03 +0100 Subject: [PATCH 022/151] perf tools: Add missing swap for ino_generation [ Upstream commit fe01adb72356a4e2f8735e4128af85921ca98fa1 ] We are missing swap for ino_generation field. 
Fixes: 5c5e854bc760 ("perf tools: Add attr->mmap2 support") Signed-off-by: Jiri Olsa Acked-by: Namhyung Kim Link: https://lore.kernel.org/r/20201101233103.3537427-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin --- tools/perf/util/session.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 5c172845fa5a..ff524a3fc500 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -588,6 +588,7 @@ static void perf_event__mmap2_swap(union perf_event *event, event->mmap2.maj = bswap_32(event->mmap2.maj); event->mmap2.min = bswap_32(event->mmap2.min); event->mmap2.ino = bswap_64(event->mmap2.ino); + event->mmap2.ino_generation = bswap_64(event->mmap2.ino_generation); if (sample_id_all) { void *data = &event->mmap2.filename; From 2825a5bf3ca5d7a9ed27738aad5f7e26e1857111 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 3 Nov 2020 13:18:07 +0300 Subject: [PATCH 023/151] ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link() [ Upstream commit 158e1886b6262c1d1c96a18c85fac5219b8bf804 ] This is harmless, but the "addr" comes from the user and it could lead to a negative shift or to shift wrapping if it's too high. Fixes: 0b00a5615dc4 ("ALSA: hdac_ext: add hdac extended controller") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/20201103101807.GC1127762@mwanda Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/hda/ext/hdac_ext_controller.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/hda/ext/hdac_ext_controller.c b/sound/hda/ext/hdac_ext_controller.c index 09ff209df4a3..c87187f63573 100644 --- a/sound/hda/ext/hdac_ext_controller.c +++ b/sound/hda/ext/hdac_ext_controller.c @@ -148,6 +148,8 @@ struct hdac_ext_link *snd_hdac_ext_bus_get_link(struct hdac_bus *bus, return NULL; if (bus->idx != bus_idx) return NULL; + if (addr < 0 || addr > 31) + return NULL; list_for_each_entry(hlink, &bus->hlink_list, list) { for (i = 0; i < HDA_MAX_CODECS; i++) { From 9946509a027bcc1665d6a3b8e38ce6de138051ad Mon Sep 17 00:00:00 2001 From: "Liu, Yi L" Date: Fri, 30 Oct 2020 10:37:24 +0800 Subject: [PATCH 024/151] iommu/vt-d: Fix a bug for PDP check in prq_event_thread [ Upstream commit 71cd8e2d16703a9df5c86a9e19f4cba99316cc53 ] In prq_event_thread(), the QI_PGRP_PDP is wrongly set by 'req->pasid_present' which should be replaced to 'req->priv_data_present'. 
Fixes: 5b438f4ba315 ("iommu/vt-d: Support page request in scalable mode") Signed-off-by: Liu, Yi L Signed-off-by: Yi Sun Acked-by: Lu Baolu Link: https://lore.kernel.org/r/1604025444-6954-3-git-send-email-yi.y.sun@linux.intel.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index 1d3816cd65d5..ec69a99b99ba 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -646,7 +646,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) resp.qw0 = QI_PGRP_PASID(req->pasid) | QI_PGRP_DID(req->rid) | QI_PGRP_PASID_P(req->pasid_present) | - QI_PGRP_PDP(req->pasid_present) | + QI_PGRP_PDP(req->priv_data_present) | QI_PGRP_RESP_CODE(result) | QI_PGRP_RESP_TYPE; resp.qw1 = QI_PGRP_IDX(req->prg_index) | From e201588fad54f7b4b5fc4a1b0f4a249223362a3f Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 3 Nov 2020 16:32:58 +0000 Subject: [PATCH 025/151] afs: Fix warning due to unadvanced marshalling pointer [ Upstream commit c80afa1d9c3603d5eddeb8d63368823b1982f3f0 ] When using the afs.yfs.acl xattr to change an AuriStor ACL, a warning can be generated when the request is marshalled because the buffer pointer isn't increased after adding the last element, thereby triggering the check at the end if the ACL wasn't empty. This just causes something like the following warning, but doesn't stop the call from happening successfully: kAFS: YFS.StoreOpaqueACL2: Request buffer underflow (36<108) Fix this simply by increasing the count prior to the check. Fixes: f5e4546347bc ("afs: Implement YFS ACL setting") Signed-off-by: David Howells Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- fs/afs/yfsclient.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index d21cf61d86b9..3b19b009452a 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -2162,6 +2162,7 @@ int yfs_fs_store_opaque_acl2(struct afs_fs_cursor *fc, const struct afs_acl *acl memcpy(bp, acl->data, acl->size); if (acl->size != size) memset((void *)bp + acl->size, 0, size - acl->size); + bp += size / sizeof(__be32); yfs_check_req(call, bp); trace_afs_make_fs_call(call, &vnode->fid); From 3d095476791840529385f03ef437f183d0ac805e Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Thu, 18 Jun 2020 12:47:06 +0200 Subject: [PATCH 026/151] can: rx-offload: don't call kfree_skb() from IRQ context [ Upstream commit 2ddd6bfe7bdbb6c661835c3ff9cab8e0769940a6 ] A CAN driver, using the rx-offload infrastructure, is reading CAN frames (usually in IRQ context) from the hardware and placing it into the rx-offload queue to be delivered to the networking stack via NAPI. In case the rx-offload queue is full, trying to add more skbs results in the skbs being dropped using kfree_skb(). 
If done from hard-IRQ context this results in the following warning: [ 682.552693] ------------[ cut here ]------------ [ 682.557360] WARNING: CPU: 0 PID: 3057 at net/core/skbuff.c:650 skb_release_head_state+0x74/0x84 [ 682.566075] Modules linked in: can_raw can coda_vpu flexcan dw_hdmi_ahb_audio v4l2_jpeg imx_vdoa can_dev [ 682.575597] CPU: 0 PID: 3057 Comm: cansend Tainted: G W 5.7.0+ #18 [ 682.583098] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 682.589657] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [ 682.597423] [] (show_stack) from [] (dump_stack+0xe0/0x114) [ 682.604759] [] (dump_stack) from [] (__warn+0xc0/0x10c) [ 682.611742] [] (__warn) from [] (warn_slowpath_fmt+0x5c/0xc0) [ 682.619248] [] (warn_slowpath_fmt) from [] (skb_release_head_state+0x74/0x84) [ 682.628143] [] (skb_release_head_state) from [] (skb_release_all+0xc/0x24) [ 682.636774] [] (skb_release_all) from [] (kfree_skb+0x74/0x1c8) [ 682.644479] [] (kfree_skb) from [] (can_rx_offload_queue_sorted+0xe0/0xe8 [can_dev]) [ 682.654051] [] (can_rx_offload_queue_sorted [can_dev]) from [] (can_rx_offload_get_echo_skb+0x48/0x94 [can_dev]) [ 682.666007] [] (can_rx_offload_get_echo_skb [can_dev]) from [] (flexcan_irq+0x194/0x5dc [flexcan]) [ 682.676734] [] (flexcan_irq [flexcan]) from [] (__handle_irq_event_percpu+0x4c/0x3ec) [ 682.686322] [] (__handle_irq_event_percpu) from [] (handle_irq_event_percpu+0x2c/0x88) [ 682.695993] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x38/0x5c) [ 682.704887] [] (handle_irq_event) from [] (handle_fasteoi_irq+0xc8/0x180) [ 682.713432] [] (handle_fasteoi_irq) from [] (generic_handle_irq+0x30/0x44) [ 682.722063] [] (generic_handle_irq) from [] (__handle_domain_irq+0x64/0xdc) [ 682.730783] [] (__handle_domain_irq) from [] (gic_handle_irq+0x48/0x9c) [ 682.739158] [] (gic_handle_irq) from [] (__irq_svc+0x70/0x98) [ 682.746656] Exception stack(0xe80e9dd8 to 0xe80e9e20) [ 682.751725] 9dc0: 00000001 e80e8000 [ 682.759922] 9de0: e820cf80 00000000 ffffe000 00000000 eaf08fe4 00000000 600d0013 00000000 [ 682.768117] 9e00: c1732e3c c16093a8 e820d4c0 e80e9e28 c018a57c c018b870 600d0013 ffffffff [ 682.776315] [] (__irq_svc) from [] (lock_acquire+0x108/0x4e8) [ 682.783821] [] (lock_acquire) from [] (down_write+0x48/0xa8) [ 682.791242] [] (down_write) from [] (unlink_file_vma+0x24/0x40) [ 682.798922] [] (unlink_file_vma) from [] (free_pgtables+0x34/0xb8) [ 682.806858] [] (free_pgtables) from [] (exit_mmap+0xe4/0x170) [ 682.814361] [] (exit_mmap) from [] (mmput+0x5c/0x110) [ 682.821171] [] (mmput) from [] (do_exit+0x374/0xbe4) [ 682.827892] [] (do_exit) from [] (do_group_exit+0x38/0xb4) [ 682.835132] [] (do_group_exit) from [] (__wake_up_parent+0x0/0x14) [ 682.843063] irq event stamp: 1936 [ 682.846399] hardirqs last enabled at (1935): [] rmqueue+0xf4/0xc64 [ 682.853553] hardirqs last disabled at (1936): [] __irq_svc+0x60/0x98 [ 682.860799] softirqs last enabled at (1878): [] raw_release+0x108/0x1f0 [can_raw] [ 682.869256] softirqs last disabled at (1876): [] release_sock+0x18/0x98 [ 682.876753] ---[ end trace 7bca4751ce44c444 ]--- This patch fixes the problem by replacing the kfree_skb() by dev_kfree_skb_any(), as rx-offload might be called from threaded IRQ handlers as well. 
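[ Editor's note: dev_kfree_skb_any() is the context-safe drop-in because it checks where it is being called from and defers the actual free when an immediate free would be unsafe; roughly (simplified from the core networking helper, not part of this patch): ]

        if (in_irq() || irqs_disabled())
                dev_kfree_skb_irq(skb);         /* defer the free to softirq */
        else
                dev_kfree_skb(skb);             /* safe to free right away */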
Fixes: ca913f1ac024 ("can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak") Fixes: 6caf8a6d6586 ("can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak") Link: http://lore.kernel.org/r/20201019190524.1285319-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/rx-offload.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/can/rx-offload.c b/drivers/net/can/rx-offload.c index 84cae167e42f..7e75a87a8a6a 100644 --- a/drivers/net/can/rx-offload.c +++ b/drivers/net/can/rx-offload.c @@ -272,7 +272,7 @@ int can_rx_offload_queue_sorted(struct can_rx_offload *offload, if (skb_queue_len(&offload->skb_queue) > offload->skb_queue_len_max) { - kfree_skb(skb); + dev_kfree_skb_any(skb); return -ENOBUFS; } @@ -317,7 +317,7 @@ int can_rx_offload_queue_tail(struct can_rx_offload *offload, { if (skb_queue_len(&offload->skb_queue) > offload->skb_queue_len_max) { - kfree_skb(skb); + dev_kfree_skb_any(skb); return -ENOBUFS; } From ab46748bf98864f9c3f5559060bf8caf9df2b41e Mon Sep 17 00:00:00 2001 From: Vincent Mailhol Date: Sat, 3 Oct 2020 00:41:45 +0900 Subject: [PATCH 027/151] can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context [ Upstream commit 2283f79b22684d2812e5c76fc2280aae00390365 ] If a driver calls can_get_echo_skb() during a hardware IRQ (which is often, but not always, the case), the 'WARN_ON(in_irq)' in net/core/skbuff.c#skb_release_head_state() might be triggered, under network congestion circumstances, together with the potential risk of a NULL pointer dereference. The root cause of this issue is the call to kfree_skb() instead of dev_kfree_skb_irq() in net/core/dev.c#enqueue_to_backlog(). This patch prevents the skb from being freed within the call to netif_rx() by incrementing its reference count with skb_get(). The skb is finally freed by one of the in-irq-context safe functions: dev_consume_skb_any() or dev_kfree_skb_any(). The "any" version is used because some drivers might call can_get_echo_skb() in a normal context. The reason for this issue to occur is that initially, in the core network stack, loopback skbs were not supposed to be received in hardware IRQ context. The CAN stack is an exception. This bug was previously reported back in 2017 in [1] but the proposed patch never got accepted. While [1] directly modifies net/core/dev.c, we try to propose here a smoother modification local to the CAN network stack (the assumption being that only CAN devices are affected by this issue).
[1] http://lore.kernel.org/r/57a3ffb6-3309-3ad5-5a34-e93c3fe3614d@cetitec.com Signed-off-by: Vincent Mailhol Link: https://lore.kernel.org/r/20201002154219.4887-2-mailhol.vincent@wanadoo.fr Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/dev.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c index 3a33fb503400..de3c589b1ae0 100644 --- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -512,7 +512,11 @@ unsigned int can_get_echo_skb(struct net_device *dev, unsigned int idx) if (!skb) return 0; - netif_rx(skb); + skb_get(skb); + if (netif_rx(skb) == NET_RX_SUCCESS) + dev_consume_skb_any(skb); + else + dev_kfree_skb_any(skb); return len; } From 183f1af506fe0317dfda966f5ceb21619d726947 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Tue, 20 Oct 2020 08:44:43 +0200 Subject: [PATCH 028/151] can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames [ Upstream commit ed3320cec279407a86bc4c72edc4a39eb49165ec ] The can_get_echo_skb() function returns the number of received bytes to be used for netdev statistics. In the case of RTR frames we get a valid (potential non-zero) data length value which has to be passed for further operations. But on the wire RTR frames have no payload length. Therefore the value to be used in the statistics has to be zero for RTR frames. Reported-by: Vincent Mailhol Signed-off-by: Oliver Hartkopp Link: https://lore.kernel.org/r/20201020064443.80164-1-socketcan@hartkopp.net Fixes: cf5046b309b3 ("can: dev: let can_get_echo_skb() return dlc of CAN frame") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/dev.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c index de3c589b1ae0..448d1548cca3 100644 --- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -486,9 +486,13 @@ __can_get_echo_skb(struct net_device *dev, unsigned int idx, u8 *len_ptr) */ struct sk_buff *skb = priv->echo_skb[idx]; struct canfd_frame *cf = (struct canfd_frame *)skb->data; - u8 len = cf->len; - *len_ptr = len; + /* get the real payload length for netdev statistics */ + if (cf->can_id & CAN_RTR_FLAG) + *len_ptr = 0; + else + *len_ptr = cf->len; + priv->echo_skb[idx] = NULL; return skb; From 5bde65abe166692dbe1d9ff0c7957aa954515056 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Wed, 18 Dec 2019 09:39:02 +0100 Subject: [PATCH 029/151] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() [ Upstream commit 286228d382ba6320f04fa2e7c6fc8d4d92e428f4 ] All user space generated SKBs are owned by a socket (unless injected into the key via AF_PACKET). If a socket is closed, all associated skbs will be cleaned up. This leads to a problem when a CAN driver calls can_put_echo_skb() on a unshared SKB. If the socket is closed prior to the TX complete handler, can_get_echo_skb() and the subsequent delivering of the echo SKB to all registered callbacks, a SKB with a refcount of 0 is delivered. To avoid the problem, in can_get_echo_skb() the original SKB is now always cloned, regardless of shared SKB or not. If the process exists it can now safely discard its SKBs, without disturbing the delivery of the echo SKB. The problem shows up in the j1939 stack, when it clones the incoming skb, which detects the already 0 refcount. 
We can easily reproduce this with following example: testj1939 -B -r can0: & cansend can0 1823ff40#0123 WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174 refcount_t: addition on 0; use-after-free. Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d #1 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Backtrace: [] (dump_backtrace) from [] (show_stack+0x20/0x24) [] (show_stack) from [] (dump_stack+0x8c/0xa0) [] (dump_stack) from [] (__warn+0xe0/0x108) [] (__warn) from [] (warn_slowpath_fmt+0xa8/0xcc) [] (warn_slowpath_fmt) from [] (refcount_warn_saturate+0x108/0x174) [] (refcount_warn_saturate) from [] (j1939_can_recv+0x20c/0x210) [] (j1939_can_recv) from [] (can_rcv_filter+0xb4/0x268) [] (can_rcv_filter) from [] (can_receive+0xb0/0xe4) [] (can_receive) from [] (can_rcv+0x48/0x98) [] (can_rcv) from [] (__netif_receive_skb_one_core+0x64/0x88) [] (__netif_receive_skb_one_core) from [] (__netif_receive_skb+0x38/0x94) [] (__netif_receive_skb) from [] (netif_receive_skb_internal+0x64/0xf8) [] (netif_receive_skb_internal) from [] (netif_receive_skb+0x34/0x19c) [] (netif_receive_skb) from [] (can_rx_offload_napi_poll+0x58/0xb4) Fixes: 0ae89beb283a ("can: add destructor for self generated skbs") Signed-off-by: Oleksij Rempel Link: http://lore.kernel.org/r/20200124132656.22156-1-o.rempel@pengutronix.de Acked-by: Oliver Hartkopp Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- include/linux/can/skb.h | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h index a954def26c0d..0783b0c6d9e2 100644 --- a/include/linux/can/skb.h +++ b/include/linux/can/skb.h @@ -61,21 +61,17 @@ static inline void can_skb_set_owner(struct sk_buff *skb, struct sock *sk) */ static inline struct sk_buff *can_create_echo_skb(struct sk_buff *skb) { - if (skb_shared(skb)) { - struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC); + struct sk_buff *nskb; - if (likely(nskb)) { - can_skb_set_owner(nskb, skb->sk); - consume_skb(skb); - return nskb; - } else { - kfree_skb(skb); - return NULL; - } + nskb = skb_clone(skb, GFP_ATOMIC); + if (unlikely(!nskb)) { + kfree_skb(skb); + return NULL; } - /* we can assume to have an unshared skb with proper owner */ - return skb; + can_skb_set_owner(nskb, skb->sk); + consume_skb(skb); + return nskb; } #endif /* !_CAN_SKB_H */ From 0ab4c839409a2dc3aeb7b017b83b54774b461862 Mon Sep 17 00:00:00 2001 From: Yegor Yefremov Date: Thu, 22 Oct 2020 10:37:08 +0200 Subject: [PATCH 030/151] can: j1939: swap addr and pgn in the send example [ Upstream commit ea780d39b1888ed5afc243c29b23d9bdb3828c7a ] The address was wrongly assigned to the PGN field and vice versa. 
Signed-off-by: Yegor Yefremov Link: https://lore.kernel.org/r/20201022083708.8755-1-yegorslists@googlemail.com Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- Documentation/networking/j1939.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/j1939.rst b/Documentation/networking/j1939.rst index f5be243d250a..4b0db514b201 100644 --- a/Documentation/networking/j1939.rst +++ b/Documentation/networking/j1939.rst @@ -414,8 +414,8 @@ Send: .can_family = AF_CAN, .can_addr.j1939 = { .name = J1939_NO_NAME; - .pgn = 0x30, - .addr = 0x12300, + .addr = 0x30, + .pgn = 0x12300, }, }; From b9c4a9a07c4a07b2badc606efb4e2643201defcb Mon Sep 17 00:00:00 2001 From: Zhang Changzhong Date: Mon, 7 Sep 2020 14:31:48 +0800 Subject: [PATCH 031/151] can: j1939: j1939_sk_bind(): return failure if netdev is down [ Upstream commit 08c487d8d807535f509ed80c6a10ad90e6872139 ] When a netdev down event occurs after a successful call to j1939_sk_bind(), j1939_netdev_notify() can handle it correctly. But if the netdev already in down state before calling j1939_sk_bind(), j1939_sk_release() will stay in wait_event_interruptible() blocked forever. Because in this case, j1939_netdev_notify() won't be called and j1939_tp_txtimer() won't call j1939_session_cancel() or other function to clear session for ENETDOWN error, this lead to mismatch of j1939_session_get/put() and jsk->skb_pending will never decrease to zero. To reproduce it use following commands: 1. ip link add dev vcan0 type vcan 2. j1939acd -r 100,80-120 1122334455667788 vcan0 3. presses ctrl-c and thread will be blocked forever This patch adds check for ndev->flags in j1939_sk_bind() to avoid this kind of situation and return with -ENETDOWN. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Zhang Changzhong Link: https://lore.kernel.org/r/1599460308-18770-1-git-send-email-zhangchangzhong@huawei.com Acked-by: Oleksij Rempel Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- net/can/j1939/socket.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index bf9fd6ee88fe..047090960539 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -475,6 +475,12 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) goto out_release_sock; } + if (!(ndev->flags & IFF_UP)) { + dev_put(ndev); + ret = -ENETDOWN; + goto out_release_sock; + } + priv = j1939_netdev_start(ndev); dev_put(ndev); if (IS_ERR(priv)) { From 51920ca7519c4f546cd30cc66bb569999c6a2579 Mon Sep 17 00:00:00 2001 From: Zhang Changzhong Date: Fri, 17 Jul 2020 16:04:39 +0800 Subject: [PATCH 032/151] can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path [ Upstream commit e002103b36a695f7cb6048b96da73e66c86ddffb ] The driver forgets to call clk_disable_unprepare() in error path after a success calling for clk_prepare_enable(). Fix it by adding a clk_disable_unprepare() in error path. 
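[ Editor's note: the fix follows the usual probe() unwind idiom, in which each successfully acquired resource gets its own cleanup label, executed in reverse order when a later step fails. A generic sketch of that idiom; the real ti_hecc_probe() in the hunk below has additional steps between these two. ]

        err = clk_prepare_enable(priv->clk);
        if (err)
                goto err_put_clk;

        err = register_candev(ndev);
        if (err)
                goto err_disable_clk;

        return 0;

err_disable_clk:
        clk_disable_unprepare(priv->clk);       /* undo clk_prepare_enable() */
err_put_clk:
        clk_put(priv->clk);                     /* undo clk_get() */
        return err;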
Signed-off-by: Zhang Changzhong Link: https://lore.kernel.org/r/1594973079-27743-1-git-send-email-zhangchangzhong@huawei.com Fixes: befa60113ce7 ("can: ti_hecc: add missing prepare and unprepare of the clock") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/ti_hecc.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/can/ti_hecc.c b/drivers/net/can/ti_hecc.c index 31ad364a89bb..d3a7631eecaf 100644 --- a/drivers/net/can/ti_hecc.c +++ b/drivers/net/can/ti_hecc.c @@ -936,7 +936,7 @@ static int ti_hecc_probe(struct platform_device *pdev) err = clk_prepare_enable(priv->clk); if (err) { dev_err(&pdev->dev, "clk_prepare_enable() failed\n"); - goto probe_exit_clk; + goto probe_exit_release_clk; } priv->offload.mailbox_read = ti_hecc_mailbox_read; @@ -945,7 +945,7 @@ static int ti_hecc_probe(struct platform_device *pdev) err = can_rx_offload_add_timestamp(ndev, &priv->offload); if (err) { dev_err(&pdev->dev, "can_rx_offload_add_timestamp() failed\n"); - goto probe_exit_clk; + goto probe_exit_disable_clk; } err = register_candev(ndev); @@ -963,7 +963,9 @@ static int ti_hecc_probe(struct platform_device *pdev) probe_exit_offload: can_rx_offload_del(&priv->offload); -probe_exit_clk: +probe_exit_disable_clk: + clk_disable_unprepare(priv->clk); +probe_exit_release_clk: clk_put(priv->clk); probe_exit_candev: free_candev(ndev); From d6c34afab0ed93343726c34a8f6a63718979b0f7 Mon Sep 17 00:00:00 2001 From: Navid Emamdoost Date: Thu, 4 Jun 2020 22:32:39 -0500 Subject: [PATCH 033/151] can: xilinx_can: handle failure cases of pm_runtime_get_sync [ Upstream commit 79c43333bdd5a7026a5aab606b53053b643585e7 ] Calling pm_runtime_get_sync increments the counter even in case of failure, causing incorrect ref count. Call pm_runtime_put if pm_runtime_get_sync fails. 
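[ Editor's note: the subtlety being fixed is that pm_runtime_get_sync() increments the device usage counter even when it returns an error, so a failing call still leaves a reference that the error path must drop. A minimal sketch of the required pattern (later kernels added pm_runtime_resume_and_get() to encapsulate it): ]

        ret = pm_runtime_get_sync(priv->dev);
        if (ret < 0) {
                pm_runtime_put(priv->dev);      /* drop the reference taken above */
                return ret;
        }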
Signed-off-by: Navid Emamdoost Link: https://lore.kernel.org/r/20200605033239.60664-1-navid.emamdoost@gmail.com Fixes: 4716620d1b62 ("can: xilinx: Convert to runtime_pm") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/xilinx_can.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index 2be846ee627d..0de39ebb3566 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -1384,7 +1384,7 @@ static int xcan_open(struct net_device *ndev) if (ret < 0) { netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n", __func__, ret); - return ret; + goto err; } ret = request_irq(ndev->irq, xcan_interrupt, priv->irq_flags, @@ -1468,6 +1468,7 @@ static int xcan_get_berr_counter(const struct net_device *ndev, if (ret < 0) { netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n", __func__, ret); + pm_runtime_put(priv->dev); return ret; } @@ -1783,7 +1784,7 @@ static int xcan_probe(struct platform_device *pdev) if (ret < 0) { netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n", __func__, ret); - goto err_pmdisable; + goto err_disableclks; } if (priv->read_reg(priv, XCAN_SR_OFFSET) != XCAN_SR_CONFIG_MASK) { @@ -1818,7 +1819,6 @@ static int xcan_probe(struct platform_device *pdev) err_disableclks: pm_runtime_put(priv->dev); -err_pmdisable: pm_runtime_disable(&pdev->dev); err_free: free_candev(ndev); From 44b2c4beff8aff69c2823296ad99cd38b8c34dc0 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 13 Aug 2020 17:06:04 +0300 Subject: [PATCH 034/151] can: peak_usb: add range checking in decode operations [ Upstream commit a6921dd524fe31d1f460c161d3526a407533b6db ] These values come from skb->data so Smatch considers them untrusted. I believe Smatch is correct but I don't have a way to test this. The usb_if->dev[] array has 2 elements but the index is in the 0-15 range without checks. The cfd->len can be up to 255 but the maximum valid size is CANFD_MAX_DLEN (64) so that could lead to memory corruption. 
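[ Editor's note: a condensed view of the two untrusted inputs this patch bounds-checks before use, the channel index and the CAN FD payload length; the hunks below add the same checks to each decode helper with per-call-site error codes. ]

        /* channel number decoded from the USB message indexes a fixed array */
        if (pucan_msg_get_channel(rm) >= ARRAY_SIZE(usb_if->dev))
                return -ENOMEM;

        /* the length byte can encode up to 255, but a CAN FD frame carries
         * at most CANFD_MAX_DLEN (64) bytes
         */
        if (cfd->len > CANFD_MAX_DLEN)
                return -EINVAL;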
Fixes: 0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB adapters") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/20200813140604.GA456946@mwanda Acked-by: Stephane Grosjean Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 48 +++++++++++++++++----- 1 file changed, 37 insertions(+), 11 deletions(-) diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c index 47cc1ff5b88e..dee3e689b54d 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c @@ -468,12 +468,18 @@ static int pcan_usb_fd_decode_canmsg(struct pcan_usb_fd_if *usb_if, struct pucan_msg *rx_msg) { struct pucan_rx_msg *rm = (struct pucan_rx_msg *)rx_msg; - struct peak_usb_device *dev = usb_if->dev[pucan_msg_get_channel(rm)]; - struct net_device *netdev = dev->netdev; + struct peak_usb_device *dev; + struct net_device *netdev; struct canfd_frame *cfd; struct sk_buff *skb; const u16 rx_msg_flags = le16_to_cpu(rm->flags); + if (pucan_msg_get_channel(rm) >= ARRAY_SIZE(usb_if->dev)) + return -ENOMEM; + + dev = usb_if->dev[pucan_msg_get_channel(rm)]; + netdev = dev->netdev; + if (rx_msg_flags & PUCAN_MSG_EXT_DATA_LEN) { /* CANFD frame case */ skb = alloc_canfd_skb(netdev, &cfd); @@ -519,15 +525,21 @@ static int pcan_usb_fd_decode_status(struct pcan_usb_fd_if *usb_if, struct pucan_msg *rx_msg) { struct pucan_status_msg *sm = (struct pucan_status_msg *)rx_msg; - struct peak_usb_device *dev = usb_if->dev[pucan_stmsg_get_channel(sm)]; - struct pcan_usb_fd_device *pdev = - container_of(dev, struct pcan_usb_fd_device, dev); + struct pcan_usb_fd_device *pdev; enum can_state new_state = CAN_STATE_ERROR_ACTIVE; enum can_state rx_state, tx_state; - struct net_device *netdev = dev->netdev; + struct peak_usb_device *dev; + struct net_device *netdev; struct can_frame *cf; struct sk_buff *skb; + if (pucan_stmsg_get_channel(sm) >= ARRAY_SIZE(usb_if->dev)) + return -ENOMEM; + + dev = usb_if->dev[pucan_stmsg_get_channel(sm)]; + pdev = container_of(dev, struct pcan_usb_fd_device, dev); + netdev = dev->netdev; + /* nothing should be sent while in BUS_OFF state */ if (dev->can.state == CAN_STATE_BUS_OFF) return 0; @@ -579,9 +591,14 @@ static int pcan_usb_fd_decode_error(struct pcan_usb_fd_if *usb_if, struct pucan_msg *rx_msg) { struct pucan_error_msg *er = (struct pucan_error_msg *)rx_msg; - struct peak_usb_device *dev = usb_if->dev[pucan_ermsg_get_channel(er)]; - struct pcan_usb_fd_device *pdev = - container_of(dev, struct pcan_usb_fd_device, dev); + struct pcan_usb_fd_device *pdev; + struct peak_usb_device *dev; + + if (pucan_ermsg_get_channel(er) >= ARRAY_SIZE(usb_if->dev)) + return -EINVAL; + + dev = usb_if->dev[pucan_ermsg_get_channel(er)]; + pdev = container_of(dev, struct pcan_usb_fd_device, dev); /* keep a trace of tx and rx error counters for later use */ pdev->bec.txerr = er->tx_err_cnt; @@ -595,11 +612,17 @@ static int pcan_usb_fd_decode_overrun(struct pcan_usb_fd_if *usb_if, struct pucan_msg *rx_msg) { struct pcan_ufd_ovr_msg *ov = (struct pcan_ufd_ovr_msg *)rx_msg; - struct peak_usb_device *dev = usb_if->dev[pufd_omsg_get_channel(ov)]; - struct net_device *netdev = dev->netdev; + struct peak_usb_device *dev; + struct net_device *netdev; struct can_frame *cf; struct sk_buff *skb; + if (pufd_omsg_get_channel(ov) >= ARRAY_SIZE(usb_if->dev)) + return -EINVAL; + + dev = usb_if->dev[pufd_omsg_get_channel(ov)]; + netdev = dev->netdev; + /* allocate an skb to store the 
error frame */ skb = alloc_can_err_skb(netdev, &cf); if (!skb) @@ -716,6 +739,9 @@ static int pcan_usb_fd_encode_msg(struct peak_usb_device *dev, u16 tx_msg_size, tx_msg_flags; u8 can_dlc; + if (cfd->len > CANFD_MAX_DLEN) + return -EINVAL; + tx_msg_size = ALIGN(sizeof(struct pucan_tx_msg) + cfd->len, 4); tx_msg->size = cpu_to_le16(tx_msg_size); tx_msg->type = cpu_to_le16(PUCAN_MSG_CAN_TX); From a23ee99566122f407ab0d7d950c689442d0472c6 Mon Sep 17 00:00:00 2001 From: Stephane Grosjean Date: Wed, 14 Oct 2020 10:56:31 +0200 Subject: [PATCH 035/151] can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping [ Upstream commit ecc7b4187dd388549544195fb13a11b4ea8e6a84 ] Fabian Inostroza has discovered a potential problem in the hardware timestamp reporting from the PCAN-USB USB CAN interface (only), related to the fact that a timestamp of an event may precede the timestamp used for synchronization when both records are part of the same USB packet. However, this case was used to detect the wrapping of the time counter. This patch details and fixes the two identified cases where this problem can occur. Reported-by: Fabian Inostroza Signed-off-by: Stephane Grosjean Link: https://lore.kernel.org/r/20201014085631.15128-1-s.grosjean@peak-system.com Fixes: bb4785551f64 ("can: usb: PEAK-System Technik USB adapters driver core") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/usb/peak_usb/pcan_usb_core.c | 51 ++++++++++++++++++-- 1 file changed, 46 insertions(+), 5 deletions(-) diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c index 0b7766b715fd..c844c6abe5fc 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c @@ -130,14 +130,55 @@ void peak_usb_get_ts_time(struct peak_time_ref *time_ref, u32 ts, ktime_t *time) /* protect from getting time before setting now */ if (ktime_to_ns(time_ref->tv_host)) { u64 delta_us; + s64 delta_ts = 0; - delta_us = ts - time_ref->ts_dev_2; - if (ts < time_ref->ts_dev_2) - delta_us &= (1 << time_ref->adapter->ts_used_bits) - 1; + /* General case: dev_ts_1 < dev_ts_2 < ts, with: + * + * - dev_ts_1 = previous sync timestamp + * - dev_ts_2 = last sync timestamp + * - ts = event timestamp + * - ts_period = known sync period (theoretical) + * ~ dev_ts2 - dev_ts1 + * *but*: + * + * - time counters wrap (see adapter->ts_used_bits) + * - sometimes, dev_ts_1 < ts < dev_ts2 + * + * "normal" case (sync time counters increase): + * must take into account case when ts wraps (tsw) + * + * < ts_period > < > + * | | | + * ---+--------+----+-------0-+--+--> + * ts_dev_1 | ts_dev_2 | + * ts tsw + */ + if (time_ref->ts_dev_1 < time_ref->ts_dev_2) { + /* case when event time (tsw) wraps */ + if (ts < time_ref->ts_dev_1) + delta_ts = 1 << time_ref->adapter->ts_used_bits; - delta_us += time_ref->ts_total; + /* Otherwise, sync time counter (ts_dev_2) has wrapped: + * handle case when event time (tsn) hasn't. 
+ * + * < ts_period > < > + * | | | + * ---+--------+--0-+---------+--+--> + * ts_dev_1 | ts_dev_2 | + * tsn ts + */ + } else if (time_ref->ts_dev_1 < ts) { + delta_ts = -(1 << time_ref->adapter->ts_used_bits); + } - delta_us *= time_ref->adapter->us_per_ts_scale; + /* add delay between last sync and event timestamps */ + delta_ts += (signed int)(ts - time_ref->ts_dev_2); + + /* add time from beginning to last sync */ + delta_ts += time_ref->ts_total; + + /* convert ticks number into microseconds */ + delta_us = delta_ts * time_ref->adapter->us_per_ts_scale; delta_us >>= time_ref->adapter->us_per_ts_shift; *time = ktime_add_us(time_ref->tv_host_0, delta_us); From 56c56af0a3a185fcff17d42f8b1f38fdfc3a4d86 Mon Sep 17 00:00:00 2001 From: Stephane Grosjean Date: Tue, 13 Oct 2020 17:39:47 +0200 Subject: [PATCH 036/151] can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on [ Upstream commit 93ef65e5a6357cc7381f85fcec9283fe29970045 ] Echo management is driven by PUCAN_MSG_LOOPED_BACK bit, while loopback frames are identified with PUCAN_MSG_SELF_RECEIVE bit. Those bits are set for each outgoing frame written to the IP core so that a copy of each one will be placed into the rx path. Thus, - when PUCAN_MSG_LOOPED_BACK is set then the rx frame is an echo of a previously sent frame, - when PUCAN_MSG_LOOPED_BACK+PUCAN_MSG_SELF_RECEIVE are set, then the rx frame is an echo AND a loopback frame. Therefore, this frame must be put into the socket rx path too. This patch fixes how CAN frames are handled when these are sent while the can interface is configured in "loopback on" mode. Signed-off-by: Stephane Grosjean Link: https://lore.kernel.org/r/20201013153947.28012-1-s.grosjean@peak-system.com Fixes: 8ac8321e4a79 ("can: peak: add support for PEAK PCAN-PCIe FD CAN-FD boards") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/peak_canfd/peak_canfd.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/net/can/peak_canfd/peak_canfd.c b/drivers/net/can/peak_canfd/peak_canfd.c index 6b0c6a99fc8d..91b156b2123a 100644 --- a/drivers/net/can/peak_canfd/peak_canfd.c +++ b/drivers/net/can/peak_canfd/peak_canfd.c @@ -248,8 +248,7 @@ static int pucan_handle_can_rx(struct peak_canfd_priv *priv, cf_len = get_can_dlc(pucan_msg_get_dlc(msg)); /* if this frame is an echo, */ - if ((rx_msg_flags & PUCAN_MSG_LOOPED_BACK) && - !(rx_msg_flags & PUCAN_MSG_SELF_RECEIVE)) { + if (rx_msg_flags & PUCAN_MSG_LOOPED_BACK) { unsigned long flags; spin_lock_irqsave(&priv->echo_lock, flags); @@ -263,7 +262,13 @@ static int pucan_handle_can_rx(struct peak_canfd_priv *priv, netif_wake_queue(priv->ndev); spin_unlock_irqrestore(&priv->echo_lock, flags); - return 0; + + /* if this frame is only an echo, stop here. Otherwise, + * continue to push this application self-received frame into + * its own rx queue. + */ + if (!(rx_msg_flags & PUCAN_MSG_SELF_RECEIVE)) + return 0; } /* otherwise, it should be pushed into rx fifo */ From 0b657367309e617be56bcef12ddcd3a5add6b06f Mon Sep 17 00:00:00 2001 From: Joakim Zhang Date: Tue, 20 Oct 2020 23:53:55 +0800 Subject: [PATCH 037/151] can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A [ Upstream commit 018799649071a1638c0c130526af36747df4355a ] After double check with Layerscape CAN owner (Pankaj Bansal), confirm that LS1021A doesn't support ECC feature, so remove FLEXCAN_QUIRK_DISABLE_MECR quirk. 
Fixes: 99b7668c04b27 ("can: flexcan: adding platform specific details for LS1021A") Cc: Pankaj Bansal Signed-off-by: Joakim Zhang Link: https://lore.kernel.org/r/20201020155402.30318-4-qiangqing.zhang@nxp.com Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/flexcan.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index d59c6c87164f..fbf812cf4f5d 100644 --- a/drivers/net/can/flexcan.c +++ b/drivers/net/can/flexcan.c @@ -321,8 +321,7 @@ static const struct flexcan_devtype_data fsl_vf610_devtype_data = { static const struct flexcan_devtype_data fsl_ls1021a_r2_devtype_data = { .quirks = FLEXCAN_QUIRK_DISABLE_RXFG | FLEXCAN_QUIRK_ENABLE_EACEN_RRS | - FLEXCAN_QUIRK_DISABLE_MECR | FLEXCAN_QUIRK_BROKEN_PERR_STATE | - FLEXCAN_QUIRK_USE_OFF_TIMESTAMP, + FLEXCAN_QUIRK_BROKEN_PERR_STATE | FLEXCAN_QUIRK_USE_OFF_TIMESTAMP, }; static const struct can_bittiming_const flexcan_bittiming_const = { From 66ce8bfad6f65e63cddeba7b358e3227a6776c3b Mon Sep 17 00:00:00 2001 From: Joakim Zhang Date: Wed, 21 Oct 2020 02:45:27 +0800 Subject: [PATCH 038/151] can: flexcan: flexcan_remove(): disable wakeup completely [ Upstream commit ab07ff1c92fa60f29438e655a1b4abab860ed0b6 ] With below sequence, we can see wakeup default is enabled after re-load module, if it was enabled before, so we need disable wakeup in flexcan_remove(). | # cat /sys/bus/platform/drivers/flexcan/5a8e0000.can/power/wakeup | disabled | # echo enabled > /sys/bus/platform/drivers/flexcan/5a8e0000.can/power/wakeup | # cat /sys/bus/platform/drivers/flexcan/5a8e0000.can/power/wakeup | enabled | # rmmod flexcan | # modprobe flexcan | # cat /sys/bus/platform/drivers/flexcan/5a8e0000.can/power/wakeup | enabled Fixes: de3578c198c6 ("can: flexcan: add self wakeup support") Fixes: 915f9666421c ("can: flexcan: add support for DT property 'wakeup-source'") Signed-off-by: Joakim Zhang Link: https://lore.kernel.org/r/20201020184527.8190-1-qiangqing.zhang@nxp.com [mkl: streamlined commit message] Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/flexcan.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index fbf812cf4f5d..130f3022d339 100644 --- a/drivers/net/can/flexcan.c +++ b/drivers/net/can/flexcan.c @@ -1676,6 +1676,8 @@ static int flexcan_remove(struct platform_device *pdev) { struct net_device *dev = platform_get_drvdata(pdev); + device_set_wakeup_enable(&pdev->dev, false); + device_set_wakeup_capable(&pdev->dev, false); unregister_flexcandev(dev); pm_runtime_disable(&pdev->dev); free_candev(dev); From 2f6cbef32718532ba50efa6f52879800b551add4 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Thu, 29 Oct 2020 14:30:48 -0700 Subject: [PATCH 039/151] xfs: flush new eof page on truncate to avoid post-eof corruption [ Upstream commit 869ae85dae64b5540e4362d7fe4cd520e10ec05c ] It is possible to expose non-zeroed post-EOF data in XFS if the new EOF page is dirty, backed by an unwritten block and the truncate happens to race with writeback. iomap_truncate_page() will not zero the post-EOF portion of the page if the underlying block is unwritten. The subsequent call to truncate_setsize() will, but doesn't dirty the page. Therefore, if writeback happens to complete after iomap_truncate_page() (so it still sees the unwritten block) but before truncate_setsize(), the cached page becomes inconsistent with the on-disk block. 
A mapped read after the associated page is reclaimed or invalidated exposes non-zero post-EOF data. For example, consider the following sequence when run on a kernel modified to explicitly flush the new EOF page within the race window: $ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file $ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file ... $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file 00000400: 00 00 00 00 00 00 00 00 ........ $ umount /mnt/; mount /mnt/ $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file 00000400: cd cd cd cd cd cd cd cd ........ Update xfs_setattr_size() to explicitly flush the new EOF page prior to the page truncate to ensure iomap has the latest state of the underlying block. Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path") Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin --- fs/xfs/xfs_iops.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index fe285d123d69..dec511823fcb 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -885,6 +885,16 @@ xfs_setattr_size( error = iomap_zero_range(inode, oldsize, newsize - oldsize, &did_zeroing, &xfs_iomap_ops); } else { + /* + * iomap won't detect a dirty page over an unwritten block (or a + * cow block over a hole) and subsequently skips zeroing the + * newly post-EOF portion of the page. Flush the new EOF to + * convert the block before the pagecache truncate. + */ + error = filemap_write_and_wait_range(inode->i_mapping, newsize, + newsize); + if (error) + return error; error = iomap_truncate_page(inode, newsize, &did_zeroing, &xfs_iomap_ops); } From 0685eb84ad56483af1ef8cb4ab439c4e1946edf6 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Mon, 2 Nov 2020 17:14:07 -0800 Subject: [PATCH 040/151] xfs: fix scrub flagging rtinherit even if there is no rt device [ Upstream commit c1f6b1ac00756a7108e5fcb849a2f8230c0b62a5 ] The kernel has always allowed directories to have the rtinherit flag set, even if there is no rt device, so this check is wrong. Fixes: 80e4e1268802 ("xfs: scrub inodes") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Sasha Levin --- fs/xfs/scrub/inode.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 6d483ab29e63..1bea029b634a 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -121,8 +121,7 @@ xchk_inode_flags( goto bad; /* rt flags require rt device */ - if ((flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT)) && - !mp->m_rtdev_targp) + if ((flags & XFS_DIFLAG_REALTIME) && !mp->m_rtdev_targp) goto bad; /* new rt bitmap flag only valid for rbmino */ From 327af342ca9b28f6144ac575800c624c218f88ef Mon Sep 17 00:00:00 2001 From: Tyler Hicks Date: Wed, 28 Oct 2020 10:41:02 -0500 Subject: [PATCH 041/151] tpm: efi: Don't create binary_bios_measurements file for an empty log MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 8ffd778aff45be760292225049e0141255d4ad6e ] Mimic the pre-existing ACPI and Device Tree event log behavior by not creating the binary_bios_measurements file when the EFI TPM event log is empty. 
This fixes the following NULL pointer dereference that can occur when reading /sys/kernel/security/tpm0/binary_bios_measurements after the kernel received an empty event log from the firmware: BUG: kernel NULL pointer dereference, address: 000000000000002c #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 3932 Comm: fwupdtpmevlog Not tainted 5.9.0-00003-g629990edad62 #17 Hardware name: LENOVO 20LCS03L00/20LCS03L00, BIOS N27ET38W (1.24 ) 11/28/2019 RIP: 0010:tpm2_bios_measurements_start+0x3a/0x550 Code: 54 53 48 83 ec 68 48 8b 57 70 48 8b 1e 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 8b 82 c0 06 00 00 48 8b 8a c8 06 00 00 <44> 8b 60 1c 48 89 4d a0 4c 89 e2 49 83 c4 20 48 83 fb 00 75 2a 49 RSP: 0018:ffffa9c901203db0 EFLAGS: 00010246 RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000010 RDX: ffff8ba1eb99c000 RSI: ffff8ba1e4ce8280 RDI: ffff8ba1e4ce8258 RBP: ffffa9c901203e40 R08: ffffa9c901203dd8 R09: ffff8ba1ec443300 R10: ffffa9c901203e50 R11: 0000000000000000 R12: ffff8ba1e4ce8280 R13: ffffa9c901203ef0 R14: ffffa9c901203ef0 R15: ffff8ba1e4ce8258 FS: 00007f6595460880(0000) GS:ffff8ba1ef880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000002c CR3: 00000007d8d18003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? __kmalloc_node+0x113/0x320 ? kvmalloc_node+0x31/0x80 seq_read+0x94/0x420 vfs_read+0xa7/0x190 ksys_read+0xa7/0xe0 __x64_sys_read+0x1a/0x20 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In this situation, the bios_event_log pointer in the tpm_bios_log struct was not NULL but was equal to the ZERO_SIZE_PTR (0x10) value. This was due to the following kmemdup() in tpm_read_log_efi(): int tpm_read_log_efi(struct tpm_chip *chip) { ... /* malloc EventLog space */ log->bios_event_log = kmemdup(log_tbl->log, log_size, GFP_KERNEL); if (!log->bios_event_log) { ret = -ENOMEM; goto out; } ... } When log_size is zero, due to an empty event log from firmware, ZERO_SIZE_PTR is returned from kmemdup(). Upon a read of the binary_bios_measurements file, the tpm2_bios_measurements_start() function does not perform a ZERO_OR_NULL_PTR() check on the bios_event_log pointer before dereferencing it. Rather than add a ZERO_OR_NULL_PTR() check in functions that make use of the bios_event_log pointer, simply avoid creating the binary_bios_measurements_file as is done in other event log retrieval backends. Explicitly ignore all of the events in the final event log when the main event log is empty. The list of events in the final event log cannot be accurately parsed without referring to the first event in the main event log (the event log header) so the final event log is useless in such a situation. Fixes: 58cc1e4faf10 ("tpm: parse TPM event logs based on EFI table") Link: https://lore.kernel.org/linux-integrity/E1FDCCCB-CA51-4AEE-AC83-9CDE995EAE52@canonical.com/ Reported-by: Kai-Heng Feng Reported-by: Kenneth R. 
Crudup Reported-by: Mimi Zohar Cc: Thiébaud Weksteen Cc: Ard Biesheuvel Signed-off-by: Tyler Hicks Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: Sasha Levin --- drivers/char/tpm/eventlog/efi.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/char/tpm/eventlog/efi.c b/drivers/char/tpm/eventlog/efi.c index 6bb023de17f1..35229e5143ca 100644 --- a/drivers/char/tpm/eventlog/efi.c +++ b/drivers/char/tpm/eventlog/efi.c @@ -41,6 +41,11 @@ int tpm_read_log_efi(struct tpm_chip *chip) log_size = log_tbl->size; memunmap(log_tbl); + if (!log_size) { + pr_warn("UEFI TPM log area empty\n"); + return -EIO; + } + log_tbl = memremap(efi.tpm_log, sizeof(*log_tbl) + log_size, MEMREMAP_WB); if (!log_tbl) { From 213e1238caccfa8a0d1aa04967486cd8f3025c16 Mon Sep 17 00:00:00 2001 From: George Spelvin Date: Sun, 9 Aug 2020 06:57:44 +0000 Subject: [PATCH 042/151] random32: make prandom_u32() output unpredictable commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream. Non-cryptographic PRNGs may have great statistical properties, but are usually trivially predictable to someone who knows the algorithm, given a small sample of their output. An LFSR like prandom_u32() is particularly simple, even if the sample is widely scattered bits. It turns out the network stack uses prandom_u32() for some things like random port numbers which it would prefer are *not* trivially predictable. Predictability led to a practical DNS spoofing attack. Oops. This patch replaces the LFSR with a homebrew cryptographic PRNG based on the SipHash round function, which is in turn seeded with 128 bits of strong random key. (The authors of SipHash have *not* been consulted about this abuse of their algorithm.) Speed is prioritized over security; attacks are rare, while performance is always wanted. Replacing all callers of prandom_u32() is the quick fix. Whether to reinstate a weaker PRNG for uses which can tolerate it is an open question. Commit f227e3ec3b5c ("random32: update the net random state on interrupt and activity") was an earlier attempt at a solution. This patch replaces it. Reported-by: Amit Klein Cc: Willy Tarreau Cc: Eric Dumazet Cc: "Jason A. 
Donenfeld" Cc: Andy Lutomirski Cc: Kees Cook Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Linus Torvalds Cc: tytso@mit.edu Cc: Florian Westphal Cc: Marc Plumb Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity") Signed-off-by: George Spelvin Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/ [ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions to prandom.h for later use; merged George's prandom_seed() proposal; inlined siprand_u32(); replaced the net_rand_state[] array with 4 members to fix a build issue; cosmetic cleanups to make checkpatch happy; fixed RANDOM32_SELFTEST build ] Signed-off-by: Willy Tarreau [wt: backported to 5.4 -- no tracepoint there] Signed-off-by: Greg Kroah-Hartman --- drivers/char/random.c | 1 - include/linux/prandom.h | 36 +++- kernel/time/timer.c | 7 - lib/random32.c | 462 ++++++++++++++++++++++++---------------- 4 files changed, 317 insertions(+), 189 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index 75a8f7f57269..2c29f83ae3d5 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1330,7 +1330,6 @@ void add_interrupt_randomness(int irq, int irq_flags) fast_mix(fast_pool); add_interrupt_bench(cycles); - this_cpu_add(net_rand_state.s1, fast_pool->pool[cycles & 3]); if (unlikely(crng_init == 0)) { if ((fast_pool->count >= 64) && diff --git a/include/linux/prandom.h b/include/linux/prandom.h index aa16e6468f91..cc1e71334e53 100644 --- a/include/linux/prandom.h +++ b/include/linux/prandom.h @@ -16,12 +16,44 @@ void prandom_bytes(void *buf, size_t nbytes); void prandom_seed(u32 seed); void prandom_reseed_late(void); +#if BITS_PER_LONG == 64 +/* + * The core SipHash round function. Each line can be executed in + * parallel given enough CPU resources. + */ +#define PRND_SIPROUND(v0, v1, v2, v3) ( \ + v0 += v1, v1 = rol64(v1, 13), v2 += v3, v3 = rol64(v3, 16), \ + v1 ^= v0, v0 = rol64(v0, 32), v3 ^= v2, \ + v0 += v3, v3 = rol64(v3, 21), v2 += v1, v1 = rol64(v1, 17), \ + v3 ^= v0, v1 ^= v2, v2 = rol64(v2, 32) \ +) + +#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261) +#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573) + +#elif BITS_PER_LONG == 32 +/* + * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash. + * This is weaker, but 32-bit machines are not used for high-traffic + * applications, so there is less output for an attacker to analyze. + */ +#define PRND_SIPROUND(v0, v1, v2, v3) ( \ + v0 += v1, v1 = rol32(v1, 5), v2 += v3, v3 = rol32(v3, 8), \ + v1 ^= v0, v0 = rol32(v0, 16), v3 ^= v2, \ + v0 += v3, v3 = rol32(v3, 7), v2 += v1, v1 = rol32(v1, 13), \ + v3 ^= v0, v1 ^= v2, v2 = rol32(v2, 16) \ +) +#define PRND_K0 0x6c796765 +#define PRND_K1 0x74656462 + +#else +#error Unsupported BITS_PER_LONG +#endif + struct rnd_state { __u32 s1, s2, s3, s4; }; -DECLARE_PER_CPU(struct rnd_state, net_rand_state); - u32 prandom_u32_state(struct rnd_state *state); void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes); void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state); diff --git a/kernel/time/timer.c b/kernel/time/timer.c index a3ae244b1bcd..87fa73cdb90f 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1743,13 +1743,6 @@ void update_process_times(int user_tick) scheduler_tick(); if (IS_ENABLED(CONFIG_POSIX_TIMERS)) run_posix_cpu_timers(); - - /* The current CPU might make use of net randoms without receiving IRQs - * to renew them often enough. 
Let's update the net_rand_state from a - * non-constant value that's not affine to the number of calls to make - * sure it's updated when there's some activity (we don't care in idle). - */ - this_cpu_add(net_rand_state.s1, rol32(jiffies, 24) + user_tick); } /** diff --git a/lib/random32.c b/lib/random32.c index 1786f78bf4c5..9085b1172015 100644 --- a/lib/random32.c +++ b/lib/random32.c @@ -40,16 +40,6 @@ #include #include -#ifdef CONFIG_RANDOM32_SELFTEST -static void __init prandom_state_selftest(void); -#else -static inline void prandom_state_selftest(void) -{ -} -#endif - -DEFINE_PER_CPU(struct rnd_state, net_rand_state) __latent_entropy; - /** * prandom_u32_state - seeded pseudo-random number generator. * @state: pointer to state structure holding seeded state. @@ -69,25 +59,6 @@ u32 prandom_u32_state(struct rnd_state *state) } EXPORT_SYMBOL(prandom_u32_state); -/** - * prandom_u32 - pseudo random number generator - * - * A 32 bit pseudo-random number is generated using a fast - * algorithm suitable for simulation. This algorithm is NOT - * considered safe for cryptographic use. - */ -u32 prandom_u32(void) -{ - struct rnd_state *state = &get_cpu_var(net_rand_state); - u32 res; - - res = prandom_u32_state(state); - put_cpu_var(net_rand_state); - - return res; -} -EXPORT_SYMBOL(prandom_u32); - /** * prandom_bytes_state - get the requested number of pseudo-random bytes * @@ -119,20 +90,6 @@ void prandom_bytes_state(struct rnd_state *state, void *buf, size_t bytes) } EXPORT_SYMBOL(prandom_bytes_state); -/** - * prandom_bytes - get the requested number of pseudo-random bytes - * @buf: where to copy the pseudo-random bytes to - * @bytes: the requested number of bytes - */ -void prandom_bytes(void *buf, size_t bytes) -{ - struct rnd_state *state = &get_cpu_var(net_rand_state); - - prandom_bytes_state(state, buf, bytes); - put_cpu_var(net_rand_state); -} -EXPORT_SYMBOL(prandom_bytes); - static void prandom_warmup(struct rnd_state *state) { /* Calling RNG ten times to satisfy recurrence condition */ @@ -148,96 +105,6 @@ static void prandom_warmup(struct rnd_state *state) prandom_u32_state(state); } -static u32 __extract_hwseed(void) -{ - unsigned int val = 0; - - (void)(arch_get_random_seed_int(&val) || - arch_get_random_int(&val)); - - return val; -} - -static void prandom_seed_early(struct rnd_state *state, u32 seed, - bool mix_with_hwseed) -{ -#define LCG(x) ((x) * 69069U) /* super-duper LCG */ -#define HWSEED() (mix_with_hwseed ? __extract_hwseed() : 0) - state->s1 = __seed(HWSEED() ^ LCG(seed), 2U); - state->s2 = __seed(HWSEED() ^ LCG(state->s1), 8U); - state->s3 = __seed(HWSEED() ^ LCG(state->s2), 16U); - state->s4 = __seed(HWSEED() ^ LCG(state->s3), 128U); -} - -/** - * prandom_seed - add entropy to pseudo random number generator - * @entropy: entropy value - * - * Add some additional entropy to the prandom pool. - */ -void prandom_seed(u32 entropy) -{ - int i; - /* - * No locking on the CPUs, but then somewhat random results are, well, - * expected. - */ - for_each_possible_cpu(i) { - struct rnd_state *state = &per_cpu(net_rand_state, i); - - state->s1 = __seed(state->s1 ^ entropy, 2U); - prandom_warmup(state); - } -} -EXPORT_SYMBOL(prandom_seed); - -/* - * Generate some initially weak seeding values to allow - * to start the prandom_u32() engine. 
- */ -static int __init prandom_init(void) -{ - int i; - - prandom_state_selftest(); - - for_each_possible_cpu(i) { - struct rnd_state *state = &per_cpu(net_rand_state, i); - u32 weak_seed = (i + jiffies) ^ random_get_entropy(); - - prandom_seed_early(state, weak_seed, true); - prandom_warmup(state); - } - - return 0; -} -core_initcall(prandom_init); - -static void __prandom_timer(struct timer_list *unused); - -static DEFINE_TIMER(seed_timer, __prandom_timer); - -static void __prandom_timer(struct timer_list *unused) -{ - u32 entropy; - unsigned long expires; - - get_random_bytes(&entropy, sizeof(entropy)); - prandom_seed(entropy); - - /* reseed every ~60 seconds, in [40 .. 80) interval with slack */ - expires = 40 + prandom_u32_max(40); - seed_timer.expires = jiffies + msecs_to_jiffies(expires * MSEC_PER_SEC); - - add_timer(&seed_timer); -} - -static void __init __prandom_start_seed_timer(void) -{ - seed_timer.expires = jiffies + msecs_to_jiffies(40 * MSEC_PER_SEC); - add_timer(&seed_timer); -} - void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state) { int i; @@ -257,51 +124,6 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state) } EXPORT_SYMBOL(prandom_seed_full_state); -/* - * Generate better values after random number generator - * is fully initialized. - */ -static void __prandom_reseed(bool late) -{ - unsigned long flags; - static bool latch = false; - static DEFINE_SPINLOCK(lock); - - /* Asking for random bytes might result in bytes getting - * moved into the nonblocking pool and thus marking it - * as initialized. In this case we would double back into - * this function and attempt to do a late reseed. - * Ignore the pointless attempt to reseed again if we're - * already waiting for bytes when the nonblocking pool - * got initialized. - */ - - /* only allow initial seeding (late == false) once */ - if (!spin_trylock_irqsave(&lock, flags)) - return; - - if (latch && !late) - goto out; - - latch = true; - prandom_seed_full_state(&net_rand_state); -out: - spin_unlock_irqrestore(&lock, flags); -} - -void prandom_reseed_late(void) -{ - __prandom_reseed(true); -} - -static int __init prandom_reseed(void) -{ - __prandom_reseed(false); - __prandom_start_seed_timer(); - return 0; -} -late_initcall(prandom_reseed); - #ifdef CONFIG_RANDOM32_SELFTEST static struct prandom_test1 { u32 seed; @@ -421,7 +243,28 @@ static struct prandom_test2 { { 407983964U, 921U, 728767059U }, }; -static void __init prandom_state_selftest(void) +static u32 __extract_hwseed(void) +{ + unsigned int val = 0; + + (void)(arch_get_random_seed_int(&val) || + arch_get_random_int(&val)); + + return val; +} + +static void prandom_seed_early(struct rnd_state *state, u32 seed, + bool mix_with_hwseed) +{ +#define LCG(x) ((x) * 69069U) /* super-duper LCG */ +#define HWSEED() (mix_with_hwseed ? 
__extract_hwseed() : 0) + state->s1 = __seed(HWSEED() ^ LCG(seed), 2U); + state->s2 = __seed(HWSEED() ^ LCG(state->s1), 8U); + state->s3 = __seed(HWSEED() ^ LCG(state->s2), 16U); + state->s4 = __seed(HWSEED() ^ LCG(state->s3), 128U); +} + +static int __init prandom_state_selftest(void) { int i, j, errors = 0, runs = 0; bool error = false; @@ -461,5 +304,266 @@ static void __init prandom_state_selftest(void) pr_warn("prandom: %d/%d self tests failed\n", errors, runs); else pr_info("prandom: %d self tests passed\n", runs); + return 0; } +core_initcall(prandom_state_selftest); #endif + +/* + * The prandom_u32() implementation is now completely separate from the + * prandom_state() functions, which are retained (for now) for compatibility. + * + * Because of (ab)use in the networking code for choosing random TCP/UDP port + * numbers, which open DoS possibilities if guessable, we want something + * stronger than a standard PRNG. But the performance requirements of + * the network code do not allow robust crypto for this application. + * + * So this is a homebrew Junior Spaceman implementation, based on the + * lowest-latency trustworthy crypto primitive available, SipHash. + * (The authors of SipHash have not been consulted about this abuse of + * their work.) + * + * Standard SipHash-2-4 uses 2n+4 rounds to hash n words of input to + * one word of output. This abbreviated version uses 2 rounds per word + * of output. + */ + +struct siprand_state { + unsigned long v0; + unsigned long v1; + unsigned long v2; + unsigned long v3; +}; + +static DEFINE_PER_CPU(struct siprand_state, net_rand_state) __latent_entropy; + +/* + * This is the core CPRNG function. As "pseudorandom", this is not used + * for truly valuable things, just intended to be a PITA to guess. + * For maximum speed, we do just two SipHash rounds per word. This is + * the same rate as 4 rounds per 64 bits that SipHash normally uses, + * so hopefully it's reasonably secure. + * + * There are two changes from the official SipHash finalization: + * - We omit some constants XORed with v2 in the SipHash spec as irrelevant; + * they are there only to make the output rounds distinct from the input + * rounds, and this application has no input rounds. + * - Rather than returning v0^v1^v2^v3, return v1+v3. + * If you look at the SipHash round, the last operation on v3 is + * "v3 ^= v0", so "v0 ^ v3" just undoes that, a waste of time. + * Likewise "v1 ^= v2". (The rotate of v2 makes a difference, but + * it still cancels out half of the bits in v2 for no benefit.) + * Second, since the last combining operation was xor, continue the + * pattern of alternating xor/add for a tiny bit of extra non-linearity. + */ +static inline u32 siprand_u32(struct siprand_state *s) +{ + unsigned long v0 = s->v0, v1 = s->v1, v2 = s->v2, v3 = s->v3; + + PRND_SIPROUND(v0, v1, v2, v3); + PRND_SIPROUND(v0, v1, v2, v3); + s->v0 = v0; s->v1 = v1; s->v2 = v2; s->v3 = v3; + return v1 + v3; +} + + +/** + * prandom_u32 - pseudo random number generator + * + * A 32 bit pseudo-random number is generated using a fast + * algorithm suitable for simulation. This algorithm is NOT + * considered safe for cryptographic use. 
+ */ +u32 prandom_u32(void) +{ + struct siprand_state *state = get_cpu_ptr(&net_rand_state); + u32 res = siprand_u32(state); + + put_cpu_ptr(&net_rand_state); + return res; +} +EXPORT_SYMBOL(prandom_u32); + +/** + * prandom_bytes - get the requested number of pseudo-random bytes + * @buf: where to copy the pseudo-random bytes to + * @bytes: the requested number of bytes + */ +void prandom_bytes(void *buf, size_t bytes) +{ + struct siprand_state *state = get_cpu_ptr(&net_rand_state); + u8 *ptr = buf; + + while (bytes >= sizeof(u32)) { + put_unaligned(siprand_u32(state), (u32 *)ptr); + ptr += sizeof(u32); + bytes -= sizeof(u32); + } + + if (bytes > 0) { + u32 rem = siprand_u32(state); + + do { + *ptr++ = (u8)rem; + rem >>= BITS_PER_BYTE; + } while (--bytes > 0); + } + put_cpu_ptr(&net_rand_state); +} +EXPORT_SYMBOL(prandom_bytes); + +/** + * prandom_seed - add entropy to pseudo random number generator + * @entropy: entropy value + * + * Add some additional seed material to the prandom pool. + * The "entropy" is actually our IP address (the only caller is + * the network code), not for unpredictability, but to ensure that + * different machines are initialized differently. + */ +void prandom_seed(u32 entropy) +{ + int i; + + add_device_randomness(&entropy, sizeof(entropy)); + + for_each_possible_cpu(i) { + struct siprand_state *state = per_cpu_ptr(&net_rand_state, i); + unsigned long v0 = state->v0, v1 = state->v1; + unsigned long v2 = state->v2, v3 = state->v3; + + do { + v3 ^= entropy; + PRND_SIPROUND(v0, v1, v2, v3); + PRND_SIPROUND(v0, v1, v2, v3); + v0 ^= entropy; + } while (unlikely(!v0 || !v1 || !v2 || !v3)); + + WRITE_ONCE(state->v0, v0); + WRITE_ONCE(state->v1, v1); + WRITE_ONCE(state->v2, v2); + WRITE_ONCE(state->v3, v3); + } +} +EXPORT_SYMBOL(prandom_seed); + +/* + * Generate some initially weak seeding values to allow + * the prandom_u32() engine to be started. + */ +static int __init prandom_init_early(void) +{ + int i; + unsigned long v0, v1, v2, v3; + + if (!arch_get_random_long(&v0)) + v0 = jiffies; + if (!arch_get_random_long(&v1)) + v1 = random_get_entropy(); + v2 = v0 ^ PRND_K0; + v3 = v1 ^ PRND_K1; + + for_each_possible_cpu(i) { + struct siprand_state *state; + + v3 ^= i; + PRND_SIPROUND(v0, v1, v2, v3); + PRND_SIPROUND(v0, v1, v2, v3); + v0 ^= i; + + state = per_cpu_ptr(&net_rand_state, i); + state->v0 = v0; state->v1 = v1; + state->v2 = v2; state->v3 = v3; + } + + return 0; +} +core_initcall(prandom_init_early); + + +/* Stronger reseeding when available, and periodically thereafter. */ +static void prandom_reseed(struct timer_list *unused); + +static DEFINE_TIMER(seed_timer, prandom_reseed); + +static void prandom_reseed(struct timer_list *unused) +{ + unsigned long expires; + int i; + + /* + * Reinitialize each CPU's PRNG with 128 bits of key. + * No locking on the CPUs, but then somewhat random results are, + * well, expected. + */ + for_each_possible_cpu(i) { + struct siprand_state *state; + unsigned long v0 = get_random_long(), v2 = v0 ^ PRND_K0; + unsigned long v1 = get_random_long(), v3 = v1 ^ PRND_K1; +#if BITS_PER_LONG == 32 + int j; + + /* + * On 32-bit machines, hash in two extra words to + * approximate 128-bit key length. Not that the hash + * has that much security, but this prevents a trivial + * 64-bit brute force. 
+ */ + for (j = 0; j < 2; j++) { + unsigned long m = get_random_long(); + + v3 ^= m; + PRND_SIPROUND(v0, v1, v2, v3); + PRND_SIPROUND(v0, v1, v2, v3); + v0 ^= m; + } +#endif + /* + * Probably impossible in practice, but there is a + * theoretical risk that a race between this reseeding + * and the target CPU writing its state back could + * create the all-zero SipHash fixed point. + * + * To ensure that never happens, ensure the state + * we write contains no zero words. + */ + state = per_cpu_ptr(&net_rand_state, i); + WRITE_ONCE(state->v0, v0 ? v0 : -1ul); + WRITE_ONCE(state->v1, v1 ? v1 : -1ul); + WRITE_ONCE(state->v2, v2 ? v2 : -1ul); + WRITE_ONCE(state->v3, v3 ? v3 : -1ul); + } + + /* reseed every ~60 seconds, in [40 .. 80) interval with slack */ + expires = round_jiffies(jiffies + 40 * HZ + prandom_u32_max(40 * HZ)); + mod_timer(&seed_timer, expires); +} + +/* + * The random ready callback can be called from almost any interrupt. + * To avoid worrying about whether it's safe to delay that interrupt + * long enough to seed all CPUs, just schedule an immediate timer event. + */ +static void prandom_timer_start(struct random_ready_callback *unused) +{ + mod_timer(&seed_timer, jiffies); +} + +/* + * Start periodic full reseeding as soon as strong + * random numbers are available. + */ +static int __init prandom_init_late(void) +{ + static struct random_ready_callback random_ready = { + .func = prandom_timer_start + }; + int ret = add_random_ready_callback(&random_ready); + + if (ret == -EALREADY) { + prandom_timer_start(&random_ready); + ret = 0; + } + return ret; +} +late_initcall(prandom_init_late); From d2cef3bae14b755bc17a29968679b3e95a74909d Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Fri, 23 Oct 2020 08:47:50 -0700 Subject: [PATCH 043/151] KVM: arm64: ARM_SMCCC_ARCH_WORKAROUND_1 doesn't return SMCCC_RET_NOT_REQUIRED commit 1de111b51b829bcf01d2e57971f8fd07a665fa3f upstream. According to the SMCCC spec[1](7.5.2 Discovery) the ARM_SMCCC_ARCH_WORKAROUND_1 function id only returns 0, 1, and SMCCC_RET_NOT_SUPPORTED. 0 is "workaround required and safe to call this function" 1 is "workaround not required but safe to call this function" SMCCC_RET_NOT_SUPPORTED is "might be vulnerable or might not be, who knows, I give up!" SMCCC_RET_NOT_SUPPORTED might as well mean "workaround required, except calling this function may not work because it isn't implemented in some cases". Wonderful. We map this SMC call to 0 is SPECTRE_MITIGATED 1 is SPECTRE_UNAFFECTED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE For KVM hypercalls (hvc), we've implemented this function id to return SMCCC_RET_NOT_SUPPORTED, 0, and SMCCC_RET_NOT_REQUIRED. One of those isn't supposed to be there. Per the code we call arm64_get_spectre_v2_state() to figure out what to return for this feature discovery call. 
0 is SPECTRE_MITIGATED SMCCC_RET_NOT_REQUIRED is SPECTRE_UNAFFECTED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE Let's clean this up so that KVM tells the guest this mapping: 0 is SPECTRE_MITIGATED 1 is SPECTRE_UNAFFECTED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE Note: SMCCC_RET_NOT_AFFECTED is 1 but isn't part of the SMCCC spec Fixes: c118bbb52743 ("arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests") Signed-off-by: Stephen Boyd Acked-by: Marc Zyngier Acked-by: Will Deacon Cc: Andre Przywara Cc: Steven Price Cc: Marc Zyngier Cc: stable@vger.kernel.org Link: https://developer.arm.com/documentation/den0028/latest [1] Link: https://lore.kernel.org/r/20201023154751.1973872-1-swboyd@chromium.org Signed-off-by: Will Deacon Signed-off-by: Stephen Boyd Signed-off-by: Greg Kroah-Hartman --- include/linux/arm-smccc.h | 2 ++ virt/kvm/arm/psci.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index 080012a6f025..157e4a6a83f6 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -76,6 +76,8 @@ ARM_SMCCC_SMC_32, \ 0, 0x7fff) +#define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED 1 + #ifndef __ASSEMBLY__ #include diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c index 87927f7e1ee7..48fde38d64c3 100644 --- a/virt/kvm/arm/psci.c +++ b/virt/kvm/arm/psci.c @@ -408,7 +408,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) val = SMCCC_RET_SUCCESS; break; case KVM_BP_HARDEN_NOT_REQUIRED: - val = SMCCC_RET_NOT_REQUIRED; + val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; break; } break; From 42501604363fa24d7b4dda33ff87feccf68058dd Mon Sep 17 00:00:00 2001 From: Maxim Levitsky Date: Wed, 11 Nov 2020 14:20:47 +0100 Subject: [PATCH 044/151] KVM: x86: don't expose MSR_IA32_UMWAIT_CONTROL unconditionally [ Upstream commit f4cfcd2d5aea4e96c5d483c476f3057b6b7baf6a ] This msr is only available when the host supports WAITPKG feature. This breaks a nested guest, if the L1 hypervisor is set to ignore unknown msrs, because the only other safety check that the kernel does is that it attempts to read the msr and rejects it if it gets an exception. Cc: stable@vger.kernel.org Fixes: 6e3ba4abce ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Signed-off-by: Maxim Levitsky Message-Id: <20200523161455.3940-3-mlevitsk@redhat.com> Reviewed-by: Sean Christopherson Signed-off-by: Paolo Bonzini (cherry picked from commit f4cfcd2d5aea4e96c5d483c476f3057b6b7baf6a use boot_cpu_has for checking the feature) Signed-off-by: Jack Wang Acked-by: Paolo Bonzini Signed-off-by: Sasha Levin --- arch/x86/kvm/x86.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 12e83297ea02..880a24889291 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5235,6 +5235,10 @@ static void kvm_init_msr_list(void) if (!kvm_x86_ops->rdtscp_supported()) continue; break; + case MSR_IA32_UMWAIT_CONTROL: + if (!boot_cpu_has(X86_FEATURE_WAITPKG)) + continue; + break; case MSR_IA32_RTIT_CTL: case MSR_IA32_RTIT_STATUS: if (!kvm_x86_ops->pt_supported()) From b668352c4aade7388d1844f88f20ccfe15fa9485 Mon Sep 17 00:00:00 2001 From: Masashi Honma Date: Sun, 9 Aug 2020 08:32:58 +0900 Subject: [PATCH 045/151] ath9k_htc: Use appropriate rs_datalen type commit 5024f21c159f8c1668f581fff37140741c0b1ba9 upstream. 
kernel test robot says: drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:987:20: sparse: warning: incorrect type in assignment (different base types) drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:987:20: sparse: expected restricted __be16 [usertype] rs_datalen drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:987:20: sparse: got unsigned short [usertype] drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:988:13: sparse: warning: restricted __be16 degrades to integer drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:1001:13: sparse: warning: restricted __be16 degrades to integer Indeed rs_datalen has host byte order, so modify it's own type. Reported-by: kernel test robot Fixes: cd486e627e67 ("ath9k_htc: Discard undersized packets") Signed-off-by: Masashi Honma Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20200808233258.4596-1-masashi.honma@gmail.com Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c index 118e5550b10c..628f45c8c06f 100644 --- a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c +++ b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c @@ -973,7 +973,7 @@ static bool ath9k_rx_prepare(struct ath9k_htc_priv *priv, struct ath_htc_rx_status *rxstatus; struct ath_rx_status rx_stats; bool decrypt_error = false; - __be16 rs_datalen; + u16 rs_datalen; bool is_phyerr; if (skb->len < HTC_RX_FRAME_HEADER_SIZE) { From 0fc0befe0bfa512a24124c7525de12d6d621b2b7 Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Fri, 23 Oct 2020 10:58:49 +0100 Subject: [PATCH 046/151] ASoC: qcom: sdm845: set driver name correctly [ Upstream commit 3f48b6eba15ea342ef4cb420b580f5ed6605669f ] With the current state of code, we would endup with something like below in /proc/asound/cards for 2 machines based on this driver. Machine 1: 0 [DB845c ]: DB845c - DB845c DB845c Machine 2: 0 [LenovoYOGAC6301]: Lenovo-YOGA-C63 - Lenovo-YOGA-C630-13Q50 LENOVO-81JL-LenovoYOGAC630_13Q50-LNVNB161216 This is not very UCM friendly both w.r.t to common up configs and card identification, and UCM2 became totally not usefull with just one ucm sdm845.conf for two machines which have different setups w.r.t HDMI and other dais. Reasons for such thing is partly because Qualcomm machine drivers never cared to set driver_name. This patch sets up driver name for the this driver to sort out the UCM integration issues! after this patch contents of /proc/asound/cards: Machine 1: 0 [DB845c ]: sdm845 - DB845c DB845c Machine 2: 0 [LenovoYOGAC6301]: sdm845 - Lenovo-YOGA-C630-13Q50 LENOVO-81JL-LenovoYOGAC630_13Q50-LNVNB161216 with this its possible to align with what UCM2 expects and we can have sdm845/DB845.conf sdm845/LENOVO-81JL-LenovoYOGAC630_13Q50-LNVNB161216.conf ... for board variants. This should scale much better! 
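The change above boils down to publishing one stable driver_name on the ASoC card while the board-specific card names stay as they are. A rough sketch of that pattern with a hypothetical machine driver (names invented, DAI links omitted, not the sdm845 code itself):

#include <linux/platform_device.h>
#include <sound/soc.h>

#define MY_DRIVER_NAME	"sdm845"	/* one stable id shared by all board variants */

static int my_machine_probe_sketch(struct platform_device *pdev)
{
	struct snd_soc_card *card;

	card = devm_kzalloc(&pdev->dev, sizeof(*card), GFP_KERNEL);
	if (!card)
		return -ENOMEM;

	card->dev = &pdev->dev;
	card->driver_name = MY_DRIVER_NAME;	/* what UCM keys its configs off */
	/* card->name and card->long_name stay board specific; dai_links omitted */

	return devm_snd_soc_register_card(&pdev->dev, card);
}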
Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20201023095849.22894-1-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/qcom/sdm845.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/soc/qcom/sdm845.c b/sound/soc/qcom/sdm845.c index 7e6c41e63d8e..23e1de61e92e 100644 --- a/sound/soc/qcom/sdm845.c +++ b/sound/soc/qcom/sdm845.c @@ -16,6 +16,7 @@ #include "qdsp6/q6afe.h" #include "../codecs/rt5663.h" +#define DRIVER_NAME "sdm845" #define DEFAULT_SAMPLE_RATE_48K 48000 #define DEFAULT_MCLK_RATE 24576000 #define TDM_BCLK_RATE 6144000 @@ -407,6 +408,7 @@ static int sdm845_snd_platform_probe(struct platform_device *pdev) goto data_alloc_fail; } + card->driver_name = DRIVER_NAME; card->dapm_widgets = sdm845_snd_widgets; card->num_dapm_widgets = ARRAY_SIZE(sdm845_snd_widgets); card->dev = dev; From e22142a9a2a964e8d487c716b46cb5784961e52b Mon Sep 17 00:00:00 2001 From: Olivier Moysan Date: Tue, 20 Oct 2020 17:01:09 +0200 Subject: [PATCH 047/151] ASoC: cs42l51: manage mclk shutdown delay [ Upstream commit 20afe581c9b980848ad097c4d54dde9bec7593ef ] A delay must be introduced before the shutdown down of the mclk, as stated in CS42L51 datasheet. Otherwise the codec may produce some noise after the end of DAPM power down sequence. The delay between DAC and CLOCK_SUPPLY widgets is too short. Add a delay in mclk shutdown request to manage the shutdown delay explicitly. From experiments, at least 10ms delay is necessary. Set delay to 20ms as recommended in Documentation/timers/timers-howto.rst when using msleep(). Signed-off-by: Olivier Moysan Link: https://lore.kernel.org/r/20201020150109.482-1-olivier.moysan@st.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/codecs/cs42l51.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/sound/soc/codecs/cs42l51.c b/sound/soc/codecs/cs42l51.c index 55408c8fcb4e..cdd7ae90c2b5 100644 --- a/sound/soc/codecs/cs42l51.c +++ b/sound/soc/codecs/cs42l51.c @@ -247,8 +247,28 @@ static const struct snd_soc_dapm_widget cs42l51_dapm_widgets[] = { &cs42l51_adcr_mux_controls), }; +static int mclk_event(struct snd_soc_dapm_widget *w, + struct snd_kcontrol *kcontrol, int event) +{ + struct snd_soc_component *comp = snd_soc_dapm_to_component(w->dapm); + struct cs42l51_private *cs42l51 = snd_soc_component_get_drvdata(comp); + + switch (event) { + case SND_SOC_DAPM_PRE_PMU: + return clk_prepare_enable(cs42l51->mclk_handle); + case SND_SOC_DAPM_POST_PMD: + /* Delay mclk shutdown to fulfill power-down sequence requirements */ + msleep(20); + clk_disable_unprepare(cs42l51->mclk_handle); + break; + } + + return 0; +} + static const struct snd_soc_dapm_widget cs42l51_dapm_mclk_widgets[] = { - SND_SOC_DAPM_CLOCK_SUPPLY("MCLK") + SND_SOC_DAPM_SUPPLY("MCLK", SND_SOC_NOPM, 0, 0, mclk_event, + SND_SOC_DAPM_PRE_PMU | SND_SOC_DAPM_POST_PMD), }; static const struct snd_soc_dapm_route cs42l51_routes[] = { From fe2dc1093c6166de95560b658f8441630d9059cb Mon Sep 17 00:00:00 2001 From: Heikki Krogerus Date: Tue, 6 Oct 2020 16:02:50 +0300 Subject: [PATCH 048/151] usb: dwc3: pci: add support for the Intel Alder Lake-S [ Upstream commit 1384ab4fee12c4c4f8bd37bc9f8686881587b286 ] This patch adds the necessary PCI ID for Intel Alder Lake-S devices. 
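For context, enabling a new platform in a PCI driver is normally just another entry in its ID table, as the diff below does for Alder Lake-S. A generic sketch with hypothetical names (only the 0x7ae1 device ID is taken from the patch):

#include <linux/module.h>
#include <linux/pci.h>

#define MY_VENDOR_ID	0x8086		/* Intel */
#define MY_DEVICE_ID	0x7ae1		/* assumption: the ADL-S id from the hunk below */

static const struct pci_device_id my_id_table_sketch[] = {
	{ PCI_DEVICE(MY_VENDOR_ID, MY_DEVICE_ID) },
	{ }	/* terminating entry */
};
MODULE_DEVICE_TABLE(pci, my_id_table_sketch);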
Signed-off-by: Heikki Krogerus Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin --- drivers/usb/dwc3/dwc3-pci.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c index ba88039449e0..58b8801ce881 100644 --- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -40,6 +40,7 @@ #define PCI_DEVICE_ID_INTEL_TGPLP 0xa0ee #define PCI_DEVICE_ID_INTEL_TGPH 0x43ee #define PCI_DEVICE_ID_INTEL_JSP 0x4dee +#define PCI_DEVICE_ID_INTEL_ADLS 0x7ae1 #define PCI_INTEL_BXT_DSM_GUID "732b85d5-b7a7-4a1b-9ba0-4bbd00ffd511" #define PCI_INTEL_BXT_FUNC_PMU_PWR 4 @@ -367,6 +368,9 @@ static const struct pci_device_id dwc3_pci_id_table[] = { { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_JSP), (kernel_ulong_t) &dwc3_pci_intel_properties, }, + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_ADLS), + (kernel_ulong_t) &dwc3_pci_intel_properties, }, + { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_NL_USB), (kernel_ulong_t) &dwc3_pci_amd_properties, }, { } /* Terminating Entry */ From e6049035419111cacb662a8b919ba1c5b743c5a7 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 22 Oct 2020 12:26:08 +0530 Subject: [PATCH 049/151] opp: Reduce the size of critical section in _opp_table_kref_release() [ Upstream commit e0df59de670b48a923246fae1f972317b84b2764 ] There is a lot of stuff here which can be done outside of the big opp_table_lock, do that. This helps avoiding few circular dependency lockdeps around debugfs and interconnects. Reported-by: Rob Clark Reported-by: Dmitry Osipenko Signed-off-by: Viresh Kumar Signed-off-by: Sasha Levin --- drivers/opp/core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 8867bab72e17..088c93dc0085 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1046,6 +1046,10 @@ static void _opp_table_kref_release(struct kref *kref) struct opp_table *opp_table = container_of(kref, struct opp_table, kref); struct opp_device *opp_dev, *temp; + /* Drop the lock as soon as we can */ + list_del(&opp_table->node); + mutex_unlock(&opp_table_lock); + _of_clear_opp_table(opp_table); /* Release clk */ @@ -1067,10 +1071,7 @@ static void _opp_table_kref_release(struct kref *kref) mutex_destroy(&opp_table->genpd_virt_dev_lock); mutex_destroy(&opp_table->lock); - list_del(&opp_table->node); kfree(opp_table); - - mutex_unlock(&opp_table_lock); } void dev_pm_opp_put_opp_table(struct opp_table *opp_table) From 1737ea0c577565a40476de1d8cc0660a7c504527 Mon Sep 17 00:00:00 2001 From: Evgeny Novikov Date: Fri, 2 Oct 2020 18:01:55 +0300 Subject: [PATCH 050/151] usb: gadget: goku_udc: fix potential crashes in probe [ Upstream commit 0d66e04875c5aae876cf3d4f4be7978fa2b00523 ] goku_probe() goes to error label "err" and invokes goku_remove() in case of failures of pci_enable_device(), pci_resource_start() and ioremap(). goku_remove() gets a device from pci_get_drvdata(pdev) and works with it without any checks, in particular it dereferences a corresponding pointer. But goku_probe() did not set this device yet. So, one can expect various crashes. The patch moves setting the device just after allocation of memory for it. Found by Linux Driver Verification project (linuxtesting.org). 
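The ordering problem described above generalizes: if any error path can reach teardown code that reads drvdata, the pointer has to be published before the first goto. A self-contained sketch of the failure mode with hypothetical names (not the goku_udc code):

#include <linux/pci.h>
#include <linux/slab.h>

struct my_dev_sketch {
	struct pci_dev *pdev;
};

static void my_cleanup_sketch(struct pci_dev *pdev)
{
	struct my_dev_sketch *dev = pci_get_drvdata(pdev);

	kfree(dev);		/* oopses if probe never set drvdata */
}

static int my_probe_sketch(struct pci_dev *pdev)
{
	struct my_dev_sketch *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

	if (!dev)
		return -ENOMEM;

	pci_set_drvdata(pdev, dev);	/* publish before the first goto err */
	dev->pdev = pdev;

	if (pci_enable_device(pdev) < 0)
		goto err;

	return 0;

err:
	my_cleanup_sketch(pdev);	/* safe only because drvdata is already set */
	return -ENODEV;
}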
Reported-by: Pavel Andrianov Signed-off-by: Evgeny Novikov Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin --- drivers/usb/gadget/udc/goku_udc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/gadget/udc/goku_udc.c b/drivers/usb/gadget/udc/goku_udc.c index c3721225b61e..b706ad3034bc 100644 --- a/drivers/usb/gadget/udc/goku_udc.c +++ b/drivers/usb/gadget/udc/goku_udc.c @@ -1757,6 +1757,7 @@ static int goku_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto err; } + pci_set_drvdata(pdev, dev); spin_lock_init(&dev->lock); dev->pdev = pdev; dev->gadget.ops = &goku_ops; @@ -1790,7 +1791,6 @@ static int goku_probe(struct pci_dev *pdev, const struct pci_device_id *id) } dev->regs = (struct goku_udc_regs __iomem *) base; - pci_set_drvdata(pdev, dev); INFO(dev, "%s\n", driver_desc); INFO(dev, "version: " DRIVER_VERSION " %s\n", dmastr()); INFO(dev, "irq %d, pci mem %p\n", pdev->irq, base); From 9110e2f2633dc9383a3a4711a0067094f6948783 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Fri, 2 Oct 2020 14:25:01 +0100 Subject: [PATCH 051/151] selftests/ftrace: check for do_sys_openat2 in user-memory test [ Upstream commit e3e40312567087fbe6880f316cb2b0e1f3d8a82c ] More recent libc implementations are now using openat/openat2 system calls so also add do_sys_openat2 to the tracing so that the test passes on these systems because do_sys_open may not be called. Thanks to Masami Hiramatsu for the help on getting this fix to work correctly. Signed-off-by: Colin Ian King Acked-by: Masami Hiramatsu Acked-by: Steven Rostedt (VMware) Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin --- .../selftests/ftrace/test.d/kprobe/kprobe_args_user.tc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc index 0f60087583d8..a753c73d869a 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc @@ -11,12 +11,16 @@ grep -A10 "fetcharg:" README | grep -q '\[u\]' || exit_unsupported :;: "user-memory access syntax and ustring working on user memory";: echo 'p:myevent do_sys_open path=+0($arg2):ustring path2=+u0($arg2):string' \ > kprobe_events +echo 'p:myevent2 do_sys_openat2 path=+0($arg2):ustring path2=+u0($arg2):string' \ + >> kprobe_events grep myevent kprobe_events | \ grep -q 'path=+0($arg2):ustring path2=+u0($arg2):string' echo 1 > events/kprobes/myevent/enable +echo 1 > events/kprobes/myevent2/enable echo > /dev/null echo 0 > events/kprobes/myevent/enable +echo 0 > events/kprobes/myevent2/enable grep myevent trace | grep -q 'path="/dev/null" path2="/dev/null"' From 9b7e6b670df7ae6803223fc93c90d268804334e6 Mon Sep 17 00:00:00 2001 From: Tommi Rantala Date: Thu, 8 Oct 2020 15:26:22 +0300 Subject: [PATCH 052/151] selftests: pidfd: fix compilation errors due to wait.h [ Upstream commit 1948172fdba5ad643529ddcd00a601c0caa913ed ] Drop unneeded header inclusion to fix pidfd compilation errors seen in Fedora 32: In file included from pidfd_open_test.c:9: ../../../../usr/include/linux/wait.h:17:16: error: expected identifier before numeric constant 17 | #define P_ALL 0 | ^ Signed-off-by: Tommi Rantala Reviewed-by: Kees Cook Acked-by: Christian Brauner Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin --- tools/testing/selftests/pidfd/pidfd_open_test.c | 1 - tools/testing/selftests/pidfd/pidfd_poll_test.c | 1 - 2 files changed, 2 deletions(-) diff --git 
a/tools/testing/selftests/pidfd/pidfd_open_test.c b/tools/testing/selftests/pidfd/pidfd_open_test.c index b9fe75fc3e51..8a59438ccc78 100644 --- a/tools/testing/selftests/pidfd/pidfd_open_test.c +++ b/tools/testing/selftests/pidfd/pidfd_open_test.c @@ -6,7 +6,6 @@ #include #include #include -#include #include #include #include diff --git a/tools/testing/selftests/pidfd/pidfd_poll_test.c b/tools/testing/selftests/pidfd/pidfd_poll_test.c index 4b115444dfe9..610811275357 100644 --- a/tools/testing/selftests/pidfd/pidfd_poll_test.c +++ b/tools/testing/selftests/pidfd/pidfd_poll_test.c @@ -3,7 +3,6 @@ #define _GNU_SOURCE #include #include -#include #include #include #include From 0a4c091673cad1fd32f0ad7b5518228f37cc4290 Mon Sep 17 00:00:00 2001 From: Kai-Heng Feng Date: Tue, 27 Oct 2020 21:00:37 +0800 Subject: [PATCH 053/151] ALSA: hda: Separate runtime and system suspend [ Upstream commit f5dac54d9d93826a776dffc848df76746f7135bb ] Both pm_runtime_force_suspend() and pm_runtime_force_resume() have some implicit checks, so it can make code flow more straightforward if we separate runtime and system suspend callbacks. High Definition Audio Specification, 4.5.9.3 Codec Wake From System S3 states that codec can wake the system up from S3 if WAKEEN is toggled. Since HDA controller has different wakeup settings for runtime and system susend, we also need to explicitly disable direct-complete which can be enabled automatically by PCI core. In addition to that, avoid waking up codec if runtime resume is for system suspend, to not break direct-complete for codecs. While at it, also remove AZX_DCAPS_SUSPEND_SPURIOUS_WAKEUP, as the original bug commit a6630529aecb ("ALSA: hda: Workaround for spurious wakeups on some Intel platforms") solves doesn't happen with this patch. 
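A condensed sketch of the callback split described above, using a hypothetical driver rather than hda_intel: system sleep and runtime PM get their own handlers, and .prepare/.complete bracket system suspend so the runtime path can tell which case it is in (returning 0 from .prepare also keeps the core from taking the direct-complete shortcut):

#include <linux/pm.h>
#include <linux/pm_runtime.h>

static bool sys_suspend_in_progress;	/* the real driver keeps this per device */

static int my_prepare(struct device *dev)
{
	sys_suspend_in_progress = true;
	return 0;	/* 0 (not >0) means no direct-complete */
}

static void my_complete(struct device *dev)
{
	sys_suspend_in_progress = false;
}

static int my_suspend(struct device *dev)
{
	/* system sleep: program S3 wake settings, full shutdown */
	return 0;
}

static int my_resume(struct device *dev)
{
	/* system resume: full re-init, resume codecs */
	return 0;
}

static int my_runtime_suspend(struct device *dev)
{
	/* runtime: program runtime wake settings */
	return 0;
}

static int my_runtime_resume(struct device *dev)
{
	/* runtime: skip codec resume when sys_suspend_in_progress is set */
	return 0;
}

static const struct dev_pm_ops my_pm_ops_sketch = {
	.prepare = my_prepare,
	.complete = my_complete,
	SET_SYSTEM_SLEEP_PM_OPS(my_suspend, my_resume)
	SET_RUNTIME_PM_OPS(my_runtime_suspend, my_runtime_resume, NULL)
};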
Signed-off-by: Kai-Heng Feng Link: https://lore.kernel.org/r/20201027130038.16463-3-kai.heng.feng@canonical.com Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/pci/hda/hda_controller.h | 3 +- sound/pci/hda/hda_intel.c | 62 +++++++++++++++++++--------------- 2 files changed, 36 insertions(+), 29 deletions(-) diff --git a/sound/pci/hda/hda_controller.h b/sound/pci/hda/hda_controller.h index a356fb0e5773..9da7a06d024f 100644 --- a/sound/pci/hda/hda_controller.h +++ b/sound/pci/hda/hda_controller.h @@ -41,7 +41,7 @@ /* 24 unused */ #define AZX_DCAPS_COUNT_LPIB_DELAY (1 << 25) /* Take LPIB as delay */ #define AZX_DCAPS_PM_RUNTIME (1 << 26) /* runtime PM support */ -#define AZX_DCAPS_SUSPEND_SPURIOUS_WAKEUP (1 << 27) /* Workaround for spurious wakeups after suspend */ +/* 27 unused */ #define AZX_DCAPS_CORBRP_SELF_CLEAR (1 << 28) /* CORBRP clears itself after reset */ #define AZX_DCAPS_NO_MSI64 (1 << 29) /* Stick to 32-bit MSIs */ #define AZX_DCAPS_SEPARATE_STREAM_TAG (1 << 30) /* capture and playback use separate stream tag */ @@ -143,6 +143,7 @@ struct azx { unsigned int align_buffer_size:1; unsigned int region_requested:1; unsigned int disabled:1; /* disabled by vga_switcheroo */ + unsigned int pm_prepared:1; /* GTS present */ unsigned int gts_present:1; diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 9a1968932b78..ab32d4811c9e 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -295,8 +295,7 @@ enum { /* PCH for HSW/BDW; with runtime PM */ /* no i915 binding for this as HSW/BDW has another controller for HDMI */ #define AZX_DCAPS_INTEL_PCH \ - (AZX_DCAPS_INTEL_PCH_BASE | AZX_DCAPS_PM_RUNTIME |\ - AZX_DCAPS_SUSPEND_SPURIOUS_WAKEUP) + (AZX_DCAPS_INTEL_PCH_BASE | AZX_DCAPS_PM_RUNTIME) /* HSW HDMI */ #define AZX_DCAPS_INTEL_HASWELL \ @@ -984,7 +983,7 @@ static void __azx_runtime_suspend(struct azx *chip) display_power(chip, false); } -static void __azx_runtime_resume(struct azx *chip, bool from_rt) +static void __azx_runtime_resume(struct azx *chip) { struct hda_intel *hda = container_of(chip, struct hda_intel, chip); struct hdac_bus *bus = azx_bus(chip); @@ -1001,7 +1000,8 @@ static void __azx_runtime_resume(struct azx *chip, bool from_rt) azx_init_pci(chip); hda_intel_init_chip(chip, true); - if (from_rt) { + /* Avoid codec resume if runtime resume is for system suspend */ + if (!chip->pm_prepared) { list_for_each_codec(codec, &chip->bus) { if (codec->relaxed_resume) continue; @@ -1017,6 +1017,29 @@ static void __azx_runtime_resume(struct azx *chip, bool from_rt) } #ifdef CONFIG_PM_SLEEP +static int azx_prepare(struct device *dev) +{ + struct snd_card *card = dev_get_drvdata(dev); + struct azx *chip; + + chip = card->private_data; + chip->pm_prepared = 1; + + /* HDA controller always requires different WAKEEN for runtime suspend + * and system suspend, so don't use direct-complete here. 
+ */ + return 0; +} + +static void azx_complete(struct device *dev) +{ + struct snd_card *card = dev_get_drvdata(dev); + struct azx *chip; + + chip = card->private_data; + chip->pm_prepared = 0; +} + static int azx_suspend(struct device *dev) { struct snd_card *card = dev_get_drvdata(dev); @@ -1028,15 +1051,7 @@ static int azx_suspend(struct device *dev) chip = card->private_data; bus = azx_bus(chip); - snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); - /* An ugly workaround: direct call of __azx_runtime_suspend() and - * __azx_runtime_resume() for old Intel platforms that suffer from - * spurious wakeups after S3 suspend - */ - if (chip->driver_caps & AZX_DCAPS_SUSPEND_SPURIOUS_WAKEUP) - __azx_runtime_suspend(chip); - else - pm_runtime_force_suspend(dev); + __azx_runtime_suspend(chip); if (bus->irq >= 0) { free_irq(bus->irq, chip); bus->irq = -1; @@ -1064,11 +1079,7 @@ static int azx_resume(struct device *dev) if (azx_acquire_irq(chip, 1) < 0) return -EIO; - if (chip->driver_caps & AZX_DCAPS_SUSPEND_SPURIOUS_WAKEUP) - __azx_runtime_resume(chip, false); - else - pm_runtime_force_resume(dev); - snd_power_change_state(card, SNDRV_CTL_POWER_D0); + __azx_runtime_resume(chip); trace_azx_resume(chip); return 0; @@ -1116,10 +1127,7 @@ static int azx_runtime_suspend(struct device *dev) chip = card->private_data; /* enable controller wake up event */ - if (snd_power_get_state(card) == SNDRV_CTL_POWER_D0) { - azx_writew(chip, WAKEEN, azx_readw(chip, WAKEEN) | - STATESTS_INT_MASK); - } + azx_writew(chip, WAKEEN, azx_readw(chip, WAKEEN) | STATESTS_INT_MASK); __azx_runtime_suspend(chip); trace_azx_runtime_suspend(chip); @@ -1130,18 +1138,14 @@ static int azx_runtime_resume(struct device *dev) { struct snd_card *card = dev_get_drvdata(dev); struct azx *chip; - bool from_rt = snd_power_get_state(card) == SNDRV_CTL_POWER_D0; if (!azx_is_pm_ready(card)) return 0; chip = card->private_data; - __azx_runtime_resume(chip, from_rt); + __azx_runtime_resume(chip); /* disable controller Wake Up event*/ - if (from_rt) { - azx_writew(chip, WAKEEN, azx_readw(chip, WAKEEN) & - ~STATESTS_INT_MASK); - } + azx_writew(chip, WAKEEN, azx_readw(chip, WAKEEN) & ~STATESTS_INT_MASK); trace_azx_runtime_resume(chip); return 0; @@ -1175,6 +1179,8 @@ static int azx_runtime_idle(struct device *dev) static const struct dev_pm_ops azx_pm = { SET_SYSTEM_SLEEP_PM_OPS(azx_suspend, azx_resume) #ifdef CONFIG_PM_SLEEP + .prepare = azx_prepare, + .complete = azx_complete, .freeze_noirq = azx_freeze_noirq, .thaw_noirq = azx_thaw_noirq, #endif From 42eaa22aaf2e802f96255b0ae0f601e6bd6d9232 Mon Sep 17 00:00:00 2001 From: Kai-Heng Feng Date: Tue, 27 Oct 2020 21:00:38 +0800 Subject: [PATCH 054/151] ALSA: hda: Reinstate runtime_allow() for all hda controllers [ Upstream commit 9fc149c3bce7bdbb94948a8e6bd025e3b3538603 ] The broken jack detection should be fixed by commit a6e7d0a4bdb0 ("ALSA: hda: fix jack detection with Realtek codecs when in D3"), let's try enabling runtime PM by default again. 
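The enablement referred to above is the usual runtime-PM setup done at the end of probe. A hedged sketch with a hypothetical helper (the autosuspend delay is arbitrary):

#include <linux/pci.h>
#include <linux/pm_runtime.h>

static void my_enable_runtime_pm_sketch(struct pci_dev *pci)
{
	pm_runtime_set_autosuspend_delay(&pci->dev, 3000);	/* 3 s, arbitrary */
	pm_runtime_use_autosuspend(&pci->dev);
	pm_runtime_allow(&pci->dev);		/* default power/control becomes "auto" */
	pm_runtime_put_autosuspend(&pci->dev);	/* drop the probe-time usage reference */
}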
Signed-off-by: Kai-Heng Feng Link: https://lore.kernel.org/r/20201027130038.16463-4-kai.heng.feng@canonical.com Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/pci/hda/hda_intel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index ab32d4811c9e..192e580561ef 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2328,6 +2328,7 @@ static int azx_probe_continue(struct azx *chip) if (azx_has_pm_runtime(chip)) { pm_runtime_use_autosuspend(&pci->dev); + pm_runtime_allow(&pci->dev); pm_runtime_put_autosuspend(&pci->dev); } From 99dcfc517d17c3ec15a3438176a39565f1222204 Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Tue, 27 Oct 2020 10:10:01 -0500 Subject: [PATCH 055/151] gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free [ Upstream commit d0f17d3883f1e3f085d38572c2ea8edbd5150172 ] Function gfs2_clear_rgrpd calls kfree(rgd->rd_bits) before calling return_all_reservations, but return_all_reservations still dereferences rgd->rd_bits in __rs_deltree. Fix that by moving the call to kfree below the call to return_all_reservations. Signed-off-by: Bob Peterson Signed-off-by: Andreas Gruenbacher Signed-off-by: Sasha Levin --- fs/gfs2/rgrp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 2466bb44a23c..e23735084ad1 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -736,9 +736,9 @@ void gfs2_clear_rgrpd(struct gfs2_sbd *sdp) } gfs2_free_clones(rgd); + return_all_reservations(rgd); kfree(rgd->rd_bits); rgd->rd_bits = NULL; - return_all_reservations(rgd); kmem_cache_free(gfs2_rgrpd_cachep, rgd); } } From edeff05a1f10891a07a944928f20f9d4c5848c80 Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Tue, 27 Oct 2020 10:10:02 -0500 Subject: [PATCH 056/151] gfs2: Add missing truncate_inode_pages_final for sd_aspace [ Upstream commit a9dd945ccef07a904e412f208f8de708a3d7159e ] Gfs2 creates an address space for its rgrps called sd_aspace, but it never called truncate_inode_pages_final on it. This confused vfs greatly which tried to reference the address space after gfs2 had freed the superblock that contained it. This patch adds a call to truncate_inode_pages_final for sd_aspace, thus avoiding the use-after-free. Signed-off-by: Bob Peterson Signed-off-by: Andreas Gruenbacher Signed-off-by: Sasha Levin --- fs/gfs2/super.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 5935ce5ae563..50c925d9c610 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -689,6 +689,7 @@ restart: gfs2_jindex_free(sdp); /* Take apart glock structures and buffer lists */ gfs2_gl_hash_clear(sdp); + truncate_inode_pages_final(&sdp->sd_aspace); gfs2_delete_debugfs_file(sdp); /* Unmount the locking protocol */ gfs2_lm_unmount(sdp); From 325455358e54b894c356133b0001586eb330259b Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Wed, 28 Oct 2020 13:42:18 -0500 Subject: [PATCH 057/151] gfs2: check for live vs. read-only file system in gfs2_fitrim [ Upstream commit c5c68724696e7d2f8db58a5fce3673208d35c485 ] Before this patch, gfs2_fitrim was not properly checking for a "live" file system. If the file system had something to trim and the file system was read-only (or spectator) it would start the trim, but when it starts the transaction, gfs2_trans_begin returns -EROFS (read-only file system) and it errors out. However, if the file system was already trimmed so there's no work to do, it never called gfs2_trans_begin. 
That code is bypassed so it never returns the error. Instead, it returns a good return code with 0 work. All this makes for inconsistent behavior: The same fstrim command can return -EROFS in one case and 0 in another. This tripped up xfstests generic/537 which reports the error as: +fstrim with unrecovered metadata just ate your filesystem This patch adds a check for a "live" (iow, active journal, iow, RW) file system, and if not, returns the error properly. Signed-off-by: Bob Peterson Signed-off-by: Andreas Gruenbacher Signed-off-by: Sasha Levin --- fs/gfs2/rgrp.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index e23735084ad1..3d5aa0c10a4c 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -1410,6 +1410,9 @@ int gfs2_fitrim(struct file *filp, void __user *argp) if (!capable(CAP_SYS_ADMIN)) return -EPERM; + if (!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags)) + return -EROFS; + if (!blk_queue_discard(q)) return -EOPNOTSUPP; From 7f6df0b085cef919718af4599deebbb9cec7d87c Mon Sep 17 00:00:00 2001 From: Keita Suzuki Date: Tue, 27 Oct 2020 07:31:24 +0000 Subject: [PATCH 058/151] scsi: hpsa: Fix memory leak in hpsa_init_one() [ Upstream commit af61bc1e33d2c0ec22612b46050f5b58ac56a962 ] When hpsa_scsi_add_host() fails, h->lastlogicals is leaked since it is missing a free() in the error handler. Fix this by adding free() when hpsa_scsi_add_host() fails. Link: https://lore.kernel.org/r/20201027073125.14229-1-keitasuzuki.park@sslab.ics.keio.ac.jp Tested-by: Don Brace Acked-by: Don Brace Signed-off-by: Keita Suzuki Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/hpsa.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c index e67cb4561aac..bac705990a96 100644 --- a/drivers/scsi/hpsa.c +++ b/drivers/scsi/hpsa.c @@ -8854,7 +8854,7 @@ reinit_after_soft_reset: /* hook into SCSI subsystem */ rc = hpsa_scsi_add_host(h); if (rc) - goto clean7; /* perf, sg, cmd, irq, shost, pci, lu, aer/h */ + goto clean8; /* lastlogicals, perf, sg, cmd, irq, shost, pci, lu, aer/h */ /* Monitor the controller for firmware lockups */ h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL; @@ -8869,6 +8869,8 @@ reinit_after_soft_reset: HPSA_EVENT_MONITOR_INTERVAL); return 0; +clean8: /* lastlogicals, perf, sg, cmd, irq, shost, pci, lu, aer/h */ + kfree(h->lastlogicals); clean7: /* perf, sg, cmd, irq, shost, pci, lu, aer/h */ hpsa_free_performant_mode(h); h->access.set_intr_mask(h, HPSA_INTR_OFF); From f449b902badbb9791a7305476a1103b3854d2c3d Mon Sep 17 00:00:00 2001 From: Evan Quan Date: Wed, 28 Oct 2020 15:29:59 +0800 Subject: [PATCH 059/151] drm/amdgpu: perform srbm soft reset always on SDMA resume [ Upstream commit 253475c455eb5f8da34faa1af92709e7bb414624 ] This can address the random SDMA hang after pci config reset seen on Hawaii. 
Signed-off-by: Evan Quan Tested-by: Sandeep Raghuraman Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c index 4af9acc2dc4f..450ad7d5e21a 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c @@ -1071,22 +1071,19 @@ static int cik_sdma_soft_reset(void *handle) { u32 srbm_soft_reset = 0; struct amdgpu_device *adev = (struct amdgpu_device *)handle; - u32 tmp = RREG32(mmSRBM_STATUS2); + u32 tmp; - if (tmp & SRBM_STATUS2__SDMA_BUSY_MASK) { - /* sdma0 */ - tmp = RREG32(mmSDMA0_F32_CNTL + SDMA0_REGISTER_OFFSET); - tmp |= SDMA0_F32_CNTL__HALT_MASK; - WREG32(mmSDMA0_F32_CNTL + SDMA0_REGISTER_OFFSET, tmp); - srbm_soft_reset |= SRBM_SOFT_RESET__SOFT_RESET_SDMA_MASK; - } - if (tmp & SRBM_STATUS2__SDMA1_BUSY_MASK) { - /* sdma1 */ - tmp = RREG32(mmSDMA0_F32_CNTL + SDMA1_REGISTER_OFFSET); - tmp |= SDMA0_F32_CNTL__HALT_MASK; - WREG32(mmSDMA0_F32_CNTL + SDMA1_REGISTER_OFFSET, tmp); - srbm_soft_reset |= SRBM_SOFT_RESET__SOFT_RESET_SDMA1_MASK; - } + /* sdma0 */ + tmp = RREG32(mmSDMA0_F32_CNTL + SDMA0_REGISTER_OFFSET); + tmp |= SDMA0_F32_CNTL__HALT_MASK; + WREG32(mmSDMA0_F32_CNTL + SDMA0_REGISTER_OFFSET, tmp); + srbm_soft_reset |= SRBM_SOFT_RESET__SOFT_RESET_SDMA_MASK; + + /* sdma1 */ + tmp = RREG32(mmSDMA0_F32_CNTL + SDMA1_REGISTER_OFFSET); + tmp |= SDMA0_F32_CNTL__HALT_MASK; + WREG32(mmSDMA0_F32_CNTL + SDMA1_REGISTER_OFFSET, tmp); + srbm_soft_reset |= SRBM_SOFT_RESET__SOFT_RESET_SDMA1_MASK; if (srbm_soft_reset) { tmp = RREG32(mmSRBM_SOFT_RESET); From 48083640a47bf66fdc21a664de5083f5e7be0092 Mon Sep 17 00:00:00 2001 From: Evan Quan Date: Fri, 16 Oct 2020 10:45:26 +0800 Subject: [PATCH 060/151] drm/amd/pm: perform SMC reset on suspend/hibernation [ Upstream commit 277b080f98803cb73a83fb234f0be83a10e63958 ] So that the succeeding resume can be performed based on a clean state. 
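The new hook is dispatched through a wrapper that tolerates back-ends which do not implement it, so only the ASIC that needs the SMC reset has to provide a callback. A minimal sketch of that optional-callback pattern with invented type names (not the powerplay structures):

struct smc_funcs_sketch {
	int (*stop_smc)(void *hw);
};

struct hw_ctx_sketch {
	const struct smc_funcs_sketch *funcs;
};

static int stop_smc_sketch(struct hw_ctx_sketch *hw)
{
	if (hw->funcs && hw->funcs->stop_smc)
		return hw->funcs->stop_smc(hw);

	return 0;	/* back-ends without the hook are left alone */
}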
Signed-off-by: Evan Quan Tested-by: Sandeep Raghuraman Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 4 ++++ drivers/gpu/drm/amd/powerplay/inc/hwmgr.h | 1 + drivers/gpu/drm/amd/powerplay/inc/smumgr.h | 2 ++ .../gpu/drm/amd/powerplay/smumgr/ci_smumgr.c | 24 +++++++++++++++++++ drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c | 8 +++++++ 5 files changed, 39 insertions(+) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c index 35e6cbe805eb..7cde55854b65 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c @@ -1533,6 +1533,10 @@ int smu7_disable_dpm_tasks(struct pp_hwmgr *hwmgr) PP_ASSERT_WITH_CODE((tmp_result == 0), "Failed to reset to default!", result = tmp_result); + tmp_result = smum_stop_smc(hwmgr); + PP_ASSERT_WITH_CODE((tmp_result == 0), + "Failed to stop smc!", result = tmp_result); + tmp_result = smu7_force_switch_to_arbf0(hwmgr); PP_ASSERT_WITH_CODE((tmp_result == 0), "Failed to force to switch arbf0!", result = tmp_result); diff --git a/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h b/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h index 7bf9a14bfa0b..f6490a128438 100644 --- a/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h +++ b/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h @@ -229,6 +229,7 @@ struct pp_smumgr_func { bool (*is_hw_avfs_present)(struct pp_hwmgr *hwmgr); int (*update_dpm_settings)(struct pp_hwmgr *hwmgr, void *profile_setting); int (*smc_table_manager)(struct pp_hwmgr *hwmgr, uint8_t *table, uint16_t table_id, bool rw); /*rw: true for read, false for write */ + int (*stop_smc)(struct pp_hwmgr *hwmgr); }; struct pp_hwmgr_func { diff --git a/drivers/gpu/drm/amd/powerplay/inc/smumgr.h b/drivers/gpu/drm/amd/powerplay/inc/smumgr.h index c5288831aa15..05a55e850b5e 100644 --- a/drivers/gpu/drm/amd/powerplay/inc/smumgr.h +++ b/drivers/gpu/drm/amd/powerplay/inc/smumgr.h @@ -114,4 +114,6 @@ extern int smum_update_dpm_settings(struct pp_hwmgr *hwmgr, void *profile_settin extern int smum_smc_table_manager(struct pp_hwmgr *hwmgr, uint8_t *table, uint16_t table_id, bool rw); +extern int smum_stop_smc(struct pp_hwmgr *hwmgr); + #endif diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c index 09a3d8ae4449..0f4f27a89020 100644 --- a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c +++ b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c @@ -2936,6 +2936,29 @@ static int ci_update_smc_table(struct pp_hwmgr *hwmgr, uint32_t type) return 0; } +static void ci_reset_smc(struct pp_hwmgr *hwmgr) +{ + PHM_WRITE_INDIRECT_FIELD(hwmgr->device, CGS_IND_REG__SMC, + SMC_SYSCON_RESET_CNTL, + rst_reg, 1); +} + + +static void ci_stop_smc_clock(struct pp_hwmgr *hwmgr) +{ + PHM_WRITE_INDIRECT_FIELD(hwmgr->device, CGS_IND_REG__SMC, + SMC_SYSCON_CLOCK_CNTL_0, + ck_disable, 1); +} + +static int ci_stop_smc(struct pp_hwmgr *hwmgr) +{ + ci_reset_smc(hwmgr); + ci_stop_smc_clock(hwmgr); + + return 0; +} + const struct pp_smumgr_func ci_smu_funcs = { .name = "ci_smu", .smu_init = ci_smu_init, @@ -2960,4 +2983,5 @@ const struct pp_smumgr_func ci_smu_funcs = { .is_dpm_running = ci_is_dpm_running, .update_dpm_settings = ci_update_dpm_settings, .update_smc_table = ci_update_smc_table, + .stop_smc = ci_stop_smc, }; diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c index 4240aeec9000..83d06f8e99ec 100644 --- 
a/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c +++ b/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c @@ -217,3 +217,11 @@ int smum_smc_table_manager(struct pp_hwmgr *hwmgr, uint8_t *table, uint16_t tabl return -EINVAL; } + +int smum_stop_smc(struct pp_hwmgr *hwmgr) +{ + if (hwmgr->smumgr_funcs->stop_smc) + return hwmgr->smumgr_funcs->stop_smc(hwmgr); + + return 0; +} From c1cbb64c100d4863cd641e9354f0c8d5bbc8bb18 Mon Sep 17 00:00:00 2001 From: Evan Quan Date: Tue, 27 Oct 2020 10:24:18 +0800 Subject: [PATCH 061/151] drm/amd/pm: do not use ixFEATURE_STATUS for checking smc running [ Upstream commit 786436b453001dafe81025389f96bf9dac1e9690 ] This reverts commit f87812284172a9809820d10143b573d833cd3f75 ("drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume"). It was intended to fix Hawaii S4(hibernation) issue but break S3. As ixFEATURE_STATUS is filled with garbage data on resume which can be only cleared by reloading smc firmware(but that will involve many changes). So, we will revert this S4 fix and seek a new way. Signed-off-by: Evan Quan Tested-by: Sandeep Raghuraman Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c index 0f4f27a89020..42c8f8731a50 100644 --- a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c +++ b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c @@ -2725,10 +2725,7 @@ static int ci_initialize_mc_reg_table(struct pp_hwmgr *hwmgr) static bool ci_is_dpm_running(struct pp_hwmgr *hwmgr) { - return (1 == PHM_READ_INDIRECT_FIELD(hwmgr->device, - CGS_IND_REG__SMC, FEATURE_STATUS, - VOLTAGE_CONTROLLER_ON)) - ? true : false; + return ci_is_smc_ram_running(hwmgr); } static int ci_smu_init(struct pp_hwmgr *hwmgr) From 67bb2e4d41def89cf642936992a6c657d5ff1ed9 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 9 Oct 2020 13:25:41 +0200 Subject: [PATCH 062/151] mac80211: fix use of skb payload instead of header [ Upstream commit 14f46c1e5108696ec1e5a129e838ecedf108c7bf ] When ieee80211_skb_resize() is called from ieee80211_build_hdr() the skb has no 802.11 header yet, in fact it consist only of the payload as the ethernet frame is removed. As such, we're using the payload data for ieee80211_is_mgmt(), which is of course completely wrong. This didn't really hurt us because these are always data frames, so we could only have added more tailroom than we needed if we determined it was a management frame and sdata->crypto_tx_tailroom_needed_cnt was false. However, syzbot found that of course there need not be any payload, so we're using at best uninitialized memory for the check. Fix this to pass explicitly the kind of frame that we have instead of checking there, by replacing the "bool may_encrypt" argument with an argument that can carry the three possible states - it's not going to be encrypted, it's a management frame, or it's a data frame (and then we check sdata->crypto_tx_tailroom_needed_cnt). 
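Stripped of the mac80211 specifics, the fix replaces a boolean with a three-state classification supplied by the caller, so the helper never has to peek at a header that may not exist yet. A small sketch with hypothetical names:

#include <linux/types.h>

enum encrypt_sketch {
	ENCRYPT_NONE_SKETCH,	/* frame will not be encrypted */
	ENCRYPT_MGMT_SKETCH,	/* management frame, always needs crypto tailroom */
	ENCRYPT_DATA_SKETCH,	/* data frame, tailroom only if the keys need it */
};

static bool needs_crypto_tailroom_sketch(enum encrypt_sketch e,
					 int crypto_tx_tailroom_needed_cnt)
{
	return e == ENCRYPT_MGMT_SKETCH ||
	       (e == ENCRYPT_DATA_SKETCH && crypto_tx_tailroom_needed_cnt);
}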
Reported-by: syzbot+32fd1a1bfe355e93f1e2@syzkaller.appspotmail.com Signed-off-by: Johannes Berg Link: https://lore.kernel.org/r/20201009132538.e1fd7f802947.I799b288466ea2815f9d4c84349fae697dca2f189@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/mac80211/tx.c | 37 ++++++++++++++++++++++++------------- 1 file changed, 24 insertions(+), 13 deletions(-) diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index f029e75ec815..30a0c7c6224b 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -1944,19 +1944,24 @@ static bool ieee80211_tx(struct ieee80211_sub_if_data *sdata, /* device xmit handlers */ +enum ieee80211_encrypt { + ENCRYPT_NO, + ENCRYPT_MGMT, + ENCRYPT_DATA, +}; + static int ieee80211_skb_resize(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb, - int head_need, bool may_encrypt) + int head_need, + enum ieee80211_encrypt encrypt) { struct ieee80211_local *local = sdata->local; - struct ieee80211_hdr *hdr; bool enc_tailroom; int tail_need = 0; - hdr = (struct ieee80211_hdr *) skb->data; - enc_tailroom = may_encrypt && - (sdata->crypto_tx_tailroom_needed_cnt || - ieee80211_is_mgmt(hdr->frame_control)); + enc_tailroom = encrypt == ENCRYPT_MGMT || + (encrypt == ENCRYPT_DATA && + sdata->crypto_tx_tailroom_needed_cnt); if (enc_tailroom) { tail_need = IEEE80211_ENCRYPT_TAILROOM; @@ -1988,23 +1993,29 @@ void ieee80211_xmit(struct ieee80211_sub_if_data *sdata, { struct ieee80211_local *local = sdata->local; struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); - struct ieee80211_hdr *hdr; + struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data; int headroom; - bool may_encrypt; + enum ieee80211_encrypt encrypt; - may_encrypt = !(info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT); + if (info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT) + encrypt = ENCRYPT_NO; + else if (ieee80211_is_mgmt(hdr->frame_control)) + encrypt = ENCRYPT_MGMT; + else + encrypt = ENCRYPT_DATA; headroom = local->tx_headroom; - if (may_encrypt) + if (encrypt != ENCRYPT_NO) headroom += sdata->encrypt_headroom; headroom -= skb_headroom(skb); headroom = max_t(int, 0, headroom); - if (ieee80211_skb_resize(sdata, skb, headroom, may_encrypt)) { + if (ieee80211_skb_resize(sdata, skb, headroom, encrypt)) { ieee80211_free_txskb(&local->hw, skb); return; } + /* reload after potential resize */ hdr = (struct ieee80211_hdr *) skb->data; info->control.vif = &sdata->vif; @@ -2808,7 +2819,7 @@ static struct sk_buff *ieee80211_build_hdr(struct ieee80211_sub_if_data *sdata, head_need += sdata->encrypt_headroom; head_need += local->tx_headroom; head_need = max_t(int, 0, head_need); - if (ieee80211_skb_resize(sdata, skb, head_need, true)) { + if (ieee80211_skb_resize(sdata, skb, head_need, ENCRYPT_DATA)) { ieee80211_free_txskb(&local->hw, skb); skb = NULL; return ERR_PTR(-ENOMEM); @@ -3482,7 +3493,7 @@ static bool ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, if (unlikely(ieee80211_skb_resize(sdata, skb, max_t(int, extra_head + hw_headroom - skb_headroom(skb), 0), - false))) { + ENCRYPT_NO))) { kfree_skb(skb); return true; } From a3f0db0d2320ecec68eedd9c540c694a468c6229 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 9 Oct 2020 13:58:22 +0200 Subject: [PATCH 063/151] cfg80211: initialize wdev data earlier [ Upstream commit 9bdaf3b91efd229dd272b228e13df10310c80d19 ] There's a race condition in the netdev registration in that NETDEV_REGISTER actually happens after the netdev is available, and so if we initialize things only there, we might get called with an uninitialized wdev through 
nl80211 - not using a wdev but using a netdev interface index. I found this while looking into a syzbot report, but it doesn't really seem to be related, and unfortunately there's no repro for it (yet). I can't (yet) explain how it managed to get into cfg80211_release_pmsr() from nl80211_netlink_notify() without the wdev having been initialized, as the latter only iterates the wdevs that are linked into the rdev, which even without the change here happened after init. However, looking at this, it seems fairly clear that the init needs to be done earlier, otherwise we might even re-init on a netns move, when data might still be pending. Signed-off-by: Johannes Berg Link: https://lore.kernel.org/r/20201009135821.fdcbba3aad65.Ie9201d91dbcb7da32318812effdc1561aeaf4cdc@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/wireless/core.c | 57 +++++++++++++++++++++++------------------- net/wireless/core.h | 5 ++-- net/wireless/nl80211.c | 3 ++- 3 files changed, 36 insertions(+), 29 deletions(-) diff --git a/net/wireless/core.c b/net/wireless/core.c index ee5bb8d8af04..5d151e8f8932 100644 --- a/net/wireless/core.c +++ b/net/wireless/core.c @@ -1224,8 +1224,7 @@ void cfg80211_stop_iface(struct wiphy *wiphy, struct wireless_dev *wdev, } EXPORT_SYMBOL(cfg80211_stop_iface); -void cfg80211_init_wdev(struct cfg80211_registered_device *rdev, - struct wireless_dev *wdev) +void cfg80211_init_wdev(struct wireless_dev *wdev) { mutex_init(&wdev->mtx); INIT_LIST_HEAD(&wdev->event_list); @@ -1236,6 +1235,30 @@ void cfg80211_init_wdev(struct cfg80211_registered_device *rdev, spin_lock_init(&wdev->pmsr_lock); INIT_WORK(&wdev->pmsr_free_wk, cfg80211_pmsr_free_wk); +#ifdef CONFIG_CFG80211_WEXT + wdev->wext.default_key = -1; + wdev->wext.default_mgmt_key = -1; + wdev->wext.connect.auth_type = NL80211_AUTHTYPE_AUTOMATIC; +#endif + + if (wdev->wiphy->flags & WIPHY_FLAG_PS_ON_BY_DEFAULT) + wdev->ps = true; + else + wdev->ps = false; + /* allow mac80211 to determine the timeout */ + wdev->ps_timeout = -1; + + if ((wdev->iftype == NL80211_IFTYPE_STATION || + wdev->iftype == NL80211_IFTYPE_P2P_CLIENT || + wdev->iftype == NL80211_IFTYPE_ADHOC) && !wdev->use_4addr) + wdev->netdev->priv_flags |= IFF_DONT_BRIDGE; + + INIT_WORK(&wdev->disconnect_wk, cfg80211_autodisconnect_wk); +} + +void cfg80211_register_wdev(struct cfg80211_registered_device *rdev, + struct wireless_dev *wdev) +{ /* * We get here also when the interface changes network namespaces, * as it's registered into the new one, but we don't want it to @@ -1269,6 +1292,11 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb, switch (state) { case NETDEV_POST_INIT: SET_NETDEV_DEVTYPE(dev, &wiphy_type); + wdev->netdev = dev; + /* can only change netns with wiphy */ + dev->features |= NETIF_F_NETNS_LOCAL; + + cfg80211_init_wdev(wdev); break; case NETDEV_REGISTER: /* @@ -1276,35 +1304,12 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb, * called within code protected by it when interfaces * are added with nl80211. 
*/ - /* can only change netns with wiphy */ - dev->features |= NETIF_F_NETNS_LOCAL; - if (sysfs_create_link(&dev->dev.kobj, &rdev->wiphy.dev.kobj, "phy80211")) { pr_err("failed to add phy80211 symlink to netdev!\n"); } - wdev->netdev = dev; -#ifdef CONFIG_CFG80211_WEXT - wdev->wext.default_key = -1; - wdev->wext.default_mgmt_key = -1; - wdev->wext.connect.auth_type = NL80211_AUTHTYPE_AUTOMATIC; -#endif - if (wdev->wiphy->flags & WIPHY_FLAG_PS_ON_BY_DEFAULT) - wdev->ps = true; - else - wdev->ps = false; - /* allow mac80211 to determine the timeout */ - wdev->ps_timeout = -1; - - if ((wdev->iftype == NL80211_IFTYPE_STATION || - wdev->iftype == NL80211_IFTYPE_P2P_CLIENT || - wdev->iftype == NL80211_IFTYPE_ADHOC) && !wdev->use_4addr) - dev->priv_flags |= IFF_DONT_BRIDGE; - - INIT_WORK(&wdev->disconnect_wk, cfg80211_autodisconnect_wk); - - cfg80211_init_wdev(rdev, wdev); + cfg80211_register_wdev(rdev, wdev); break; case NETDEV_GOING_DOWN: cfg80211_leave(rdev, wdev); diff --git a/net/wireless/core.h b/net/wireless/core.h index ed487e324571..d83c8e009448 100644 --- a/net/wireless/core.h +++ b/net/wireless/core.h @@ -210,8 +210,9 @@ struct wiphy *wiphy_idx_to_wiphy(int wiphy_idx); int cfg80211_switch_netns(struct cfg80211_registered_device *rdev, struct net *net); -void cfg80211_init_wdev(struct cfg80211_registered_device *rdev, - struct wireless_dev *wdev); +void cfg80211_init_wdev(struct wireless_dev *wdev); +void cfg80211_register_wdev(struct cfg80211_registered_device *rdev, + struct wireless_dev *wdev); static inline void wdev_lock(struct wireless_dev *wdev) __acquires(wdev) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 672b70730e89..dbac5c0995a0 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -3654,7 +3654,8 @@ static int nl80211_new_interface(struct sk_buff *skb, struct genl_info *info) * P2P Device and NAN do not have a netdev, so don't go * through the netdev notifier and must be added here */ - cfg80211_init_wdev(rdev, wdev); + cfg80211_init_wdev(wdev); + cfg80211_register_wdev(rdev, wdev); break; default: break; From e57c04697030192ab68277b07db97f4a7b530161 Mon Sep 17 00:00:00 2001 From: Ye Bin Date: Fri, 9 Oct 2020 15:02:15 +0800 Subject: [PATCH 064/151] cfg80211: regulatory: Fix inconsistent format argument [ Upstream commit db18d20d1cb0fde16d518fb5ccd38679f174bc04 ] Fix follow warning: [net/wireless/reg.c:3619]: (warning) %d in format string (no. 2) requires 'int' but the argument type is 'unsigned int'. 
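The underlying rule, shown with a generic, hypothetical snippet rather than the reg.c code: the printf-family conversion must match the signedness of the argument, so an unsigned int wants %u while %d is reserved for signed int.

    #include <stdio.h>

    int main(void)
    {
            int signed_khz = 20000;
            unsigned int unsigned_khz = 40000;

            /* %d for the signed value, %u for the unsigned one */
            printf("%d KHz, %u KHz AUTO\n", signed_khz, unsigned_khz);
            return 0;
    }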
Reported-by: Hulk Robot Signed-off-by: Ye Bin Link: https://lore.kernel.org/r/20201009070215.63695-1-yebin10@huawei.com Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/wireless/reg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/wireless/reg.c b/net/wireless/reg.c index 20a8e6af88c4..0f3b57a73670 100644 --- a/net/wireless/reg.c +++ b/net/wireless/reg.c @@ -3405,7 +3405,7 @@ static void print_rd_rules(const struct ieee80211_regdomain *rd) power_rule = ®_rule->power_rule; if (reg_rule->flags & NL80211_RRF_AUTO_BW) - snprintf(bw, sizeof(bw), "%d KHz, %d KHz AUTO", + snprintf(bw, sizeof(bw), "%d KHz, %u KHz AUTO", freq_range->max_bandwidth_khz, reg_get_max_bandwidth(rd, reg_rule)); else From bf1cedc12f580bfb0b3b99a67482e0e4abb97675 Mon Sep 17 00:00:00 2001 From: Qiujun Huang Date: Sat, 31 Oct 2020 16:57:14 +0800 Subject: [PATCH 065/151] tracing: Fix the checking of stackidx in __ftrace_trace_stack [ Upstream commit 906695e59324635c62b5ae59df111151a546ca66 ] The array size is FTRACE_KSTACK_NESTING, so the index FTRACE_KSTACK_NESTING is illegal too. And fix two typos by the way. Link: https://lkml.kernel.org/r/20201031085714.2147-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Sasha Levin --- kernel/trace/trace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 2a357bda45cf..f7cac11a9005 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2510,7 +2510,7 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_rb, /* * If tracing is off, but we have triggers enabled * we still need to look at the event data. Use the temp_buffer - * to store the trace event for the tigger to use. It's recusive + * to store the trace event for the trigger to use. It's recursive * safe and will not be recorded anywhere. */ if (!entry && trace_file->flags & EVENT_FILE_FL_TRIGGER_COND) { @@ -2832,7 +2832,7 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer, stackidx = __this_cpu_inc_return(ftrace_stack_reserve) - 1; /* This should never happen. If it does, yell once and skip */ - if (WARN_ON_ONCE(stackidx > FTRACE_KSTACK_NESTING)) + if (WARN_ON_ONCE(stackidx >= FTRACE_KSTACK_NESTING)) goto out; /* From b61e157d9f6491652a799f73451a9104e1bee23b Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Thu, 24 Sep 2020 12:45:59 +0200 Subject: [PATCH 066/151] scsi: scsi_dh_alua: Avoid crash during alua_bus_detach() [ Upstream commit 5faf50e9e9fdc2117c61ff7e20da49cd6a29e0ca ] alua_bus_detach() might be running concurrently with alua_rtpg_work(), so we might trip over h->sdev == NULL and call BUG_ON(). The correct way of handling it is to not set h->sdev to NULL in alua_bus_detach(), and call rcu_synchronize() before the final delete to ensure that all concurrent threads have left the critical section. Then we can get rid of the BUG_ON() and replace it with a simple if condition. Link: https://lore.kernel.org/r/1600167537-12509-1-git-send-email-jitendra.khasdev@oracle.com Link: https://lore.kernel.org/r/20200924104559.26753-1-hare@suse.de Cc: Brian Bunker Acked-by: Brian Bunker Tested-by: Jitendra Khasdev Reviewed-by: Jitendra Khasdev Signed-off-by: Hannes Reinecke Signed-off-by: Martin K. 
Petersen Signed-off-by: Sasha Levin --- drivers/scsi/device_handler/scsi_dh_alua.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c index f32da0ca529e..308bda2e9c00 100644 --- a/drivers/scsi/device_handler/scsi_dh_alua.c +++ b/drivers/scsi/device_handler/scsi_dh_alua.c @@ -658,8 +658,8 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg) rcu_read_lock(); list_for_each_entry_rcu(h, &tmp_pg->dh_list, node) { - /* h->sdev should always be valid */ - BUG_ON(!h->sdev); + if (!h->sdev) + continue; h->sdev->access_state = desc[0]; } rcu_read_unlock(); @@ -705,7 +705,8 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg) pg->expiry = 0; rcu_read_lock(); list_for_each_entry_rcu(h, &pg->dh_list, node) { - BUG_ON(!h->sdev); + if (!h->sdev) + continue; h->sdev->access_state = (pg->state & SCSI_ACCESS_STATE_MASK); if (pg->pref) @@ -1147,7 +1148,6 @@ static void alua_bus_detach(struct scsi_device *sdev) spin_lock(&h->pg_lock); pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock)); rcu_assign_pointer(h->pg, NULL); - h->sdev = NULL; spin_unlock(&h->pg_lock); if (pg) { spin_lock_irq(&pg->lock); @@ -1156,6 +1156,7 @@ static void alua_bus_detach(struct scsi_device *sdev) kref_put(&pg->kref, release_port_group); } sdev->handler_data = NULL; + synchronize_rcu(); kfree(h); } From c473b3e56c1d8f6acb98e42fbd917bab6e8d67b8 Mon Sep 17 00:00:00 2001 From: Sreekanth Reddy Date: Mon, 2 Nov 2020 12:57:46 +0530 Subject: [PATCH 067/151] scsi: mpt3sas: Fix timeouts observed while reenabling IRQ [ Upstream commit 5feed64f9199ff90c4239971733f23f30aeb2484 ] While reenabling the IRQ after irq poll there may be small time window where HBA firmware has posted some replies and raise the interrupts but driver has not received the interrupts. So we may observe I/O timeouts as the driver has not processed the replies as interrupts got missed while reenabling the IRQ. To fix this issue the driver has to go for one more round of processing the reply descriptors from reply descriptor post queue after enabling the IRQ. Link: https://lore.kernel.org/r/20201102072746.27410-1-sreekanth.reddy@broadcom.com Reported-by: Tomas Henzl Reviewed-by: Tomas Henzl Signed-off-by: Sreekanth Reddy Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/mpt3sas/mpt3sas_base.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 3d58d24de6b6..8be8c510fdf7 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -1641,6 +1641,13 @@ _base_irqpoll(struct irq_poll *irqpoll, int budget) reply_q->irq_poll_scheduled = false; reply_q->irq_line_enable = true; enable_irq(reply_q->os_irq); + /* + * Go for one more round of processing the + * reply descriptor post queue incase if HBA + * Firmware has posted some reply descriptors + * while reenabling the IRQ. + */ + _base_process_reply_queue(reply_q); } return num_entries; From 0ca279c859d7478143803f660c667b191c30c09e Mon Sep 17 00:00:00 2001 From: Chao Leng Date: Thu, 22 Oct 2020 10:15:00 +0800 Subject: [PATCH 068/151] nvme: introduce nvme_sync_io_queues [ Upstream commit 04800fbff4764ab7b32c49d19628605a5d4cb85c ] Introduce sync io queues for some scenarios which just only need sync io queues not sync all queues. 
Signed-off-by: Chao Leng Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- drivers/nvme/host/core.c | 8 ++++++-- drivers/nvme/host/nvme.h | 1 + 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index ce69aaea581a..7a964271959d 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4226,8 +4226,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl) } EXPORT_SYMBOL_GPL(nvme_start_queues); - -void nvme_sync_queues(struct nvme_ctrl *ctrl) +void nvme_sync_io_queues(struct nvme_ctrl *ctrl) { struct nvme_ns *ns; @@ -4235,7 +4234,12 @@ void nvme_sync_queues(struct nvme_ctrl *ctrl) list_for_each_entry(ns, &ctrl->namespaces, list) blk_sync_queue(ns->queue); up_read(&ctrl->namespaces_rwsem); +} +EXPORT_SYMBOL_GPL(nvme_sync_io_queues); +void nvme_sync_queues(struct nvme_ctrl *ctrl) +{ + nvme_sync_io_queues(ctrl); if (ctrl->admin_q) blk_sync_queue(ctrl->admin_q); } diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index d7132d8cb7c5..e392d6cd92ce 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -494,6 +494,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl); void nvme_start_queues(struct nvme_ctrl *ctrl); void nvme_kill_queues(struct nvme_ctrl *ctrl); void nvme_sync_queues(struct nvme_ctrl *ctrl); +void nvme_sync_io_queues(struct nvme_ctrl *ctrl); void nvme_unfreeze(struct nvme_ctrl *ctrl); void nvme_wait_freeze(struct nvme_ctrl *ctrl); int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout); From d0e888a20dfdcfcf1313e0cc834ac553a3c5508c Mon Sep 17 00:00:00 2001 From: Chao Leng Date: Thu, 22 Oct 2020 10:15:08 +0800 Subject: [PATCH 069/151] nvme-rdma: avoid race between time out and tear down [ Upstream commit 3017013dcc82a4862bd1e140f8b762cfc594008d ] Now use teardown_lock to serialize for time out and tear down. This may cause abnormal: first cancel all request in tear down, then time out may complete the request again, but the request may already be freed or restarted. To avoid race between time out and tear down, in tear down process, first we quiesce the queue, and then delete the timer and cancel the time out work for the queue. At the same time we need to delete teardown_lock. 
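Sketched in order (condensed from the patch below, not literal driver code), the teardown path now makes the timeout handler irrelevant before requests are cancelled, instead of excluding it with a mutex:

    static void teardown_io_queues_sketch(struct nvme_ctrl *ctrl)
    {
            nvme_stop_queues(ctrl);      /* 1. quiesce: no new dispatches */
            nvme_sync_io_queues(ctrl);   /* 2. wait out / cancel pending timers
                                          *    and timeout work per queue   */
            /*
             * 3. Only now stop the transport queues and cancel outstanding
             *    requests; a timeout can no longer fire and complete a
             *    request that teardown has already freed or restarted.
             */
    }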
Signed-off-by: Chao Leng Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- drivers/nvme/host/rdma.c | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index e957ad0a07f5..cfd437f7750e 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -110,7 +110,6 @@ struct nvme_rdma_ctrl { struct sockaddr_storage src_addr; struct nvme_ctrl ctrl; - struct mutex teardown_lock; bool use_inline_data; u32 io_queues[HCTX_MAX_TYPES]; }; @@ -933,8 +932,8 @@ out_free_io_queues: static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remove) { - mutex_lock(&ctrl->teardown_lock); blk_mq_quiesce_queue(ctrl->ctrl.admin_q); + blk_sync_queue(ctrl->ctrl.admin_q); nvme_rdma_stop_queue(&ctrl->queues[0]); if (ctrl->ctrl.admin_tagset) { blk_mq_tagset_busy_iter(ctrl->ctrl.admin_tagset, @@ -944,16 +943,15 @@ static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl, if (remove) blk_mq_unquiesce_queue(ctrl->ctrl.admin_q); nvme_rdma_destroy_admin_queue(ctrl, remove); - mutex_unlock(&ctrl->teardown_lock); } static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl, bool remove) { - mutex_lock(&ctrl->teardown_lock); if (ctrl->ctrl.queue_count > 1) { nvme_start_freeze(&ctrl->ctrl); nvme_stop_queues(&ctrl->ctrl); + nvme_sync_io_queues(&ctrl->ctrl); nvme_rdma_stop_io_queues(ctrl); if (ctrl->ctrl.tagset) { blk_mq_tagset_busy_iter(ctrl->ctrl.tagset, @@ -964,7 +962,6 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl, nvme_start_queues(&ctrl->ctrl); nvme_rdma_destroy_io_queues(ctrl, remove); } - mutex_unlock(&ctrl->teardown_lock); } static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl) @@ -1728,16 +1725,12 @@ static void nvme_rdma_complete_timed_out(struct request *rq) { struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq); struct nvme_rdma_queue *queue = req->queue; - struct nvme_rdma_ctrl *ctrl = queue->ctrl; - /* fence other contexts that may complete the command */ - mutex_lock(&ctrl->teardown_lock); nvme_rdma_stop_queue(queue); if (!blk_mq_request_completed(rq)) { nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD; blk_mq_complete_request(rq); } - mutex_unlock(&ctrl->teardown_lock); } static enum blk_eh_timer_return @@ -2029,7 +2022,6 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev, return ERR_PTR(-ENOMEM); ctrl->ctrl.opts = opts; INIT_LIST_HEAD(&ctrl->list); - mutex_init(&ctrl->teardown_lock); if (!(opts->mask & NVMF_OPT_TRSVCID)) { opts->trsvcid = From 531b55cce9cd72b7e1c085c4c6d5b3b925d10f16 Mon Sep 17 00:00:00 2001 From: Chao Leng Date: Thu, 22 Oct 2020 10:15:15 +0800 Subject: [PATCH 070/151] nvme-tcp: avoid race between time out and tear down [ Upstream commit d6f66210f4b1aa2f5944f0e34e0f8db44f499f92 ] Now use teardown_lock to serialize for time out and tear down. This may cause abnormal: first cancel all request in tear down, then time out may complete the request again, but the request may already be freed or restarted. To avoid race between time out and tear down, in tear down process, first we quiesce the queue, and then delete the timer and cancel the time out work for the queue. At the same time we need to delete teardown_lock. 
Signed-off-by: Chao Leng Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- drivers/nvme/host/tcp.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index e159b78b5f3b..76440f26c145 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -110,7 +110,6 @@ struct nvme_tcp_ctrl { struct sockaddr_storage src_addr; struct nvme_ctrl ctrl; - struct mutex teardown_lock; struct work_struct err_work; struct delayed_work connect_work; struct nvme_tcp_request async_req; @@ -1797,8 +1796,8 @@ out_free_queue: static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl, bool remove) { - mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock); blk_mq_quiesce_queue(ctrl->admin_q); + blk_sync_queue(ctrl->admin_q); nvme_tcp_stop_queue(ctrl, 0); if (ctrl->admin_tagset) { blk_mq_tagset_busy_iter(ctrl->admin_tagset, @@ -1808,18 +1807,17 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl, if (remove) blk_mq_unquiesce_queue(ctrl->admin_q); nvme_tcp_destroy_admin_queue(ctrl, remove); - mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock); } static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl, bool remove) { - mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock); if (ctrl->queue_count <= 1) - goto out; + return; blk_mq_quiesce_queue(ctrl->admin_q); nvme_start_freeze(ctrl); nvme_stop_queues(ctrl); + nvme_sync_io_queues(ctrl); nvme_tcp_stop_io_queues(ctrl); if (ctrl->tagset) { blk_mq_tagset_busy_iter(ctrl->tagset, @@ -1829,8 +1827,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl, if (remove) nvme_start_queues(ctrl); nvme_tcp_destroy_io_queues(ctrl, remove); -out: - mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock); } static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl) @@ -2074,14 +2070,11 @@ static void nvme_tcp_complete_timed_out(struct request *rq) struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq); struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl; - /* fence other contexts that may complete the command */ - mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock); nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue)); if (!blk_mq_request_completed(rq)) { nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD; blk_mq_complete_request(rq); } - mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock); } static enum blk_eh_timer_return @@ -2344,7 +2337,6 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev, nvme_tcp_reconnect_ctrl_work); INIT_WORK(&ctrl->err_work, nvme_tcp_error_recovery_work); INIT_WORK(&ctrl->ctrl.reset_work, nvme_reset_ctrl_work); - mutex_init(&ctrl->teardown_lock); if (!(opts->mask & NVMF_OPT_TRSVCID)) { opts->trsvcid = From 9d14f5225dbbf4cb5cd915898f41ba4a42ba1489 Mon Sep 17 00:00:00 2001 From: Sagi Grimberg Date: Thu, 22 Oct 2020 10:15:23 +0800 Subject: [PATCH 071/151] nvme-rdma: avoid repeated request completion [ Upstream commit fdf58e02adecbef4c7cbb2073d8ea225e6fd5f26 ] The request may be executed asynchronously, and rq->state may be changed to IDLE. To avoid repeated request completion, only MQ_RQ_COMPLETE of rq->state is checked in nvme_rdma_complete_timed_out. It is not safe, so need adding check IDLE for rq->state. 
Signed-off-by: Sagi Grimberg Signed-off-by: Chao Leng Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- drivers/nvme/host/rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index cfd437f7750e..8a62c2fe5a5e 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -1727,7 +1727,7 @@ static void nvme_rdma_complete_timed_out(struct request *rq) struct nvme_rdma_queue *queue = req->queue; nvme_rdma_stop_queue(queue); - if (!blk_mq_request_completed(rq)) { + if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) { nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD; blk_mq_complete_request(rq); } From a889cd3d350d8acc3c0d744f9f7af76ea4f994ab Mon Sep 17 00:00:00 2001 From: Sagi Grimberg Date: Thu, 22 Oct 2020 10:15:31 +0800 Subject: [PATCH 072/151] nvme-tcp: avoid repeated request completion [ Upstream commit 0a8a2c85b83589a5c10bc5564b796836bf4b4984 ] The request may be executed asynchronously, and rq->state may be changed to IDLE. To avoid repeated request completion, only MQ_RQ_COMPLETE of rq->state is checked in nvme_tcp_complete_timed_out. It is not safe, so need adding check IDLE for rq->state. Signed-off-by: Sagi Grimberg Signed-off-by: Chao Leng Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- drivers/nvme/host/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 76440f26c145..a31c6e1f6063 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -2071,7 +2071,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq) struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl; nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue)); - if (!blk_mq_request_completed(rq)) { + if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) { nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD; blk_mq_complete_request(rq); } From 984d77507439844d08062b27b46861790a40ed40 Mon Sep 17 00:00:00 2001 From: Suravee Suthikulpanit Date: Thu, 15 Oct 2020 02:50:02 +0000 Subject: [PATCH 073/151] iommu/amd: Increase interrupt remapping table limit to 512 entries [ Upstream commit 73db2fc595f358460ce32bcaa3be1f0cce4a2db1 ] Certain device drivers allocate IO queues on a per-cpu basis. On the AMD EPYC platform, which can support up to 256 CPU threads, this can exceed the current MAX_IRQS_PER_TABLE limit of 256, and result in the error message: AMD-Vi: Failed to allocate IRTE This has been observed with certain NVME devices. AMD IOMMU hardware can actually support up to 512 interrupt remapping table entries. Therefore, update the driver to match the hardware limit. Please note that this also increases the size of the interrupt remapping table to 8KB per device when using the 128-bit IRTE format.
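The quoted 8KB follows directly from the table geometry; a quick check (hypothetical macros, assuming the 128-bit, i.e. 16-byte, IRTE format named above):

    #define IRTE_BYTES    16                        /* 128-bit IRTE format      */
    #define MAX_IRQS      512                       /* new per-device limit     */
    #define TABLE_BYTES   (MAX_IRQS * IRTE_BYTES)   /* 512 * 16 = 8192 = 8 KB   */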
Signed-off-by: Suravee Suthikulpanit Link: https://lore.kernel.org/r/20201015025002.87997-1-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd_iommu_types.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index 0679896b9e2e..3ec090adcdae 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -406,7 +406,11 @@ extern bool amd_iommu_np_cache; /* Only true if all IOMMUs support device IOTLBs */ extern bool amd_iommu_iotlb_sup; -#define MAX_IRQS_PER_TABLE 256 +/* + * AMD IOMMU hardware only support 512 IRTEs despite + * the architectural limitation of 2048 entries. + */ +#define MAX_IRQS_PER_TABLE 512 #define IRQ_TABLE_ALIGNMENT 128 struct irq_remap_table { From 4d6f536e34d669bd150f7a1883eb203dd4ae2679 Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Wed, 28 Oct 2020 14:27:42 -0400 Subject: [PATCH 074/151] s390/smp: move rcu_cpu_starting() earlier [ Upstream commit de5d9dae150ca1c1b5c7676711a9ca139d1a8dec ] The call to rcu_cpu_starting() in smp_init_secondary() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows: WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call Trace: show_stack+0x158/0x1f0 dump_stack+0x1f2/0x238 __lock_acquire+0x2640/0x4dd0 lock_acquire+0x3a8/0xd08 _raw_spin_lock_irqsave+0xc0/0xf0 clockevents_register_device+0xa8/0x528 init_cpu_timer+0x33e/0x468 smp_init_secondary+0x11a/0x328 smp_start_secondary+0x82/0x88 This is avoided by moving the call to rcu_cpu_starting up near the beginning of the smp_init_secondary() function. Note that the raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers. Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ Signed-off-by: Qian Cai Acked-by: Paul E. McKenney Signed-off-by: Heiko Carstens Signed-off-by: Sasha Levin --- arch/s390/kernel/smp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index ad426cc656e5..66d7ba61803c 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -845,13 +845,14 @@ void __init smp_detect_cpus(void) static void smp_init_secondary(void) { - int cpu = smp_processor_id(); + int cpu = raw_smp_processor_id(); S390_lowcore.last_update_clock = get_tod_clock(); restore_access_regs(S390_lowcore.access_regs_save_area); set_cpu_flag(CIF_ASCE_PRIMARY); set_cpu_flag(CIF_ASCE_SECONDARY); cpu_init(); + rcu_cpu_starting(cpu); preempt_disable(); init_cpu_timer(); vtime_init(); From c6be53caf1c89ed3346e1d24627fd64093c5bb65 Mon Sep 17 00:00:00 2001 From: Zhang Qilong Date: Sat, 31 Oct 2020 11:03:53 +0800 Subject: [PATCH 075/151] vfio: platform: fix reference leak in vfio_platform_open [ Upstream commit bb742ad01961a3b9d1f9d19375487b879668b6b2 ] pm_runtime_get_sync() will increment pm usage counter even it failed. Forgetting to call pm_runtime_put will result in reference leak in vfio_platform_open, so we should fix it. 
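The pattern being fixed is generic (sketch below, not the vfio code itself): pm_runtime_get_sync() increments the usage counter before attempting the resume, so even the failure path owns a reference and must drop it.

    #include <linux/pm_runtime.h>

    static int open_sketch(struct device *dev)
    {
            int ret = pm_runtime_get_sync(dev);

            if (ret < 0) {
                    pm_runtime_put(dev);   /* count was bumped despite the error */
                    return ret;
            }

            /* ... use the device ... */

            pm_runtime_put(dev);
            return 0;
    }

Newer kernels also provide pm_runtime_resume_and_get(), which drops the reference on failure internally; the fix here instead routes the error to the existing label that already performs the put.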
Signed-off-by: Zhang Qilong Acked-by: Eric Auger Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin --- drivers/vfio/platform/vfio_platform_common.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index e8f2bdbe0542..152e5188183c 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -267,7 +267,7 @@ static int vfio_platform_open(void *device_data) ret = pm_runtime_get_sync(vdev->device); if (ret < 0) - goto err_pm; + goto err_rst; ret = vfio_platform_call_reset(vdev, &extra_dbg); if (ret && vdev->reset_required) { @@ -284,7 +284,6 @@ static int vfio_platform_open(void *device_data) err_rst: pm_runtime_put(vdev->device); -err_pm: vfio_platform_irq_cleanup(vdev); err_irq: vfio_platform_regions_cleanup(vdev); From b66c7cdedd1e9b2bcda3f0cf39a05aa57d3a5948 Mon Sep 17 00:00:00 2001 From: Fred Gao Date: Tue, 3 Nov 2020 02:01:20 +0800 Subject: [PATCH 076/151] vfio/pci: Bypass IGD init in case of -ENODEV [ Upstream commit e4eccb853664de7bcf9518fb658f35e748bf1f68 ] Bypass the IGD initialization when -ENODEV returns, that should be the case if opregion is not available for IGD or within discrete graphics device's option ROM, or host/lpc bridge is not found. Then use of -ENODEV here means no special device resources found which needs special care for VFIO, but we still allow other normal device resource access. Cc: Zhenyu Wang Cc: Xiong Zhang Cc: Hang Yuan Cc: Stuart Summers Signed-off-by: Fred Gao Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin --- drivers/vfio/pci/vfio_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index a72fd5309b09..443a35dde7f5 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -334,7 +334,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev) pdev->vendor == PCI_VENDOR_ID_INTEL && IS_ENABLED(CONFIG_VFIO_PCI_IGD)) { ret = vfio_pci_igd_init(vdev); - if (ret) { + if (ret && ret != -ENODEV) { pci_warn(pdev, "Failed to setup Intel IGD regions\n"); goto disable_exit; } From 4faa1fabc64500d03688b5df8c21858fd1f6f060 Mon Sep 17 00:00:00 2001 From: Qii Wang Date: Fri, 30 Oct 2020 19:58:01 +0800 Subject: [PATCH 077/151] i2c: mediatek: move dma reset before i2c reset [ Upstream commit aafced673c06b7c77040c1df42e2e965be5d0376 ] The i2c driver default do dma reset after i2c reset, but sometimes i2c reset will trigger dma tx2rx, then apdma write data to dram which has been i2c_put_dma_safe_msg_buf(kfree). Move dma reset before i2c reset in mtk_i2c_init_hw to fix it. 
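The ordering rationale, sketched with the register writes involved (condensed from the patch below): quiesce the APDMA engine first, so that a spurious tx2rx trigger raised by the I2C soft reset can no longer land in a buffer the core has already freed via i2c_put_dma_safe_msg_buf().

    /* mtk_i2c_init_hw(), new ordering in outline */
    writel(I2C_DMA_HARD_RST, i2c->pdmabase + OFFSET_RST);  /* 1. hard-reset the DMA block */
    udelay(50);
    writel(I2C_DMA_CLR_FLAG, i2c->pdmabase + OFFSET_RST);  /*    and clear its flags      */
    mtk_i2c_writew(i2c, I2C_SOFT_RST, OFFSET_SOFTRESET);   /* 2. only then soft-reset I2C */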
Signed-off-by: Qii Wang Signed-off-by: Wolfram Sang Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-mt65xx.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c index 2152ec5f535c..5a9f0d17f52c 100644 --- a/drivers/i2c/busses/i2c-mt65xx.c +++ b/drivers/i2c/busses/i2c-mt65xx.c @@ -389,6 +389,10 @@ static void mtk_i2c_init_hw(struct mtk_i2c *i2c) { u16 control_reg; + writel(I2C_DMA_HARD_RST, i2c->pdmabase + OFFSET_RST); + udelay(50); + writel(I2C_DMA_CLR_FLAG, i2c->pdmabase + OFFSET_RST); + mtk_i2c_writew(i2c, I2C_SOFT_RST, OFFSET_SOFTRESET); /* Set ioconfig */ @@ -419,10 +423,6 @@ static void mtk_i2c_init_hw(struct mtk_i2c *i2c) mtk_i2c_writew(i2c, control_reg, OFFSET_CONTROL); mtk_i2c_writew(i2c, I2C_DELAY_LEN, OFFSET_DELAY_LEN); - - writel(I2C_DMA_HARD_RST, i2c->pdmabase + OFFSET_RST); - udelay(50); - writel(I2C_DMA_CLR_FLAG, i2c->pdmabase + OFFSET_RST); } /* From ab10b7def4211e7b4b0ea0c18b8302119902d70e Mon Sep 17 00:00:00 2001 From: Veerabadhran Gopalakrishnan Date: Thu, 29 Oct 2020 19:59:46 +0530 Subject: [PATCH 078/151] amd/amdgpu: Disable VCN DPG mode for Picasso [ Upstream commit c6d2b0fbb893d5c7dda405aa0e7bcbecf1c75f98 ] Concurrent operation of VCN and JPEG decoder in DPG mode is causing ring timeout due to power state. Signed-off-by: Veerabadhran Gopalakrishnan Reviewed-by: Leo Liu Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/soc15.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c index c086262cc181..317aa257c06b 100644 --- a/drivers/gpu/drm/amd/amdgpu/soc15.c +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c @@ -1144,8 +1144,7 @@ static int soc15_common_early_init(void *handle) adev->pg_flags = AMD_PG_SUPPORT_SDMA | AMD_PG_SUPPORT_MMHUB | - AMD_PG_SUPPORT_VCN | - AMD_PG_SUPPORT_VCN_DPG; + AMD_PG_SUPPORT_VCN; } else { adev->cg_flags = AMD_CG_SUPPORT_GFX_MGCG | AMD_CG_SUPPORT_GFX_MGLS | From 6d8b43376990ae9f24e08b5e76c14dcd0cf2cef3 Mon Sep 17 00:00:00 2001 From: Tommi Rantala Date: Thu, 8 Oct 2020 15:26:30 +0300 Subject: [PATCH 079/151] selftests: proc: fix warning: _GNU_SOURCE redefined [ Upstream commit f3ae6c6e8a3ea49076d826c64e63ea78fbf9db43 ] Makefile already contains -D_GNU_SOURCE, so we can remove it from the *.c files. Signed-off-by: Tommi Rantala Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin --- tools/testing/selftests/proc/proc-loadavg-001.c | 1 - tools/testing/selftests/proc/proc-self-syscall.c | 1 - tools/testing/selftests/proc/proc-uptime-002.c | 1 - 3 files changed, 3 deletions(-) diff --git a/tools/testing/selftests/proc/proc-loadavg-001.c b/tools/testing/selftests/proc/proc-loadavg-001.c index 471e2aa28077..fb4fe9188806 100644 --- a/tools/testing/selftests/proc/proc-loadavg-001.c +++ b/tools/testing/selftests/proc/proc-loadavg-001.c @@ -14,7 +14,6 @@ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ /* Test that /proc/loadavg correctly reports last pid in pid namespace. 
*/ -#define _GNU_SOURCE #include #include #include diff --git a/tools/testing/selftests/proc/proc-self-syscall.c b/tools/testing/selftests/proc/proc-self-syscall.c index 9f6d000c0245..8511dcfe67c7 100644 --- a/tools/testing/selftests/proc/proc-self-syscall.c +++ b/tools/testing/selftests/proc/proc-self-syscall.c @@ -13,7 +13,6 @@ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ -#define _GNU_SOURCE #include #include #include diff --git a/tools/testing/selftests/proc/proc-uptime-002.c b/tools/testing/selftests/proc/proc-uptime-002.c index 30e2b7849089..e7ceabed7f51 100644 --- a/tools/testing/selftests/proc/proc-uptime-002.c +++ b/tools/testing/selftests/proc/proc-uptime-002.c @@ -15,7 +15,6 @@ */ // Test that values in /proc/uptime increment monotonically // while shifting across CPUs. -#define _GNU_SOURCE #undef NDEBUG #include #include From 37a048d790c318c35a04c1980bdc4ac7d74c385a Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Thu, 22 Oct 2020 16:30:12 -0400 Subject: [PATCH 080/151] riscv: Set text_offset correctly for M-Mode [ Upstream commit 79605f1394261995c2b955c906a5a20fb27cdc84 ] M-Mode Linux is loaded at the start of RAM, not 2MB later. Perhaps this should be calculated based on PAGE_OFFSET somehow? Even better would be to deprecate text_offset and instead introduce something absolute. Signed-off-by: Sean Anderson Signed-off-by: Palmer Dabbelt Signed-off-by: Sasha Levin --- arch/riscv/kernel/head.S | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S index 72f89b7590dd..344793159b97 100644 --- a/arch/riscv/kernel/head.S +++ b/arch/riscv/kernel/head.S @@ -26,12 +26,17 @@ ENTRY(_start) /* reserved */ .word 0 .balign 8 +#ifdef CONFIG_RISCV_M_MODE + /* Image load offset (0MB) from start of RAM for M-mode */ + .dword 0 +#else #if __riscv_xlen == 64 /* Image load offset(2MB) from start of RAM */ .dword 0x200000 #else /* Image load offset(4MB) from start of RAM */ .dword 0x400000 +#endif #endif /* Effective size of kernel image */ .dword _end - _start From 713a3a94bee068a822b374dc935612f33e382acc Mon Sep 17 00:00:00 2001 From: Ulrich Hecht Date: Mon, 28 Sep 2020 17:59:50 +0200 Subject: [PATCH 081/151] i2c: sh_mobile: implement atomic transfers [ Upstream commit a49cc1fe9d64a2dc4e19b599204f403e5d25f44b ] Implements atomic transfers to fix reboot/shutdown on r8a7790 Lager and similar boards. 
Signed-off-by: Ulrich Hecht Tested-by: Wolfram Sang Tested-by: Geert Uytterhoeven [wsa: some whitespace fixing] Signed-off-by: Wolfram Sang Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-sh_mobile.c | 84 +++++++++++++++++++++++------- 1 file changed, 65 insertions(+), 19 deletions(-) diff --git a/drivers/i2c/busses/i2c-sh_mobile.c b/drivers/i2c/busses/i2c-sh_mobile.c index 8777af4c695e..d5dd58c27ce5 100644 --- a/drivers/i2c/busses/i2c-sh_mobile.c +++ b/drivers/i2c/busses/i2c-sh_mobile.c @@ -129,6 +129,7 @@ struct sh_mobile_i2c_data { int sr; bool send_stop; bool stop_after_dma; + bool atomic_xfer; struct resource *res; struct dma_chan *dma_tx; @@ -333,13 +334,15 @@ static unsigned char i2c_op(struct sh_mobile_i2c_data *pd, enum sh_mobile_i2c_op ret = iic_rd(pd, ICDR); break; case OP_RX_STOP: /* enable DTE interrupt, issue stop */ - iic_wr(pd, ICIC, - ICIC_DTEE | ICIC_WAITE | ICIC_ALE | ICIC_TACKE); + if (!pd->atomic_xfer) + iic_wr(pd, ICIC, + ICIC_DTEE | ICIC_WAITE | ICIC_ALE | ICIC_TACKE); iic_wr(pd, ICCR, ICCR_ICE | ICCR_RACK); break; case OP_RX_STOP_DATA: /* enable DTE interrupt, read data, issue stop */ - iic_wr(pd, ICIC, - ICIC_DTEE | ICIC_WAITE | ICIC_ALE | ICIC_TACKE); + if (!pd->atomic_xfer) + iic_wr(pd, ICIC, + ICIC_DTEE | ICIC_WAITE | ICIC_ALE | ICIC_TACKE); ret = iic_rd(pd, ICDR); iic_wr(pd, ICCR, ICCR_ICE | ICCR_RACK); break; @@ -435,7 +438,8 @@ static irqreturn_t sh_mobile_i2c_isr(int irq, void *dev_id) if (wakeup) { pd->sr |= SW_DONE; - wake_up(&pd->wait); + if (!pd->atomic_xfer) + wake_up(&pd->wait); } /* defeat write posting to avoid spurious WAIT interrupts */ @@ -587,6 +591,9 @@ static void start_ch(struct sh_mobile_i2c_data *pd, struct i2c_msg *usr_msg, pd->pos = -1; pd->sr = 0; + if (pd->atomic_xfer) + return; + pd->dma_buf = i2c_get_dma_safe_msg_buf(pd->msg, 8); if (pd->dma_buf) sh_mobile_i2c_xfer_dma(pd); @@ -643,15 +650,13 @@ static int poll_busy(struct sh_mobile_i2c_data *pd) return i ? 0 : -ETIMEDOUT; } -static int sh_mobile_i2c_xfer(struct i2c_adapter *adapter, - struct i2c_msg *msgs, - int num) +static int sh_mobile_xfer(struct sh_mobile_i2c_data *pd, + struct i2c_msg *msgs, int num) { - struct sh_mobile_i2c_data *pd = i2c_get_adapdata(adapter); struct i2c_msg *msg; int err = 0; int i; - long timeout; + long time_left; /* Wake up device and enable clock */ pm_runtime_get_sync(pd->dev); @@ -668,15 +673,35 @@ static int sh_mobile_i2c_xfer(struct i2c_adapter *adapter, if (do_start) i2c_op(pd, OP_START); - /* The interrupt handler takes care of the rest... */ - timeout = wait_event_timeout(pd->wait, - pd->sr & (ICSR_TACK | SW_DONE), - adapter->timeout); + if (pd->atomic_xfer) { + unsigned long j = jiffies + pd->adap.timeout; - /* 'stop_after_dma' tells if DMA transfer was complete */ - i2c_put_dma_safe_msg_buf(pd->dma_buf, pd->msg, pd->stop_after_dma); + time_left = time_before_eq(jiffies, j); + while (time_left && + !(pd->sr & (ICSR_TACK | SW_DONE))) { + unsigned char sr = iic_rd(pd, ICSR); - if (!timeout) { + if (sr & (ICSR_AL | ICSR_TACK | + ICSR_WAIT | ICSR_DTE)) { + sh_mobile_i2c_isr(0, pd); + udelay(150); + } else { + cpu_relax(); + } + time_left = time_before_eq(jiffies, j); + } + } else { + /* The interrupt handler takes care of the rest... 
*/ + time_left = wait_event_timeout(pd->wait, + pd->sr & (ICSR_TACK | SW_DONE), + pd->adap.timeout); + + /* 'stop_after_dma' tells if DMA xfer was complete */ + i2c_put_dma_safe_msg_buf(pd->dma_buf, pd->msg, + pd->stop_after_dma); + } + + if (!time_left) { dev_err(pd->dev, "Transfer request timed out\n"); if (pd->dma_direction != DMA_NONE) sh_mobile_i2c_cleanup_dma(pd); @@ -702,14 +727,35 @@ static int sh_mobile_i2c_xfer(struct i2c_adapter *adapter, return err ?: num; } +static int sh_mobile_i2c_xfer(struct i2c_adapter *adapter, + struct i2c_msg *msgs, + int num) +{ + struct sh_mobile_i2c_data *pd = i2c_get_adapdata(adapter); + + pd->atomic_xfer = false; + return sh_mobile_xfer(pd, msgs, num); +} + +static int sh_mobile_i2c_xfer_atomic(struct i2c_adapter *adapter, + struct i2c_msg *msgs, + int num) +{ + struct sh_mobile_i2c_data *pd = i2c_get_adapdata(adapter); + + pd->atomic_xfer = true; + return sh_mobile_xfer(pd, msgs, num); +} + static u32 sh_mobile_i2c_func(struct i2c_adapter *adapter) { return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL | I2C_FUNC_PROTOCOL_MANGLING; } static const struct i2c_algorithm sh_mobile_i2c_algorithm = { - .functionality = sh_mobile_i2c_func, - .master_xfer = sh_mobile_i2c_xfer, + .functionality = sh_mobile_i2c_func, + .master_xfer = sh_mobile_i2c_xfer, + .master_xfer_atomic = sh_mobile_i2c_xfer_atomic, }; static const struct i2c_adapter_quirks sh_mobile_i2c_quirks = { From 572e545d80eaca124e92149559e428a499d55c06 Mon Sep 17 00:00:00 2001 From: Jerry Snitselaar Date: Thu, 15 Oct 2020 14:44:30 -0700 Subject: [PATCH 082/151] tpm_tis: Disable interrupts on ThinkPad T490s [ Upstream commit b154ce11ead925de6a94feb3b0317fafeefa0ebc ] There is a misconfiguration in the bios of the gpio pin used for the interrupt in the T490s. When interrupts are enabled in the tpm_tis driver code this results in an interrupt storm. This was initially reported when we attempted to enable the interrupt code in the tpm_tis driver, which previously wasn't setting a flag to enable it. Due to the reports of the interrupt storm that code was reverted and we went back to polling instead of using interrupts. Now that we know the T490s problem is a firmware issue, add code to check if the system is a T490s and disable interrupts if that is the case. This will allow us to enable interrupts for everyone else. If the user has a fixed bios they can force the enabling of interrupts with tpm_tis.interrupts=1 on the kernel command line. 
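For reference, the override named above takes the usual module-parameter forms (illustrative invocations):

    tpm_tis.interrupts=1            (on the kernel command line, driver built in)
    modprobe tpm_tis interrupts=1   (when the driver is built as a module)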
Cc: Peter Huewe Cc: Jason Gunthorpe Cc: Hans de Goede Signed-off-by: Jerry Snitselaar Reviewed-by: James Bottomley Reviewed-by: Hans de Goede Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: Sasha Levin --- drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index e7df342a317d..c722e3b3121a 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -27,6 +27,7 @@ #include #include #include +#include #include "tpm.h" #include "tpm_tis_core.h" @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da return container_of(data, struct tpm_tis_tcg_phy, priv); } -static bool interrupts = true; -module_param(interrupts, bool, 0444); +static int interrupts = -1; +module_param(interrupts, int, 0444); MODULE_PARM_DESC(interrupts, "Enable interrupts"); static bool itpm; @@ -63,6 +64,28 @@ module_param(force, bool, 0444); MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry"); #endif +static int tpm_tis_disable_irq(const struct dmi_system_id *d) +{ + if (interrupts == -1) { + pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident); + interrupts = 0; + } + + return 0; +} + +static const struct dmi_system_id tpm_tis_dmi_table[] = { + { + .callback = tpm_tis_disable_irq, + .ident = "ThinkPad T490s", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"), + }, + }, + {} +}; + #if defined(CONFIG_PNP) && defined(CONFIG_ACPI) static int has_hid(struct acpi_device *dev, const char *hid) { @@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info) int irq = -1; int rc; + dmi_check_system(tpm_tis_dmi_table); + rc = check_acpi_tpm2(dev); if (rc) return rc; From 3322f7289e50d438669206b74edba7b819cdce3b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Martin=20Hundeb=C3=B8ll?= Date: Thu, 5 Nov 2020 10:06:15 +0100 Subject: [PATCH 083/151] spi: bcm2835: remove use of uninitialized gpio flags variable MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit bc7f2cd7559c5595dc38b909ae9a8d43e0215994 upstream. Removing the duplicate gpio chip select level handling in bcm2835_spi_setup() left the lflags variable uninitialized. Avoid the use of such variable by passing default flags to gpiochip_request_own_desc().
Fixes: 5e31ba0c0543 ("spi: bcm2835: fix gpio cs level inversion") Signed-off-by: Martin Hundebøll Link: https://lore.kernel.org/r/20201105090615.620315-1-martin@geanix.com Signed-off-by: Mark Brown Cc: Nathan Chancellor Signed-off-by: Greg Kroah-Hartman --- drivers/spi/spi-bcm2835.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c index cfd9176e6413..9ae1c96f4d3d 100644 --- a/drivers/spi/spi-bcm2835.c +++ b/drivers/spi/spi-bcm2835.c @@ -1179,7 +1179,6 @@ static int bcm2835_spi_setup(struct spi_device *spi) struct spi_controller *ctlr = spi->controller; struct bcm2835_spi *bs = spi_controller_get_devdata(ctlr); struct gpio_chip *chip; - enum gpio_lookup_flags lflags; u32 cs; /* @@ -1247,7 +1246,7 @@ static int bcm2835_spi_setup(struct spi_device *spi) spi->cs_gpiod = gpiochip_request_own_desc(chip, 8 - spi->chip_select, DRV_NAME, - lflags, + GPIO_LOOKUP_FLAGS_DEFAULT, GPIOD_OUT_LOW); if (IS_ERR(spi->cs_gpiod)) return PTR_ERR(spi->cs_gpiod); From 58953e87343dff07041c1cad1fd710805c4eec24 Mon Sep 17 00:00:00 2001 From: Chunyan Zhang Date: Fri, 10 Jan 2020 16:39:02 +0800 Subject: [PATCH 084/151] tick/common: Touch watchdog in tick_unfreeze() on all CPUs commit 5167c506d62dd9ffab73eba23c79b0a8845c9fe1 upstream. Suspend to IDLE invokes tick_unfreeze() on resume. tick_unfreeze() on the first resuming CPU resumes timekeeping, which also has the side effect of resetting the softlockup watchdog on this CPU. But on the secondary CPUs the watchdog is not reset in the resume / unfreeze() path, which can result in false softlockup warnings on those CPUs depending on the time spent in suspend. Prevent this by clearing the softlock watchdog in the unfreeze path also on the secondary resuming CPUs. [ tglx: Massaged changelog ] Signed-off-by: Chunyan Zhang Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20200110083902.27276-1-chunyan.zhang@unisoc.com Signed-off-by: Greg Kroah-Hartman --- kernel/time/tick-common.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c index 59225b484e4e..7e5d3524e924 100644 --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -558,6 +559,7 @@ void tick_unfreeze(void) trace_suspend_resume(TPS("timekeeping_freeze"), smp_processor_id(), false); } else { + touch_softlockup_watchdog(); tick_resume_local(); } From 88ccabbd2066a619ea0794a86397b5b1454a29e3 Mon Sep 17 00:00:00 2001 From: Baolin Wang Date: Tue, 18 Aug 2020 11:41:58 +0800 Subject: [PATCH 085/151] mfd: sprd: Add wakeup capability for PMIC IRQ commit a75bfc824a2d33f57ebdc003bfe6b7a9e11e9cb9 upstream. When changing to use suspend-to-idle to save power, the PMIC irq can not wakeup the system due to lack of wakeup capability, which will cause the sub-irqs (such as power key) of the PMIC can not wake up the system. Thus we can add the wakeup capability for PMIC irq to solve this issue, as well as removing the IRQF_NO_SUSPEND flag to allow PMIC irq to be a wakeup source. 
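The wakeup plumbing added here follows the standard pattern (generic sketch; driver-specific bookkeeping omitted): declare the device wakeup-capable at probe time, then arm and disarm the interrupt for wakeup in the suspend/resume callbacks only when wakeup is actually enabled for the device.

    /* probe */
    device_init_wakeup(dev, true);

    /* suspend */
    if (device_may_wakeup(dev))
            enable_irq_wake(irq);

    /* resume */
    if (device_may_wakeup(dev))
            disable_irq_wake(irq);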
Reported-by: Chunyan Zhang Signed-off-by: Baolin Wang Tested-by: Chunyan Zhang Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/sprd-sc27xx-spi.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/mfd/sprd-sc27xx-spi.c b/drivers/mfd/sprd-sc27xx-spi.c index c0529a1cd5ea..20529ff48f00 100644 --- a/drivers/mfd/sprd-sc27xx-spi.c +++ b/drivers/mfd/sprd-sc27xx-spi.c @@ -204,7 +204,7 @@ static int sprd_pmic_probe(struct spi_device *spi) } ret = devm_regmap_add_irq_chip(&spi->dev, ddata->regmap, ddata->irq, - IRQF_ONESHOT | IRQF_NO_SUSPEND, 0, + IRQF_ONESHOT, 0, &ddata->irq_chip, &ddata->irq_data); if (ret) { dev_err(&spi->dev, "Failed to add PMIC irq chip %d\n", ret); @@ -220,9 +220,34 @@ static int sprd_pmic_probe(struct spi_device *spi) return ret; } + device_init_wakeup(&spi->dev, true); return 0; } +#ifdef CONFIG_PM_SLEEP +static int sprd_pmic_suspend(struct device *dev) +{ + struct sprd_pmic *ddata = dev_get_drvdata(dev); + + if (device_may_wakeup(dev)) + enable_irq_wake(ddata->irq); + + return 0; +} + +static int sprd_pmic_resume(struct device *dev) +{ + struct sprd_pmic *ddata = dev_get_drvdata(dev); + + if (device_may_wakeup(dev)) + disable_irq_wake(ddata->irq); + + return 0; +} +#endif + +static SIMPLE_DEV_PM_OPS(sprd_pmic_pm_ops, sprd_pmic_suspend, sprd_pmic_resume); + static const struct of_device_id sprd_pmic_match[] = { { .compatible = "sprd,sc2731", .data = &sc2731_data }, {}, @@ -234,6 +259,7 @@ static struct spi_driver sprd_pmic_driver = { .name = "sc27xx-pmic", .bus = &spi_bus_type, .of_match_table = sprd_pmic_match, + .pm = &sprd_pmic_pm_ops, }, .probe = sprd_pmic_probe, }; From c0be7a34c8897779236fdbeb68a3aff0668116d9 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 14 Oct 2020 13:46:38 +0300 Subject: [PATCH 086/151] pinctrl: intel: Set default bias in case no particular value given [ Upstream commit f3c75e7a9349d1d33eb53ddc1b31640994969f73 ] When GPIO library asks pin control to set the bias, it doesn't pass any value of it and argument is considered boolean (and this is true for ACPI GpioIo() / GpioInt() resources, by the way). Thus, individual drivers must behave well, when they got the resistance value of 1 Ohm, i.e. transforming it to sane default. In case of Intel pin control hardware the 5 kOhm sounds plausible because on one hand it's a minimum of resistors present in all hardware generations and at the same time it's high enough to minimize leakage current (will be only 200 uA with the above choice). 
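The 200 uA figure is plain Ohm's law; assuming a pad voltage of roughly 1 V (the commit message does not state the rail explicitly), the worst-case current through the chosen 5 kOhm default is:

    I = V / R = 1.0 V / 5000 ohm = 0.0002 A = 200 uA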
Fixes: e57725eabf87 ("pinctrl: intel: Add support for hardware debouncer") Reported-by: Jamie McClymont Signed-off-by: Andy Shevchenko Acked-by: Mika Westerberg Signed-off-by: Sasha Levin --- drivers/pinctrl/intel/pinctrl-intel.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/pinctrl/intel/pinctrl-intel.c b/drivers/pinctrl/intel/pinctrl-intel.c index 83981ad66a71..4e89bbf6b76a 100644 --- a/drivers/pinctrl/intel/pinctrl-intel.c +++ b/drivers/pinctrl/intel/pinctrl-intel.c @@ -662,6 +662,10 @@ static int intel_config_set_pull(struct intel_pinctrl *pctrl, unsigned int pin, value |= PADCFG1_TERM_UP; + /* Set default strength value in case none is given */ + if (arg == 1) + arg = 5000; + switch (arg) { case 20000: value |= PADCFG1_TERM_20K << PADCFG1_TERM_SHIFT; @@ -684,6 +688,10 @@ static int intel_config_set_pull(struct intel_pinctrl *pctrl, unsigned int pin, case PIN_CONFIG_BIAS_PULL_DOWN: value &= ~(PADCFG1_TERM_UP | PADCFG1_TERM_MASK); + /* Set default strength value in case none is given */ + if (arg == 1) + arg = 5000; + switch (arg) { case 20000: value |= PADCFG1_TERM_20K << PADCFG1_TERM_SHIFT; From 443ae3655f8ce6a611cdcdf0122a251468146054 Mon Sep 17 00:00:00 2001 From: Andrew Jeffery Date: Thu, 22 Oct 2020 01:43:59 +0100 Subject: [PATCH 087/151] ARM: 9019/1: kprobes: Avoid fortify_panic() when copying optprobe template [ Upstream commit 9fa2e7af3d53a4b769136eccc32c02e128a4ee51 ] Setting both CONFIG_KPROBES=y and CONFIG_FORTIFY_SOURCE=y on ARM leads to a panic in memcpy() when injecting a kprobe despite the fixes found in commit e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE") and commit 0ac569bf6a79 ("ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction"). arch/arm/include/asm/kprobes.h effectively declares the target type of the optprobe_template_entry assembly label as a u32 which leads memcpy()'s __builtin_object_size() call to determine that the pointed-to object is of size four. However, the symbol is used as a handle for the optimised probe assembly template that is at least 96 bytes in size. The symbol's use despite its type blows up the memcpy() in ARM's arch_prepare_optimized_kprobe() with a false-positive fortify_panic() when it should instead copy the optimised probe template into place: ``` $ sudo perf probe -a aspeed_g6_pinctrl_probe [ 158.457252] detected buffer overflow in memcpy [ 158.458069] ------------[ cut here ]------------ [ 158.458283] kernel BUG at lib/string.c:1153! 
[ 158.458436] Internal error: Oops - BUG: 0 [#1] SMP ARM [ 158.458768] Modules linked in: [ 158.459043] CPU: 1 PID: 99 Comm: perf Not tainted 5.9.0-rc7-00038-gc53ebf8167e9 #158 [ 158.459296] Hardware name: Generic DT based system [ 158.459529] PC is at fortify_panic+0x18/0x20 [ 158.459658] LR is at __irq_work_queue_local+0x3c/0x74 [ 158.459831] pc : [<8047451c>] lr : [<8020ecd4>] psr: 60000013 [ 158.460032] sp : be2d1d50 ip : be2d1c58 fp : be2d1d5c [ 158.460174] r10: 00000006 r9 : 00000000 r8 : 00000060 [ 158.460348] r7 : 8011e434 r6 : b9e0b800 r5 : 7f000000 r4 : b9fe4f0c [ 158.460557] r3 : 80c04cc8 r2 : 00000000 r1 : be7c03cc r0 : 00000022 [ 158.460801] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 158.461037] Control: 10c5387d Table: b9cd806a DAC: 00000051 [ 158.461251] Process perf (pid: 99, stack limit = 0x81c71a69) [ 158.461472] Stack: (0xbe2d1d50 to 0xbe2d2000) [ 158.461757] 1d40: be2d1d84 be2d1d60 8011e724 80474510 [ 158.462104] 1d60: b9e0b800 b9fe4f0c 00000000 b9fe4f14 80c8ec80 be235000 be2d1d9c be2d1d88 [ 158.462436] 1d80: 801cee44 8011e57c b9fe4f0c 00000000 be2d1dc4 be2d1da0 801d0ad0 801cedec [ 158.462742] 1da0: 00000000 00000000 b9fe4f00 ffffffea 00000000 be235000 be2d1de4 be2d1dc8 [ 158.463087] 1dc0: 80204604 801d0738 00000000 00000000 b9fe4004 ffffffea be2d1e94 be2d1de8 [ 158.463428] 1de0: 80205434 80204570 00385c00 00000000 00000000 00000000 be2d1e14 be2d1e08 [ 158.463880] 1e00: 802ba014 b9fe4f00 b9e718c0 b9fe4f84 b9e71ec8 be2d1e24 00000000 00385c00 [ 158.464365] 1e20: 00000000 626f7270 00000065 802b905c be2d1e94 0000002e 00000000 802b9914 [ 158.464829] 1e40: be2d1e84 be2d1e50 802b9914 8028ff78 804629d0 b9e71ec0 0000002e b9e71ec0 [ 158.465141] 1e60: be2d1ea8 80c04cc8 00000cc0 b9e713c4 00000002 80205834 80205834 0000002e [ 158.465488] 1e80: be235000 be235000 be2d1ea4 be2d1e98 80205854 80204e94 be2d1ecc be2d1ea8 [ 158.465806] 1ea0: 801ee4a0 80205840 00000002 80c04cc8 00000000 0000002e 0000002e 00000000 [ 158.466110] 1ec0: be2d1f0c be2d1ed0 801ee5c8 801ee428 00000000 be2d0000 006b1fd0 00000051 [ 158.466398] 1ee0: 00000000 b9eedf00 0000002e 80204410 006b1fd0 be2d1f60 00000000 00000004 [ 158.466763] 1f00: be2d1f24 be2d1f10 8020442c 801ee4c4 80205834 802c613c be2d1f5c be2d1f28 [ 158.467102] 1f20: 802c60ac 8020441c be2d1fac be2d1f38 8010c764 802e9888 be2d1f5c b9eedf00 [ 158.467447] 1f40: b9eedf00 006b1fd0 0000002e 00000000 be2d1f94 be2d1f60 802c634c 802c5fec [ 158.467812] 1f60: 00000000 00000000 00000000 80c04cc8 006b1fd0 00000003 76f7a610 00000004 [ 158.468155] 1f80: 80100284 be2d0000 be2d1fa4 be2d1f98 802c63ec 802c62e8 00000000 be2d1fa8 [ 158.468508] 1fa0: 80100080 802c63e0 006b1fd0 00000003 00000003 006b1fd0 0000002e 00000000 [ 158.468858] 1fc0: 006b1fd0 00000003 76f7a610 00000004 006b1fb0 0026d348 00000017 7ef2738c [ 158.469202] 1fe0: 76f3431c 7ef272d8 0014ec50 76f34338 60000010 00000003 00000000 00000000 [ 158.469461] Backtrace: [ 158.469683] [<80474504>] (fortify_panic) from [<8011e724>] (arch_prepare_optimized_kprobe+0x1b4/0x1f8) [ 158.470021] [<8011e570>] (arch_prepare_optimized_kprobe) from [<801cee44>] (alloc_aggr_kprobe+0x64/0x70) [ 158.470287] r9:be235000 r8:80c8ec80 r7:b9fe4f14 r6:00000000 r5:b9fe4f0c r4:b9e0b800 [ 158.470478] [<801cede0>] (alloc_aggr_kprobe) from [<801d0ad0>] (register_kprobe+0x3a4/0x5a0) [ 158.470685] r5:00000000 r4:b9fe4f0c [ 158.470790] [<801d072c>] (register_kprobe) from [<80204604>] (__register_trace_kprobe+0xa0/0xa4) [ 158.471001] r9:be235000 r8:00000000 r7:ffffffea r6:b9fe4f00 r5:00000000 r4:00000000 [ 158.471188] [<80204564>] 
(__register_trace_kprobe) from [<80205434>] (trace_kprobe_create+0x5ac/0x9ac) [ 158.471408] r7:ffffffea r6:b9fe4004 r5:00000000 r4:00000000 [ 158.471553] [<80204e88>] (trace_kprobe_create) from [<80205854>] (create_or_delete_trace_kprobe+0x20/0x3c) [ 158.471766] r10:be235000 r9:be235000 r8:0000002e r7:80205834 r6:80205834 r5:00000002 [ 158.471949] r4:b9e713c4 [ 158.472027] [<80205834>] (create_or_delete_trace_kprobe) from [<801ee4a0>] (trace_run_command+0x84/0x9c) [ 158.472255] [<801ee41c>] (trace_run_command) from [<801ee5c8>] (trace_parse_run_command+0x110/0x1f8) [ 158.472471] r6:00000000 r5:0000002e r4:0000002e [ 158.472594] [<801ee4b8>] (trace_parse_run_command) from [<8020442c>] (probes_write+0x1c/0x28) [ 158.472800] r10:00000004 r9:00000000 r8:be2d1f60 r7:006b1fd0 r6:80204410 r5:0000002e [ 158.472968] r4:b9eedf00 [ 158.473046] [<80204410>] (probes_write) from [<802c60ac>] (vfs_write+0xcc/0x1e8) [ 158.473226] [<802c5fe0>] (vfs_write) from [<802c634c>] (ksys_write+0x70/0xf8) [ 158.473400] r8:00000000 r7:0000002e r6:006b1fd0 r5:b9eedf00 r4:b9eedf00 [ 158.473567] [<802c62dc>] (ksys_write) from [<802c63ec>] (sys_write+0x18/0x1c) [ 158.473745] r9:be2d0000 r8:80100284 r7:00000004 r6:76f7a610 r5:00000003 r4:006b1fd0 [ 158.473932] [<802c63d4>] (sys_write) from [<80100080>] (ret_fast_syscall+0x0/0x54) [ 158.474126] Exception stack(0xbe2d1fa8 to 0xbe2d1ff0) [ 158.474305] 1fa0: 006b1fd0 00000003 00000003 006b1fd0 0000002e 00000000 [ 158.474573] 1fc0: 006b1fd0 00000003 76f7a610 00000004 006b1fb0 0026d348 00000017 7ef2738c [ 158.474811] 1fe0: 76f3431c 7ef272d8 0014ec50 76f34338 [ 158.475171] Code: e24cb004 e1a01000 e59f0004 ebf40dd3 (e7f001f2) [ 158.475847] ---[ end trace 55a5b31c08a29f00 ]--- [ 158.476088] Kernel panic - not syncing: Fatal exception [ 158.476375] CPU0: stopping [ 158.476709] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D 5.9.0-rc7-00038-gc53ebf8167e9 #158 [ 158.477176] Hardware name: Generic DT based system [ 158.477411] Backtrace: [ 158.477604] [<8010dd28>] (dump_backtrace) from [<8010dfd4>] (show_stack+0x20/0x24) [ 158.477990] r7:00000000 r6:60000193 r5:00000000 r4:80c2f634 [ 158.478323] [<8010dfb4>] (show_stack) from [<8046390c>] (dump_stack+0xcc/0xe8) [ 158.478686] [<80463840>] (dump_stack) from [<80110750>] (handle_IPI+0x334/0x3a0) [ 158.479063] r7:00000000 r6:00000004 r5:80b65cc8 r4:80c78278 [ 158.479352] [<8011041c>] (handle_IPI) from [<801013f8>] (gic_handle_irq+0x88/0x94) [ 158.479757] r10:10c5387d r9:80c01ed8 r8:00000000 r7:c0802000 r6:80c0537c r5:000003ff [ 158.480146] r4:c080200c r3:fffffff4 [ 158.480364] [<80101370>] (gic_handle_irq) from [<80100b6c>] (__irq_svc+0x6c/0x90) [ 158.480748] Exception stack(0x80c01ed8 to 0x80c01f20) [ 158.481031] 1ec0: 000128bc 00000000 [ 158.481499] 1ee0: be7b8174 8011d3a0 80c00000 00000000 80c04cec 80c04d28 80c5d7c2 80a026d4 [ 158.482091] 1f00: 10c5387d 80c01f34 80c01f38 80c01f28 80109554 80109558 60000013 ffffffff [ 158.482621] r9:80c00000 r8:80c5d7c2 r7:80c01f0c r6:ffffffff r5:60000013 r4:80109558 [ 158.482983] [<80109518>] (arch_cpu_idle) from [<80818780>] (default_idle_call+0x38/0x120) [ 158.483360] [<80818748>] (default_idle_call) from [<801585a8>] (do_idle+0xd4/0x158) [ 158.483945] r5:00000000 r4:80c00000 [ 158.484237] [<801584d4>] (do_idle) from [<801588f4>] (cpu_startup_entry+0x28/0x2c) [ 158.484784] r9:80c78000 r8:00000000 r7:80c78000 r6:80c78040 r5:80c04cc0 r4:000000d6 [ 158.485328] [<801588cc>] (cpu_startup_entry) from [<80810a78>] (rest_init+0x9c/0xbc) [ 158.485930] [<808109dc>] (rest_init) from [<80b00ae4>] 
(arch_call_rest_init+0x18/0x1c) [ 158.486503] r5:80c04cc0 r4:00000001 [ 158.486857] [<80b00acc>] (arch_call_rest_init) from [<80b00fcc>] (start_kernel+0x46c/0x548) [ 158.487589] [<80b00b60>] (start_kernel) from [<00000000>] (0x0) ``` Fixes: e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE") Fixes: 0ac569bf6a79 ("ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction") Suggested-by: Kees Cook Signed-off-by: Andrew Jeffery Tested-by: Luka Oreskovic Tested-by: Joel Stanley Reviewed-by: Joel Stanley Acked-by: Masami Hiramatsu Cc: Luka Oreskovic Cc: Juraj Vijtiuk Signed-off-by: Russell King Signed-off-by: Sasha Levin --- arch/arm/include/asm/kprobes.h | 22 +++++++++++----------- arch/arm/probes/kprobes/opt-arm.c | 18 +++++++++--------- 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/arch/arm/include/asm/kprobes.h b/arch/arm/include/asm/kprobes.h index 213607a1f45c..e26a278d301a 100644 --- a/arch/arm/include/asm/kprobes.h +++ b/arch/arm/include/asm/kprobes.h @@ -44,20 +44,20 @@ int kprobe_exceptions_notify(struct notifier_block *self, unsigned long val, void *data); /* optinsn template addresses */ -extern __visible kprobe_opcode_t optprobe_template_entry; -extern __visible kprobe_opcode_t optprobe_template_val; -extern __visible kprobe_opcode_t optprobe_template_call; -extern __visible kprobe_opcode_t optprobe_template_end; -extern __visible kprobe_opcode_t optprobe_template_sub_sp; -extern __visible kprobe_opcode_t optprobe_template_add_sp; -extern __visible kprobe_opcode_t optprobe_template_restore_begin; -extern __visible kprobe_opcode_t optprobe_template_restore_orig_insn; -extern __visible kprobe_opcode_t optprobe_template_restore_end; +extern __visible kprobe_opcode_t optprobe_template_entry[]; +extern __visible kprobe_opcode_t optprobe_template_val[]; +extern __visible kprobe_opcode_t optprobe_template_call[]; +extern __visible kprobe_opcode_t optprobe_template_end[]; +extern __visible kprobe_opcode_t optprobe_template_sub_sp[]; +extern __visible kprobe_opcode_t optprobe_template_add_sp[]; +extern __visible kprobe_opcode_t optprobe_template_restore_begin[]; +extern __visible kprobe_opcode_t optprobe_template_restore_orig_insn[]; +extern __visible kprobe_opcode_t optprobe_template_restore_end[]; #define MAX_OPTIMIZED_LENGTH 4 #define MAX_OPTINSN_SIZE \ - ((unsigned long)&optprobe_template_end - \ - (unsigned long)&optprobe_template_entry) + ((unsigned long)optprobe_template_end - \ + (unsigned long)optprobe_template_entry) #define RELATIVEJUMP_SIZE 4 struct arch_optimized_insn { diff --git a/arch/arm/probes/kprobes/opt-arm.c b/arch/arm/probes/kprobes/opt-arm.c index 7a449df0b359..c78180172120 100644 --- a/arch/arm/probes/kprobes/opt-arm.c +++ b/arch/arm/probes/kprobes/opt-arm.c @@ -85,21 +85,21 @@ asm ( "optprobe_template_end:\n"); #define TMPL_VAL_IDX \ - ((unsigned long *)&optprobe_template_val - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_val - (unsigned long *)optprobe_template_entry) #define TMPL_CALL_IDX \ - ((unsigned long *)&optprobe_template_call - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_call - (unsigned long *)optprobe_template_entry) #define TMPL_END_IDX \ - ((unsigned long *)&optprobe_template_end - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_end - (unsigned long *)optprobe_template_entry) #define TMPL_ADD_SP \ - ((unsigned long *)&optprobe_template_add_sp - (unsigned long *)&optprobe_template_entry) + 
((unsigned long *)optprobe_template_add_sp - (unsigned long *)optprobe_template_entry) #define TMPL_SUB_SP \ - ((unsigned long *)&optprobe_template_sub_sp - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_sub_sp - (unsigned long *)optprobe_template_entry) #define TMPL_RESTORE_BEGIN \ - ((unsigned long *)&optprobe_template_restore_begin - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_restore_begin - (unsigned long *)optprobe_template_entry) #define TMPL_RESTORE_ORIGN_INSN \ - ((unsigned long *)&optprobe_template_restore_orig_insn - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_restore_orig_insn - (unsigned long *)optprobe_template_entry) #define TMPL_RESTORE_END \ - ((unsigned long *)&optprobe_template_restore_end - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_restore_end - (unsigned long *)optprobe_template_entry) /* * ARM can always optimize an instruction when using ARM ISA, except @@ -234,7 +234,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *or } /* Copy arch-dep-instance from template. */ - memcpy(code, (unsigned long *)&optprobe_template_entry, + memcpy(code, (unsigned long *)optprobe_template_entry, TMPL_END_IDX * sizeof(kprobe_opcode_t)); /* Adjust buffer according to instruction. */ From d2e61c5202e63fd591023525965a7379ba493385 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 28 Oct 2020 18:15:05 +0100 Subject: [PATCH 088/151] bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE [ Upstream commit 080b6f40763565f65ebb9540219c71ce885cf568 ] Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a function scope __attribute__((optimize("-fno-gcse"))), to disable a GCC specific optimization that was causing trouble on x86 builds, and was not expected to have any positive effect in the first place. However, as the GCC manual documents, __attribute__((optimize)) is not for production use, and results in all other optimization options to be forgotten for the function in question. This can cause all kinds of trouble, but in one particular reported case, it causes -fno-asynchronous-unwind-tables to be disregarded, resulting in .eh_frame info to be emitted for the function. This reverts commit 3193c0836, and instead, it disables the -fgcse optimization for the entire source file, but only when building for X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the original commit states that CONFIG_RETPOLINE=n triggers the issue, whereas CONFIG_RETPOLINE=y performs better without the optimization, so it is kept disabled in both cases. 
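As an illustrative aside (not part of the upstream patch): a minimal sketch of the construct being removed. Per the GCC documentation cited above, a function-scope optimize attribute makes the compiler forget the other command-line optimization options for that one function, which is how -fno-asynchronous-unwind-tables stopped applying and .eh_frame data appeared. The function name below is hypothetical.

```
/* Hypothetical example, not kernel code: the attribute disables GCSE for
 * foo() but also drops the other optimization flags for that function,
 * which is why the fix moves -fno-gcse into the Makefile for the whole
 * file instead.
 */
__attribute__((optimize("-fno-gcse")))
static unsigned long foo(unsigned long x)
{
	return x + 1;
}
```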
Fixes: 3193c0836f20 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()") Signed-off-by: Ard Biesheuvel Signed-off-by: Alexei Starovoitov Tested-by: Geert Uytterhoeven Reviewed-by: Nick Desaulniers Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20201028171506.15682-2-ardb@kernel.org Signed-off-by: Sasha Levin --- include/linux/compiler-gcc.h | 2 -- include/linux/compiler_types.h | 4 ---- kernel/bpf/Makefile | 6 +++++- kernel/bpf/core.c | 2 +- 4 files changed, 6 insertions(+), 8 deletions(-) diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index d7ee4c6bad48..e8579412ad21 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -170,5 +170,3 @@ #else #define __diag_GCC_8(s) #endif - -#define __no_fgcse __attribute__((optimize("-fno-gcse"))) diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 72393a8c1a6c..77433633572e 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -212,10 +212,6 @@ struct ftrace_likely_data { #define asm_inline asm #endif -#ifndef __no_fgcse -# define __no_fgcse -#endif - /* Are two types/vars the same type (ignoring qualifiers)? */ #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index e1d9adb212f9..b0d78bc0b197 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -1,6 +1,10 @@ # SPDX-License-Identifier: GPL-2.0 obj-y := core.o -CFLAGS_core.o += $(call cc-disable-warning, override-init) +ifneq ($(CONFIG_BPF_JIT_ALWAYS_ON),y) +# ___bpf_prog_run() needs GCSE disabled on x86; see 3193c0836f203 for details +cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse +endif +CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy) obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index ef0e1e3e66f4..56bc96f5ad20 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1299,7 +1299,7 @@ bool bpf_opcode_in_insntable(u8 code) * * Decode and execute eBPF instructions. */ -static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) +static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) { #define BPF_INSN_2_LBL(x, y) [BPF_##x | BPF_##y] = &&x##_##y #define BPF_INSN_3_LBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = &&x##_##y##_##z From e74e514c8ccaf05dd717144fb351f03795637a59 Mon Sep 17 00:00:00 2001 From: Billy Tsai Date: Fri, 30 Oct 2020 13:54:50 +0800 Subject: [PATCH 089/151] pinctrl: aspeed: Fix GPI only function problem. [ Upstream commit 9b92f5c51e9a41352d665f6f956bd95085a56a83 ] Some gpio pin at aspeed soc is input only and the prefix name of these pin is "GPI" only. This patch fine-tune the condition of GPIO check from "GPIO" to "GPI" and it will fix the usage error of banks D and E in the AST2400/AST2500 and banks T and U in the AST2600. 
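For illustration only (not part of the patch): a small user-space sketch of the relaxed prefix test, using the two signal names quoted in the updated driver comment plus one hypothetical non-GPIO name.

```
/* Standalone sketch; only the strncmp() prefix logic mirrors the driver. */
#include <stdio.h>
#include <string.h>

static int is_gpio_signal(const char *signal)
{
	/* "GPI" matches both "GPIOB1" (full GPIO) and "GPIT0" (input only) */
	return strncmp(signal, "GPI", 3) == 0;
}

int main(void)
{
	printf("%d %d %d\n",
	       is_gpio_signal("GPIOB1"),	/* 1 */
	       is_gpio_signal("GPIT0"),		/* 1, missed by the old "GPIO" check */
	       is_gpio_signal("LPCPD"));	/* 0, hypothetical non-GPIO signal */
	return 0;
}
```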
Fixes: 4d3d0e4272d8 ("pinctrl: Add core support for Aspeed SoCs") Signed-off-by: Billy Tsai Reviewed-by: Andrew Jeffery Link: https://lore.kernel.org/r/20201030055450.29613-1-billy_tsai@aspeedtech.com Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- drivers/pinctrl/aspeed/pinctrl-aspeed.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/pinctrl/aspeed/pinctrl-aspeed.c b/drivers/pinctrl/aspeed/pinctrl-aspeed.c index 54933665b5f8..93b5654ff282 100644 --- a/drivers/pinctrl/aspeed/pinctrl-aspeed.c +++ b/drivers/pinctrl/aspeed/pinctrl-aspeed.c @@ -277,13 +277,14 @@ int aspeed_pinmux_set_mux(struct pinctrl_dev *pctldev, unsigned int function, static bool aspeed_expr_is_gpio(const struct aspeed_sig_expr *expr) { /* - * The signal type is GPIO if the signal name has "GPIO" as a prefix. + * The signal type is GPIO if the signal name has "GPI" as a prefix. * strncmp (rather than strcmp) is used to implement the prefix * requirement. * - * expr->signal might look like "GPIOT3" in the GPIO case. + * expr->signal might look like "GPIOB1" in the GPIO case. + * expr->signal might look like "GPIT0" in the GPI case. */ - return strncmp(expr->signal, "GPIO", 4) == 0; + return strncmp(expr->signal, "GPI", 3) == 0; } static bool aspeed_gpio_in_exprs(const struct aspeed_sig_expr **exprs) From b9e8f9d139bd7ea83e2e896ea08ae757cad5e7a8 Mon Sep 17 00:00:00 2001 From: Maor Gottlieb Date: Wed, 21 Oct 2020 08:42:49 +0300 Subject: [PATCH 090/151] net/mlx5: Fix deletion of duplicate rules [ Upstream commit 465e7baab6d93b399344f5868f84c177ab5cd16f ] When a rule is duplicated, the refcount of the rule is increased so only the second deletion of the rule should cause destruction of the FTE. Currently, the FTE will be destroyed in the first deletion of rule since the modify_mask will be 0. Fix it and call to destroy FTE only if all the rules (FTE's children) have been removed. Fixes: 718ce4d601db ("net/mlx5: Consolidate update FTE for all removal changes") Signed-off-by: Maor Gottlieb Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 9ac2f52187ea..16511f648553 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1923,10 +1923,11 @@ void mlx5_del_flow_rules(struct mlx5_flow_handle *handle) down_write_ref_node(&fte->node, false); for (i = handle->num_rules - 1; i >= 0; i--) tree_remove_node(&handle->rule[i]->node, true); - if (fte->modify_mask && fte->dests_size) { - modify_fte(fte); + if (fte->dests_size) { + if (fte->modify_mask) + modify_fte(fte); up_write_ref_node(&fte->node, false); - } else { + } else if (list_empty(&fte->node.children)) { del_hw_fte(&fte->node); /* Avoid double call to del_hw_fte */ fte->node.del_hw_func = NULL; From dfcb33773877f31a40ba0583e622935ef105c90b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 23 Oct 2020 10:41:07 -0400 Subject: [PATCH 091/151] SUNRPC: Fix general protection fault in trace_rpc_xdr_overflow() [ Upstream commit d321ff589c16d8c2207485a6d7fbdb14e873d46e ] The TP_fast_assign() section is careful enough not to dereference xdr->rqst if it's NULL. The TP_STRUCT__entry section is not. Fixes: 5582863f450c ("SUNRPC: Add XDR overflow trace event") Signed-off-by: Chuck Lever Signed-off-by: J. 
Bruce Fields Signed-off-by: Sasha Levin --- include/trace/events/sunrpc.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 28df77a948e5..f16e9fb97e9f 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -357,10 +357,10 @@ TRACE_EVENT(rpc_xdr_overflow, __field(size_t, tail_len) __field(unsigned int, page_len) __field(unsigned int, len) - __string(progname, - xdr->rqst->rq_task->tk_client->cl_program->name) - __string(procedure, - xdr->rqst->rq_task->tk_msg.rpc_proc->p_name) + __string(progname, xdr->rqst ? + xdr->rqst->rq_task->tk_client->cl_program->name : "unknown") + __string(procedure, xdr->rqst ? + xdr->rqst->rq_task->tk_msg.rpc_proc->p_name : "unknown") ), TP_fast_assign( From c602ad2b52dcbca5af08e5137bd5575c039b52e3 Mon Sep 17 00:00:00 2001 From: David Verbeiren Date: Wed, 4 Nov 2020 12:23:32 +0100 Subject: [PATCH 092/151] bpf: Zero-fill re-used per-cpu map element [ Upstream commit d3bec0138bfbe58606fc1d6f57a4cdc1a20218db ] Zero-fill element values for all other cpus than current, just as when not using prealloc. This is the only way the bpf program can ensure known initial values for all cpus ('onallcpus' cannot be set when coming from the bpf program). The scenario is: bpf program inserts some elements in a per-cpu map, then deletes some (or userspace does). When later adding new elements using bpf_map_update_elem(), the bpf program can only set the value of the new elements for the current cpu. When prealloc is enabled, previously deleted elements are re-used. Without the fix, values for other cpus remain whatever they were when the re-used entry was previously freed. A selftest is added to validate correct operation in above scenario as well as in case of LRU per-cpu map element re-use. Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements") Signed-off-by: David Verbeiren Signed-off-by: Alexei Starovoitov Acked-by: Matthieu Baerts Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20201104112332.15191-1-david.verbeiren@tessares.net Signed-off-by: Sasha Levin --- kernel/bpf/hashtab.c | 30 ++- .../selftests/bpf/prog_tests/map_init.c | 214 ++++++++++++++++++ .../selftests/bpf/progs/test_map_init.c | 33 +++ 3 files changed, 275 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/map_init.c create mode 100644 tools/testing/selftests/bpf/progs/test_map_init.c diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 728ffec52cf3..03a67583f6fb 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -709,6 +709,32 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr, } } +static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr, + void *value, bool onallcpus) +{ + /* When using prealloc and not setting the initial value on all cpus, + * zero-fill element values for other cpus (just as what happens when + * not using prealloc). Otherwise, bpf program has no way to ensure + * known initial values for cpus other than current one + * (onallcpus=false always when coming from bpf prog). 
+ */ + if (htab_is_prealloc(htab) && !onallcpus) { + u32 size = round_up(htab->map.value_size, 8); + int current_cpu = raw_smp_processor_id(); + int cpu; + + for_each_possible_cpu(cpu) { + if (cpu == current_cpu) + bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value, + size); + else + memset(per_cpu_ptr(pptr, cpu), 0, size); + } + } else { + pcpu_copy_value(htab, pptr, value, onallcpus); + } +} + static bool fd_htab_map_needs_adjust(const struct bpf_htab *htab) { return htab->map.map_type == BPF_MAP_TYPE_HASH_OF_MAPS && @@ -779,7 +805,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, } } - pcpu_copy_value(htab, pptr, value, onallcpus); + pcpu_init_value(htab, pptr, value, onallcpus); if (!prealloc) htab_elem_set_ptr(l_new, key_size, pptr); @@ -1075,7 +1101,7 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size), value, onallcpus); } else { - pcpu_copy_value(htab, htab_elem_get_ptr(l_new, key_size), + pcpu_init_value(htab, htab_elem_get_ptr(l_new, key_size), value, onallcpus); hlist_nulls_add_head_rcu(&l_new->hash_node, head); l_new = NULL; diff --git a/tools/testing/selftests/bpf/prog_tests/map_init.c b/tools/testing/selftests/bpf/prog_tests/map_init.c new file mode 100644 index 000000000000..14a31109dd0e --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/map_init.c @@ -0,0 +1,214 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2020 Tessares SA */ + +#include +#include "test_map_init.skel.h" + +#define TEST_VALUE 0x1234 +#define FILL_VALUE 0xdeadbeef + +static int nr_cpus; +static int duration; + +typedef unsigned long long map_key_t; +typedef unsigned long long map_value_t; +typedef struct { + map_value_t v; /* padding */ +} __bpf_percpu_val_align pcpu_map_value_t; + + +static int map_populate(int map_fd, int num) +{ + pcpu_map_value_t value[nr_cpus]; + int i, err; + map_key_t key; + + for (i = 0; i < nr_cpus; i++) + bpf_percpu(value, i) = FILL_VALUE; + + for (key = 1; key <= num; key++) { + err = bpf_map_update_elem(map_fd, &key, value, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem")) + return -1; + } + + return 0; +} + +static struct test_map_init *setup(enum bpf_map_type map_type, int map_sz, + int *map_fd, int populate) +{ + struct test_map_init *skel; + int err; + + skel = test_map_init__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + return NULL; + + err = bpf_map__set_type(skel->maps.hashmap1, map_type); + if (!ASSERT_OK(err, "bpf_map__set_type")) + goto error; + + err = bpf_map__set_max_entries(skel->maps.hashmap1, map_sz); + if (!ASSERT_OK(err, "bpf_map__set_max_entries")) + goto error; + + err = test_map_init__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto error; + + *map_fd = bpf_map__fd(skel->maps.hashmap1); + if (CHECK(*map_fd < 0, "bpf_map__fd", "failed\n")) + goto error; + + err = map_populate(*map_fd, populate); + if (!ASSERT_OK(err, "map_populate")) + goto error_map; + + return skel; + +error_map: + close(*map_fd); +error: + test_map_init__destroy(skel); + return NULL; +} + +/* executes bpf program that updates map with key, value */ +static int prog_run_insert_elem(struct test_map_init *skel, map_key_t key, + map_value_t value) +{ + struct test_map_init__bss *bss; + + bss = skel->bss; + + bss->inKey = key; + bss->inValue = value; + bss->inPid = getpid(); + + if (!ASSERT_OK(test_map_init__attach(skel), "skel_attach")) + return -1; + + /* Let tracepoint trigger */ + syscall(__NR_getpgid); + + test_map_init__detach(skel); + 
+ return 0; +} + +static int check_values_one_cpu(pcpu_map_value_t *value, map_value_t expected) +{ + int i, nzCnt = 0; + map_value_t val; + + for (i = 0; i < nr_cpus; i++) { + val = bpf_percpu(value, i); + if (val) { + if (CHECK(val != expected, "map value", + "unexpected for cpu %d: 0x%llx\n", i, val)) + return -1; + nzCnt++; + } + } + + if (CHECK(nzCnt != 1, "map value", "set for %d CPUs instead of 1!\n", + nzCnt)) + return -1; + + return 0; +} + +/* Add key=1 elem with values set for all CPUs + * Delete elem key=1 + * Run bpf prog that inserts new key=1 elem with value=0x1234 + * (bpf prog can only set value for current CPU) + * Lookup Key=1 and check value is as expected for all CPUs: + * value set by bpf prog for one CPU, 0 for all others + */ +static void test_pcpu_map_init(void) +{ + pcpu_map_value_t value[nr_cpus]; + struct test_map_init *skel; + int map_fd, err; + map_key_t key; + + /* max 1 elem in map so insertion is forced to reuse freed entry */ + skel = setup(BPF_MAP_TYPE_PERCPU_HASH, 1, &map_fd, 1); + if (!ASSERT_OK_PTR(skel, "prog_setup")) + return; + + /* delete element so the entry can be re-used*/ + key = 1; + err = bpf_map_delete_elem(map_fd, &key); + if (!ASSERT_OK(err, "bpf_map_delete_elem")) + goto cleanup; + + /* run bpf prog that inserts new elem, re-using the slot just freed */ + err = prog_run_insert_elem(skel, key, TEST_VALUE); + if (!ASSERT_OK(err, "prog_run_insert_elem")) + goto cleanup; + + /* check that key=1 was re-created by bpf prog */ + err = bpf_map_lookup_elem(map_fd, &key, value); + if (!ASSERT_OK(err, "bpf_map_lookup_elem")) + goto cleanup; + + /* and has expected values */ + check_values_one_cpu(value, TEST_VALUE); + +cleanup: + test_map_init__destroy(skel); +} + +/* Add key=1 and key=2 elems with values set for all CPUs + * Run bpf prog that inserts new key=3 elem + * (only for current cpu; other cpus should have initial value = 0) + * Lookup Key=1 and check value is as expected for all CPUs + */ +static void test_pcpu_lru_map_init(void) +{ + pcpu_map_value_t value[nr_cpus]; + struct test_map_init *skel; + int map_fd, err; + map_key_t key; + + /* Set up LRU map with 2 elements, values filled for all CPUs. 
+ * With these 2 elements, the LRU map is full + */ + skel = setup(BPF_MAP_TYPE_LRU_PERCPU_HASH, 2, &map_fd, 2); + if (!ASSERT_OK_PTR(skel, "prog_setup")) + return; + + /* run bpf prog that inserts new key=3 element, re-using LRU slot */ + key = 3; + err = prog_run_insert_elem(skel, key, TEST_VALUE); + if (!ASSERT_OK(err, "prog_run_insert_elem")) + goto cleanup; + + /* check that key=3 replaced one of earlier elements */ + err = bpf_map_lookup_elem(map_fd, &key, value); + if (!ASSERT_OK(err, "bpf_map_lookup_elem")) + goto cleanup; + + /* and has expected values */ + check_values_one_cpu(value, TEST_VALUE); + +cleanup: + test_map_init__destroy(skel); +} + +void test_map_init(void) +{ + nr_cpus = bpf_num_possible_cpus(); + if (nr_cpus <= 1) { + printf("%s:SKIP: >1 cpu needed for this test\n", __func__); + test__skip(); + return; + } + + if (test__start_subtest("pcpu_map_init")) + test_pcpu_map_init(); + if (test__start_subtest("pcpu_lru_map_init")) + test_pcpu_lru_map_init(); +} diff --git a/tools/testing/selftests/bpf/progs/test_map_init.c b/tools/testing/selftests/bpf/progs/test_map_init.c new file mode 100644 index 000000000000..c89d28ead673 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_map_init.c @@ -0,0 +1,33 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2020 Tessares SA */ + +#include "vmlinux.h" +#include + +__u64 inKey = 0; +__u64 inValue = 0; +__u32 inPid = 0; + +struct { + __uint(type, BPF_MAP_TYPE_PERCPU_HASH); + __uint(max_entries, 2); + __type(key, __u64); + __type(value, __u64); +} hashmap1 SEC(".maps"); + + +SEC("tp/syscalls/sys_enter_getpgid") +int sysenter_getpgid(const void *ctx) +{ + /* Just do it for once, when called from our own test prog. This + * ensures the map value is only updated for a single CPU. + */ + int cur_pid = bpf_get_current_pid_tgid() >> 32; + + if (cur_pid == inPid) + bpf_map_update_elem(&hashmap1, &inKey, &inValue, BPF_NOEXIST); + + return 0; +} + +char _license[] SEC("license") = "GPL"; From 81dcfdb9a01566fd8cd31adce52fb4e448bfd15d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 9 Nov 2020 18:30:59 +0100 Subject: [PATCH 093/151] nbd: fix a block_device refcount leak in nbd_release [ Upstream commit 2bd645b2d3f0bacadaa6037f067538e1cd4e42ef ] bdget_disk needs to be paired with bdput to not leak a reference on the block device inode. Fixes: 08ba91ee6e2c ("nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.") Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin --- drivers/block/nbd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 62a873718b5b..a3037fe54c3a 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -1503,6 +1503,7 @@ static void nbd_release(struct gendisk *disk, fmode_t mode) if (test_bit(NBD_RT_DISCONNECT_ON_CLOSE, &nbd->config->runtime_flags) && bdev->bd_openers == 0) nbd_disconnect_and_put(nbd); + bdput(bdev); nbd_config_put(nbd); nbd_put(nbd); From a8ee686597fb255f997e1f62eea8307a50aeea73 Mon Sep 17 00:00:00 2001 From: Vinicius Costa Gomes Date: Fri, 25 Sep 2020 11:35:37 -0700 Subject: [PATCH 094/151] igc: Fix returning wrong statistics [ Upstream commit 6b7ed22ae4c96a415001f0c3116ebee15bb8491a ] 'igc_update_stats()' was not updating 'netdev->stats', so the returned statistics, for example, requested by: $ ip -s link show dev enp3s0 were not being updated and were always zero. Fix by returning a set of statistics that are actually being updated (adapter->stats64). 
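As background (not part of the patch): a minimal sketch of the .ndo_get_stats64 shape the fix adopts — refresh the driver-private 64-bit counters under a lock, then copy them into the caller-supplied structure. Names below are generic placeholders rather than the exact igc symbols.

```
static void foo_get_stats64(struct net_device *netdev,
			    struct rtnl_link_stats64 *stats)
{
	struct foo_adapter *adapter = netdev_priv(netdev);

	spin_lock(&adapter->stats_lock);
	foo_update_stats(adapter);		/* refresh counters from HW */
	memcpy(stats, &adapter->stats64, sizeof(*stats));
	spin_unlock(&adapter->stats_lock);
}
```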
Fixes: c9a11c23ceb6 ("igc: Add netdev") Signed-off-by: Vinicius Costa Gomes Tested-by: Aaron Brown Signed-off-by: Tony Nguyen Signed-off-by: Sasha Levin --- drivers/net/ethernet/intel/igc/igc_main.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 24888676f69b..6b43e1c5b1c3 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -2222,21 +2222,23 @@ static int igc_change_mtu(struct net_device *netdev, int new_mtu) } /** - * igc_get_stats - Get System Network Statistics + * igc_get_stats64 - Get System Network Statistics * @netdev: network interface device structure + * @stats: rtnl_link_stats64 pointer * * Returns the address of the device statistics structure. * The statistics are updated here and also from the timer callback. */ -static struct net_device_stats *igc_get_stats(struct net_device *netdev) +static void igc_get_stats64(struct net_device *netdev, + struct rtnl_link_stats64 *stats) { struct igc_adapter *adapter = netdev_priv(netdev); + spin_lock(&adapter->stats64_lock); if (!test_bit(__IGC_RESETTING, &adapter->state)) igc_update_stats(adapter); - - /* only return the current stats */ - return &netdev->stats; + memcpy(stats, &adapter->stats64, sizeof(*stats)); + spin_unlock(&adapter->stats64_lock); } static netdev_features_t igc_fix_features(struct net_device *netdev, @@ -3984,7 +3986,7 @@ static const struct net_device_ops igc_netdev_ops = { .ndo_start_xmit = igc_xmit_frame, .ndo_set_mac_address = igc_set_mac, .ndo_change_mtu = igc_change_mtu, - .ndo_get_stats = igc_get_stats, + .ndo_get_stats64 = igc_get_stats64, .ndo_fix_features = igc_fix_features, .ndo_set_features = igc_set_features, .ndo_features_check = igc_features_check, From 08e213bef2919a441bc1afbea69b06234ceea9fd Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Sun, 8 Nov 2020 16:32:43 -0800 Subject: [PATCH 095/151] xfs: fix flags argument to rmap lookup when converting shared file rmaps [ Upstream commit ea8439899c0b15a176664df62aff928010fad276 ] Pass the same oldext argument (which contains the existing rmapping's unwritten state) to xfs_rmap_lookup_le_range at the start of xfs_rmap_convert_shared. At this point in the code, flags is zero, which means that we perform lookups using the wrong key. Fixes: 3f165b334e51 ("xfs: convert unwritten status of reverse mappings for shared files") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Sasha Levin --- fs/xfs/libxfs/xfs_rmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 38e9414878b3..9d3c67b654ca 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -1379,7 +1379,7 @@ xfs_rmap_convert_shared( * record for our insertion point. This will also give us the record for * start block contiguity tests. */ - error = xfs_rmap_lookup_le_range(cur, bno, owner, offset, flags, + error = xfs_rmap_lookup_le_range(cur, bno, owner, offset, oldext, &PREV, &i); if (error) goto done; From 3bd97b33be4151b1e37e8a229ca58bcf511e9c31 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Sun, 8 Nov 2020 16:32:43 -0800 Subject: [PATCH 096/151] xfs: set the unwritten bit in rmap lookup flags in xchk_bmap_get_rmapextents [ Upstream commit 5dda3897fd90783358c4c6115ef86047d8c8f503 ] When the bmbt scrubber is looking up rmap extents, we need to set the extent flags from the bmbt record fully. 
This will matter once we fix the rmap btree comparison functions to check those flags correctly. Fixes: d852657ccfc0 ("xfs: cross-reference reverse-mapping btree") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Sasha Levin --- fs/xfs/scrub/bmap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 392fb4df5c12..ec580c0d70fa 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -113,6 +113,8 @@ xchk_bmap_get_rmap( if (info->whichfork == XFS_ATTR_FORK) rflags |= XFS_RMAP_ATTR_FORK; + if (irec->br_state == XFS_EXT_UNWRITTEN) + rflags |= XFS_RMAP_UNWRITTEN; /* * CoW staging extents are owned (on disk) by the refcountbt, so From 7cbf708b1b9a8f70065838bd32393d3db0cd2a63 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Sun, 8 Nov 2020 16:32:44 -0800 Subject: [PATCH 097/151] xfs: fix rmap key and record comparison functions [ Upstream commit 6ff646b2ceb0eec916101877f38da0b73e3a5b7f ] Keys for extent interval records in the reverse mapping btree are supposed to be computed as follows: (physical block, owner, fork, is_btree, is_unwritten, offset) This provides users the ability to look up a reverse mapping from a bmbt record -- start with the physical block; then if there are multiple records for the same block, move on to the owner; then the inode fork type; and so on to the file offset. However, the key comparison functions incorrectly remove the fork/btree/unwritten information that's encoded in the on-disk offset. This means that lookup comparisons are only done with: (physical block, owner, offset) This means that queries can return incorrect results. On consistent filesystems this hasn't been an issue because blocks are never shared between forks or with bmbt blocks; and are never unwritten. However, this bug means that online repair cannot always detect corruption in the key information in internal rmapbt nodes. Found by fuzzing keys[1].attrfork = ones on xfs/371. Fixes: 4b8ed67794fe ("xfs: add rmap btree operations") Signed-off-by: Darrick J. 
Wong Reviewed-by: Christoph Hellwig Signed-off-by: Sasha Levin --- fs/xfs/libxfs/xfs_rmap_btree.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c index fc78efa52c94..3780609c7860 100644 --- a/fs/xfs/libxfs/xfs_rmap_btree.c +++ b/fs/xfs/libxfs/xfs_rmap_btree.c @@ -243,8 +243,8 @@ xfs_rmapbt_key_diff( else if (y > x) return -1; - x = XFS_RMAP_OFF(be64_to_cpu(kp->rm_offset)); - y = rec->rm_offset; + x = be64_to_cpu(kp->rm_offset); + y = xfs_rmap_irec_offset_pack(rec); if (x > y) return 1; else if (y > x) @@ -275,8 +275,8 @@ xfs_rmapbt_diff_two_keys( else if (y > x) return -1; - x = XFS_RMAP_OFF(be64_to_cpu(kp1->rm_offset)); - y = XFS_RMAP_OFF(be64_to_cpu(kp2->rm_offset)); + x = be64_to_cpu(kp1->rm_offset); + y = be64_to_cpu(kp2->rm_offset); if (x > y) return 1; else if (y > x) @@ -390,8 +390,8 @@ xfs_rmapbt_keys_inorder( return 1; else if (a > b) return 0; - a = XFS_RMAP_OFF(be64_to_cpu(k1->rmap.rm_offset)); - b = XFS_RMAP_OFF(be64_to_cpu(k2->rmap.rm_offset)); + a = be64_to_cpu(k1->rmap.rm_offset); + b = be64_to_cpu(k2->rmap.rm_offset); if (a <= b) return 1; return 0; @@ -420,8 +420,8 @@ xfs_rmapbt_recs_inorder( return 1; else if (a > b) return 0; - a = XFS_RMAP_OFF(be64_to_cpu(r1->rmap.rm_offset)); - b = XFS_RMAP_OFF(be64_to_cpu(r2->rmap.rm_offset)); + a = be64_to_cpu(r1->rmap.rm_offset); + b = be64_to_cpu(r2->rmap.rm_offset); if (a <= b) return 1; return 0; From b45f52a20879b92af2de42ed4c52d0fd72571412 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Sun, 8 Nov 2020 16:32:42 -0800 Subject: [PATCH 098/151] xfs: fix brainos in the refcount scrubber's rmap fragment processor [ Upstream commit 54e9b09e153842ab5adb8a460b891e11b39e9c3d ] Fix some serious WTF in the reference count scrubber's rmap fragment processing. The code comment says that this loop is supposed to move all fragment records starting at or before bno onto the worklist, but there's no obvious reason why nr (the number of items added) should increment starting from 1, and breaking the loop when we've added the target number seems dubious since we could have more rmap fragments that should have been added to the worklist. This seems to manifest in xfs/411 when adding one to the refcount field. Fixes: dbde19da9637 ("xfs: cross-reference the rmapbt data with the refcountbt") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Sasha Levin --- fs/xfs/scrub/refcount.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c index 0cab11a5d390..5c6b71b75ca1 100644 --- a/fs/xfs/scrub/refcount.c +++ b/fs/xfs/scrub/refcount.c @@ -170,7 +170,6 @@ xchk_refcountbt_process_rmap_fragments( */ INIT_LIST_HEAD(&worklist); rbno = NULLAGBLOCK; - nr = 1; /* Make sure the fragments actually /are/ in agbno order. */ bno = 0; @@ -184,15 +183,14 @@ xchk_refcountbt_process_rmap_fragments( * Find all the rmaps that start at or before the refc extent, * and put them on the worklist. 
*/ + nr = 0; list_for_each_entry_safe(frag, n, &refchk->fragments, list) { - if (frag->rm.rm_startblock > refchk->bno) - goto done; + if (frag->rm.rm_startblock > refchk->bno || nr > target_nr) + break; bno = frag->rm.rm_startblock + frag->rm.rm_blockcount; if (bno < rbno) rbno = bno; list_move_tail(&frag->list, &worklist); - if (nr == target_nr) - break; nr++; } From 0e2ad69bd4b585b30db73f8b8c87e971be1ee9b7 Mon Sep 17 00:00:00 2001 From: Sven Van Asbroeck Date: Mon, 9 Nov 2020 15:38:28 -0500 Subject: [PATCH 099/151] lan743x: fix "BUG: invalid wait context" when setting rx mode [ Upstream commit 2b52a4b65bc8f14520fe6e996ea7fb3f7e400761 ] In the net core, the struct net_device_ops -> ndo_set_rx_mode() callback is called with the dev->addr_list_lock spinlock held. However, this driver's ndo_set_rx_mode callback eventually calls lan743x_dp_write(), which acquires a mutex. Mutex acquisition may sleep, and this is not allowed when holding a spinlock. Fix by removing the dp_lock mutex entirely. Its purpose is to prevent concurrent accesses to the data port. No concurrent accesses are possible, because the dev->addr_list_lock spinlock in the core only lets through one thread at a time. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Sven Van Asbroeck Link: https://lore.kernel.org/r/20201109203828.5115-1-TheSven73@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/microchip/lan743x_main.c | 12 +++--------- drivers/net/ethernet/microchip/lan743x_main.h | 3 --- 2 files changed, 3 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c index a43140f7b5eb..b8e0e08b79de 100644 --- a/drivers/net/ethernet/microchip/lan743x_main.c +++ b/drivers/net/ethernet/microchip/lan743x_main.c @@ -672,14 +672,12 @@ clean_up: static int lan743x_dp_write(struct lan743x_adapter *adapter, u32 select, u32 addr, u32 length, u32 *buf) { - int ret = -EIO; u32 dp_sel; int i; - mutex_lock(&adapter->dp_lock); if (lan743x_csr_wait_for_bit(adapter, DP_SEL, DP_SEL_DPRDY_, 1, 40, 100, 100)) - goto unlock; + return -EIO; dp_sel = lan743x_csr_read(adapter, DP_SEL); dp_sel &= ~DP_SEL_MASK_; dp_sel |= select; @@ -691,13 +689,10 @@ static int lan743x_dp_write(struct lan743x_adapter *adapter, lan743x_csr_write(adapter, DP_CMD, DP_CMD_WRITE_); if (lan743x_csr_wait_for_bit(adapter, DP_SEL, DP_SEL_DPRDY_, 1, 40, 100, 100)) - goto unlock; + return -EIO; } - ret = 0; -unlock: - mutex_unlock(&adapter->dp_lock); - return ret; + return 0; } static u32 lan743x_mac_mii_access(u16 id, u16 index, int read) @@ -2674,7 +2669,6 @@ static int lan743x_hardware_init(struct lan743x_adapter *adapter, adapter->intr.irq = adapter->pdev->irq; lan743x_csr_write(adapter, INT_EN_CLR, 0xFFFFFFFF); - mutex_init(&adapter->dp_lock); ret = lan743x_gpio_init(adapter); if (ret) diff --git a/drivers/net/ethernet/microchip/lan743x_main.h b/drivers/net/ethernet/microchip/lan743x_main.h index 3b02eeae5f45..1fbcef391098 100644 --- a/drivers/net/ethernet/microchip/lan743x_main.h +++ b/drivers/net/ethernet/microchip/lan743x_main.h @@ -706,9 +706,6 @@ struct lan743x_adapter { struct lan743x_csr csr; struct lan743x_intr intr; - /* lock, used to prevent concurrent access to data port */ - struct mutex dp_lock; - struct lan743x_gpio gpio; struct lan743x_ptp ptp; From f10d238aad937f924337870006e4e0cd9f265a0a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 11 Nov 2020 08:07:37 -0800 Subject: [PATCH 
100/151] xfs: fix a missing unlock on error in xfs_fs_map_blocks [ Upstream commit 2bd3fa793aaa7e98b74e3653fdcc72fa753913b5 ] We also need to drop the iolock when invalidate_inode_pages2 fails, not only on all other error or successful cases. Fixes: 527851124d10 ("xfs: implement pNFS export operations") Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin --- fs/xfs/xfs_pnfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c index a339bd5fa260..f63fe8d924a3 100644 --- a/fs/xfs/xfs_pnfs.c +++ b/fs/xfs/xfs_pnfs.c @@ -134,7 +134,7 @@ xfs_fs_map_blocks( goto out_unlock; error = invalidate_inode_pages2(inode->i_mapping); if (WARN_ON_ONCE(error)) - return error; + goto out_unlock; end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + length); offset_fsb = XFS_B_TO_FSBT(mp, offset); From c0a6cc9e11f4b291fe5c01d41f65e1834730d55f Mon Sep 17 00:00:00 2001 From: Evan Nimmo Date: Tue, 10 Nov 2020 15:28:25 +1300 Subject: [PATCH 101/151] of/address: Fix of_node memory leak in of_dma_is_coherent [ Upstream commit a5bea04fcc0b3c0aec71ee1fd58fd4ff7ee36177 ] Commit dabf6b36b83a ("of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc") added a check to of_dma_is_coherent which returns early if OF_DMA_DEFAULT_COHERENT is enabled. This results in the of_node_put() being skipped causing a memory leak. Moved the of_node_get() below this check so we now we only get the node if OF_DMA_DEFAULT_COHERENT is not enabled. Fixes: dabf6b36b83a ("of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc") Signed-off-by: Evan Nimmo Link: https://lore.kernel.org/r/20201110022825.30895-1-evan.nimmo@alliedtelesis.co.nz Signed-off-by: Rob Herring Signed-off-by: Sasha Levin --- drivers/of/address.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/of/address.c b/drivers/of/address.c index 8f74c4626e0e..5abb056b2b51 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -1003,11 +1003,13 @@ EXPORT_SYMBOL_GPL(of_dma_get_range); */ bool of_dma_is_coherent(struct device_node *np) { - struct device_node *node = of_node_get(np); + struct device_node *node; if (IS_ENABLED(CONFIG_OF_DMA_DEFAULT_COHERENT)) return true; + node = of_node_get(np); + while (node) { if (of_property_read_bool(node, "dma-coherent")) { of_node_put(node); From 2ab9c76986e405a0e489baaf33886b206eecbf4d Mon Sep 17 00:00:00 2001 From: Wang Hai Date: Tue, 10 Nov 2020 22:46:14 +0800 Subject: [PATCH 102/151] cosa: Add missing kfree in error path of cosa_write [ Upstream commit 52755b66ddcef2e897778fac5656df18817b59ab ] If memory allocation for 'kbuf' succeed, cosa_write() doesn't have a corresponding kfree() in exception handling. Thus add kfree() for this function implementation. 
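As a general illustration (not the cosa code itself): in the common kmalloc()/copy_from_user() write path, every early return taken after the allocation succeeds must free the buffer, which is the kind of free this patch restores. Function and variable names below are placeholders.

```
static ssize_t foo_write(struct file *file, const char __user *buf,
			 size_t count, loff_t *ppos)
{
	char *kbuf = kmalloc(count, GFP_KERNEL);

	if (!kbuf)
		return -ENOMEM;

	if (copy_from_user(kbuf, buf, count)) {
		kfree(kbuf);		/* free on the error path, too */
		return -EFAULT;
	}

	/* ... queue kbuf for transmission, freeing it on completion ... */
	return count;
}
```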
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot Signed-off-by: Wang Hai Acked-by: Jan "Yenya" Kasprzak Link: https://lore.kernel.org/r/20201110144614.43194-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/wan/cosa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/wan/cosa.c b/drivers/net/wan/cosa.c index af539151d663..61428076f32e 100644 --- a/drivers/net/wan/cosa.c +++ b/drivers/net/wan/cosa.c @@ -889,6 +889,7 @@ static ssize_t cosa_write(struct file *file, chan->tx_status = 1; spin_unlock_irqrestore(&cosa->lock, flags); up(&chan->wsem); + kfree(kbuf); return -ERESTARTSYS; } } From 70867a9dbf57fe0d10757e0249370df664738174 Mon Sep 17 00:00:00 2001 From: Martin Willi Date: Fri, 6 Nov 2020 08:30:30 +0100 Subject: [PATCH 103/151] vrf: Fix fast path output packet handling with async Netfilter rules [ Upstream commit 9e2b7fa2df4365e99934901da4fb4af52d81e820 ] VRF devices use an optimized direct path on output if a default qdisc is involved, calling Netfilter hooks directly. This path, however, does not consider Netfilter rules completing asynchronously, such as with NFQUEUE. The Netfilter okfn() is called for asynchronously accepted packets, but the VRF never passes that packet down the stack to send it out over the slave device. Using the slower redirect path for this seems not feasible, as we do not know beforehand if a Netfilter hook has asynchronously completing rules. Fix the use of asynchronously completing Netfilter rules in OUTPUT and POSTROUTING by using a special completion function that additionally calls dst_output() to pass the packet down the stack. Also, slightly adjust the use of nf_reset_ct() so that is called in the asynchronous case, too. Fixes: dcdd43c41e60 ("net: vrf: performance improvements for IPv4") Fixes: a9ec54d1b0cd ("net: vrf: performance improvements for IPv6") Signed-off-by: Martin Willi Link: https://lore.kernel.org/r/20201106073030.3974927-1-martin@strongswan.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/vrf.c | 92 +++++++++++++++++++++++++++++++++++------------ 1 file changed, 69 insertions(+), 23 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 6716deeb35e3..0c7d746c0330 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -332,8 +332,7 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev) return ret; } -static int vrf_finish_direct(struct net *net, struct sock *sk, - struct sk_buff *skb) +static void vrf_finish_direct(struct sk_buff *skb) { struct net_device *vrf_dev = skb->dev; @@ -352,7 +351,8 @@ static int vrf_finish_direct(struct net *net, struct sock *sk, skb_pull(skb, ETH_HLEN); } - return 1; + /* reset skb device */ + nf_reset_ct(skb); } #if IS_ENABLED(CONFIG_IPV6) @@ -431,15 +431,41 @@ static struct sk_buff *vrf_ip6_out_redirect(struct net_device *vrf_dev, return skb; } +static int vrf_output6_direct_finish(struct net *net, struct sock *sk, + struct sk_buff *skb) +{ + vrf_finish_direct(skb); + + return vrf_ip6_local_out(net, sk, skb); +} + static int vrf_output6_direct(struct net *net, struct sock *sk, struct sk_buff *skb) { + int err = 1; + skb->protocol = htons(ETH_P_IPV6); - return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, - net, sk, skb, NULL, skb->dev, - vrf_finish_direct, - !(IPCB(skb)->flags & IPSKB_REROUTED)); + if (!(IPCB(skb)->flags & IPSKB_REROUTED)) + err = nf_hook(NFPROTO_IPV6, NF_INET_POST_ROUTING, net, sk, skb, + NULL, skb->dev, vrf_output6_direct_finish); + + if (likely(err 
== 1)) + vrf_finish_direct(skb); + + return err; +} + +static int vrf_ip6_out_direct_finish(struct net *net, struct sock *sk, + struct sk_buff *skb) +{ + int err; + + err = vrf_output6_direct(net, sk, skb); + if (likely(err == 1)) + err = vrf_ip6_local_out(net, sk, skb); + + return err; } static struct sk_buff *vrf_ip6_out_direct(struct net_device *vrf_dev, @@ -452,18 +478,15 @@ static struct sk_buff *vrf_ip6_out_direct(struct net_device *vrf_dev, skb->dev = vrf_dev; err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk, - skb, NULL, vrf_dev, vrf_output6_direct); + skb, NULL, vrf_dev, vrf_ip6_out_direct_finish); if (likely(err == 1)) err = vrf_output6_direct(net, sk, skb); - /* reset skb device */ if (likely(err == 1)) - nf_reset_ct(skb); - else - skb = NULL; + return skb; - return skb; + return NULL; } static struct sk_buff *vrf_ip6_out(struct net_device *vrf_dev, @@ -643,15 +666,41 @@ static struct sk_buff *vrf_ip_out_redirect(struct net_device *vrf_dev, return skb; } +static int vrf_output_direct_finish(struct net *net, struct sock *sk, + struct sk_buff *skb) +{ + vrf_finish_direct(skb); + + return vrf_ip_local_out(net, sk, skb); +} + static int vrf_output_direct(struct net *net, struct sock *sk, struct sk_buff *skb) { + int err = 1; + skb->protocol = htons(ETH_P_IP); - return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, - net, sk, skb, NULL, skb->dev, - vrf_finish_direct, - !(IPCB(skb)->flags & IPSKB_REROUTED)); + if (!(IPCB(skb)->flags & IPSKB_REROUTED)) + err = nf_hook(NFPROTO_IPV4, NF_INET_POST_ROUTING, net, sk, skb, + NULL, skb->dev, vrf_output_direct_finish); + + if (likely(err == 1)) + vrf_finish_direct(skb); + + return err; +} + +static int vrf_ip_out_direct_finish(struct net *net, struct sock *sk, + struct sk_buff *skb) +{ + int err; + + err = vrf_output_direct(net, sk, skb); + if (likely(err == 1)) + err = vrf_ip_local_out(net, sk, skb); + + return err; } static struct sk_buff *vrf_ip_out_direct(struct net_device *vrf_dev, @@ -664,18 +713,15 @@ static struct sk_buff *vrf_ip_out_direct(struct net_device *vrf_dev, skb->dev = vrf_dev; err = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk, - skb, NULL, vrf_dev, vrf_output_direct); + skb, NULL, vrf_dev, vrf_ip_out_direct_finish); if (likely(err == 1)) err = vrf_output_direct(net, sk, skb); - /* reset skb device */ if (likely(err == 1)) - nf_reset_ct(skb); - else - skb = NULL; + return skb; - return skb; + return NULL; } static struct sk_buff *vrf_ip_out(struct net_device *vrf_dev, From 09b0d47b7952b9fb60f9285c3f9c0db87cee50c8 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 30 Oct 2020 12:49:45 +0100 Subject: [PATCH 104/151] perf: Fix get_recursion_context() [ Upstream commit ce0f17fc93f63ee91428af10b7b2ddef38cd19e5 ] One should use in_serving_softirq() to detect SoftIRQ context. 
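Background note (not part of the patch): the two helpers differ in bottom-half-disabled sections. in_softirq() is also true while softirqs are merely disabled from process context, whereas in_serving_softirq() is true only while a softirq handler is actually executing, which is the condition the recursion-context code cares about. A short sketch of the distinction:

```
/* Illustrative only: process context with bottom halves disabled. */
static void bh_context_example(void)
{
	local_bh_disable();
	WARN_ON(!in_softirq());		/* true: softirq count is raised ... */
	WARN_ON(in_serving_softirq());	/* ... but no softirq is being served */
	local_bh_enable();
}
```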
Fixes: 96f6d4444302 ("perf_counter: avoid recursion") Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20201030151955.120572175@infradead.org Signed-off-by: Sasha Levin --- kernel/events/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/events/internal.h b/kernel/events/internal.h index 3aef4191798c..6e87b358e082 100644 --- a/kernel/events/internal.h +++ b/kernel/events/internal.h @@ -210,7 +210,7 @@ static inline int get_recursion_context(int *recursion) rctx = 3; else if (in_irq()) rctx = 2; - else if (in_softirq()) + else if (in_serving_softirq()) rctx = 1; else rctx = 0; From 52e3a55bc25384ea50e2a6a20bbca19ed72d9ec2 Mon Sep 17 00:00:00 2001 From: Gao Xiang Date: Sun, 1 Nov 2020 03:51:02 +0800 Subject: [PATCH 105/151] erofs: derive atime instead of leaving it empty commit d3938ee23e97bfcac2e0eb6b356875da73d700df upstream. EROFS has _only one_ ondisk timestamp (ctime is currently documented and recorded, we might also record mtime instead with a new compat feature if needed) for each extended inode since EROFS isn't mainly for archival purposes so no need to keep all timestamps on disk especially for Android scenarios due to security concerns. Also, romfs/cramfs don't have their own on-disk timestamp, and squashfs only records mtime instead. Let's also derive access time from ondisk timestamp rather than leaving it empty, and if mtime/atime for each file are really needed for specific scenarios as well, we can also use xattrs to record them then. Link: https://lore.kernel.org/r/20201031195102.21221-1-hsiangkao@aol.com [ Gao Xiang: It'd be better to backport for user-friendly concern. ] Fixes: 431339ba9042 ("staging: erofs: add inode operations") Cc: stable # 4.19+ Reported-by: nl6720 Reviewed-by: Chao Yu Signed-off-by: Gao Xiang Signed-off-by: Greg Kroah-Hartman --- fs/erofs/inode.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index b36b414cd7a7..70fd3af7b8cb 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -107,11 +107,9 @@ static struct page *erofs_read_inode(struct inode *inode, i_gid_write(inode, le32_to_cpu(die->i_gid)); set_nlink(inode, le32_to_cpu(die->i_nlink)); - /* ns timestamp */ - inode->i_mtime.tv_sec = inode->i_ctime.tv_sec = - le64_to_cpu(die->i_ctime); - inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = - le32_to_cpu(die->i_ctime_nsec); + /* extended inode has its own timestamp */ + inode->i_ctime.tv_sec = le64_to_cpu(die->i_ctime); + inode->i_ctime.tv_nsec = le32_to_cpu(die->i_ctime_nsec); inode->i_size = le64_to_cpu(die->i_size); @@ -149,11 +147,9 @@ static struct page *erofs_read_inode(struct inode *inode, i_gid_write(inode, le16_to_cpu(dic->i_gid)); set_nlink(inode, le16_to_cpu(dic->i_nlink)); - /* use build time to derive all file time */ - inode->i_mtime.tv_sec = inode->i_ctime.tv_sec = - sbi->build_time; - inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = - sbi->build_time_nsec; + /* use build time for compact inodes */ + inode->i_ctime.tv_sec = sbi->build_time; + inode->i_ctime.tv_nsec = sbi->build_time_nsec; inode->i_size = le32_to_cpu(dic->i_size); if (erofs_inode_is_data_compressed(vi->datalayout)) @@ -167,6 +163,11 @@ static struct page *erofs_read_inode(struct inode *inode, goto err_out; } + inode->i_mtime.tv_sec = inode->i_ctime.tv_sec; + inode->i_atime.tv_sec = inode->i_ctime.tv_sec; + inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec; + inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec; + if (!nblks) /* measure 
inode.i_blocks as generic filesystems */ inode->i_blocks = roundup(inode->i_size, EROFS_BLKSIZ) >> 9; From a6ca4c7ec44c675cf368d8cb32542c0dc4d491cd Mon Sep 17 00:00:00 2001 From: Kaixu Xia Date: Thu, 29 Oct 2020 23:46:36 +0800 Subject: [PATCH 106/151] ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA commit 174fe5ba2d1ea0d6c5ab2a7d4aa058d6d497ae4d upstream. The macro MOPT_Q is used to indicates the mount option is related to quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is disabled. Normally the quota options are handled explicitly, so it didn't matter that the MOPT_STRING flag was missing, even though the usrjquota and grpjquota mount options take a string argument. It's important that's present in the !CONFIG_QUOTA case, since without MOPT_STRING, the mount option matcher will match usrjquota= followed by an integer, and will otherwise skip the table entry, and so "mount option not supported" error message is never reported. [ Fixed up the commit description to better explain why the fix works. --TYT ] Fixes: 26092bf52478 ("ext4: use a table-driven handler for mount options") Signed-off-by: Kaixu Xia Link: https://lore.kernel.org/r/1603986396-28917-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 6a260cc8bce6..920658ca8777 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1756,8 +1756,8 @@ static const struct mount_opts { {Opt_noquota, (EXT4_MOUNT_QUOTA | EXT4_MOUNT_USRQUOTA | EXT4_MOUNT_GRPQUOTA | EXT4_MOUNT_PRJQUOTA), MOPT_CLEAR | MOPT_Q}, - {Opt_usrjquota, 0, MOPT_Q}, - {Opt_grpjquota, 0, MOPT_Q}, + {Opt_usrjquota, 0, MOPT_Q | MOPT_STRING}, + {Opt_grpjquota, 0, MOPT_Q | MOPT_STRING}, {Opt_offusrjquota, 0, MOPT_Q}, {Opt_offgrpjquota, 0, MOPT_Q}, {Opt_jqfmt_vfsold, QFMT_VFS_OLD, MOPT_QFMT}, From 062c9b04f6ebc27ffb84684792ae3b029b8d0b9a Mon Sep 17 00:00:00 2001 From: Joseph Qi Date: Tue, 3 Nov 2020 10:29:02 +0800 Subject: [PATCH 107/151] ext4: unlock xattr_sem properly in ext4_inline_data_truncate() commit 7067b2619017d51e71686ca9756b454de0e5826a upstream. It takes xattr_sem to check inline data again but without unlock it in case not have. So unlock it before return. Fixes: aef1c8513c1f ("ext4: let ext4_truncate handle inline data correctly") Reported-by: Dan Carpenter Cc: Tao Ma Signed-off-by: Joseph Qi Reviewed-by: Andreas Dilger Link: https://lore.kernel.org/r/1604370542-124630-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inline.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 2fec62d764fa..519378a15bc6 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -1918,6 +1918,7 @@ int ext4_inline_data_truncate(struct inode *inode, int *has_inline) ext4_write_lock_xattr(inode, &no_expand); if (!ext4_has_inline_data(inode)) { + ext4_write_unlock_xattr(inode, &no_expand); *has_inline = 0; ext4_journal_stop(handle); return 0; From 8266c23124c14e1f6f6fceb320a4c07edbda7132 Mon Sep 17 00:00:00 2001 From: Dinghao Liu Date: Wed, 21 Oct 2020 13:36:55 +0800 Subject: [PATCH 108/151] btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod commit 468600c6ec28613b756193c5f780aac062f1acdf upstream. There is one error handling path that does not free ref, which may cause a minor memory leak. 
CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik Signed-off-by: Dinghao Liu Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/ref-verify.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c index 9a2f15f4c80e..bbd63535965c 100644 --- a/fs/btrfs/ref-verify.c +++ b/fs/btrfs/ref-verify.c @@ -851,6 +851,7 @@ int btrfs_ref_tree_mod(struct btrfs_fs_info *fs_info, "dropping a ref for a root that doesn't have a ref on the block"); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); + kfree(ref); kfree(ra); goto out_unlock; } From 5af9630036efd8fdfa62e0e47e4a1d7040843044 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Mon, 26 Oct 2020 16:57:27 -0400 Subject: [PATCH 109/151] btrfs: fix min reserved size calculation in merge_reloc_root commit fca3a45d08782a2bb85e048fb8e3128b1388d7b7 upstream. The minimum reserve size was adjusted to take into account the height of the tree we are merging, however we can have a root with a level == 0. What we want is root_level + 1 to get the number of nodes we may have to cow. This fixes the enospc_debug warning pops with btrfs/101. Nikolay: this fixes failures on btrfs/060 btrfs/062 btrfs/063 and btrfs/195 That I was seeing, the call trace was: [ 3680.515564] ------------[ cut here ]------------ [ 3680.515566] BTRFS: block rsv returned -28 [ 3680.515585] WARNING: CPU: 2 PID: 8339 at fs/btrfs/block-rsv.c:521 btrfs_use_block_rsv+0x162/0x180 [ 3680.515587] Modules linked in: [ 3680.515591] CPU: 2 PID: 8339 Comm: btrfs Tainted: G W 5.9.0-rc8-default #95 [ 3680.515593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 3680.515595] RIP: 0010:btrfs_use_block_rsv+0x162/0x180 [ 3680.515600] RSP: 0018:ffffa01ac9753910 EFLAGS: 00010282 [ 3680.515602] RAX: 0000000000000000 RBX: ffff984b34200000 RCX: 0000000000000027 [ 3680.515604] RDX: 0000000000000027 RSI: 0000000000000000 RDI: ffff984b3bd19e28 [ 3680.515606] RBP: 0000000000004000 R08: ffff984b3bd19e20 R09: 0000000000000001 [ 3680.515608] R10: 0000000000000004 R11: 0000000000000046 R12: ffff984b264fdc00 [ 3680.515609] R13: ffff984b13149000 R14: 00000000ffffffe4 R15: ffff984b34200000 [ 3680.515613] FS: 00007f4e2912b8c0(0000) GS:ffff984b3bd00000(0000) knlGS:0000000000000000 [ 3680.515615] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3680.515617] CR2: 00007fab87122150 CR3: 0000000118e42000 CR4: 00000000000006e0 [ 3680.515620] Call Trace: [ 3680.515627] btrfs_alloc_tree_block+0x8b/0x340 [ 3680.515633] ? __lock_acquire+0x51a/0xac0 [ 3680.515646] alloc_tree_block_no_bg_flush+0x4f/0x60 [ 3680.515651] __btrfs_cow_block+0x14e/0x7e0 [ 3680.515662] btrfs_cow_block+0x144/0x2c0 [ 3680.515670] merge_reloc_root+0x4d4/0x610 [ 3680.515675] ? btrfs_lookup_fs_root+0x78/0x90 [ 3680.515686] merge_reloc_roots+0xee/0x280 [ 3680.515695] relocate_block_group+0x2ce/0x5e0 [ 3680.515704] btrfs_relocate_block_group+0x16e/0x310 [ 3680.515711] btrfs_relocate_chunk+0x38/0xf0 [ 3680.515716] btrfs_shrink_device+0x200/0x560 [ 3680.515728] btrfs_rm_device+0x1ae/0x6a6 [ 3680.515744] ? _copy_from_user+0x6e/0xb0 [ 3680.515750] btrfs_ioctl+0x1afe/0x28c0 [ 3680.515755] ? find_held_lock+0x2b/0x80 [ 3680.515760] ? do_user_addr_fault+0x1f8/0x418 [ 3680.515773] ? 
__x64_sys_ioctl+0x77/0xb0 [ 3680.515775] __x64_sys_ioctl+0x77/0xb0 [ 3680.515781] do_syscall_64+0x31/0x70 [ 3680.515785] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: Nikolay Borisov Fixes: 44d354abf33e ("btrfs: relocation: review the call sites which can be interrupted by signal") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Nikolay Borisov Tested-by: Nikolay Borisov Signed-off-by: Josef Bacik Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/relocation.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 1bc57f7b91cf..001f13cf9ab8 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2287,6 +2287,7 @@ static noinline_for_stack int merge_reloc_root(struct reloc_control *rc, struct btrfs_root_item *root_item; struct btrfs_path *path; struct extent_buffer *leaf; + int reserve_level; int level; int max_level; int replaced = 0; @@ -2335,7 +2336,8 @@ static noinline_for_stack int merge_reloc_root(struct reloc_control *rc, * Thus the needed metadata size is at most root_level * nodesize, * and * 2 since we have two trees to COW. */ - min_reserved = fs_info->nodesize * btrfs_root_level(root_item) * 2; + reserve_level = max_t(int, 1, btrfs_root_level(root_item)); + min_reserved = fs_info->nodesize * reserve_level * 2; memset(&next_key, 0, sizeof(next_key)); while (1) { From 2033dd88529792cac57d1d64182fddcefcae1e0f Mon Sep 17 00:00:00 2001 From: Anand Jain Date: Fri, 30 Oct 2020 06:53:56 +0800 Subject: [PATCH 110/151] btrfs: dev-replace: fail mount if we don't have replace item with target device commit cf89af146b7e62af55470cf5f3ec3c56ec144a5e upstream. If there is a device BTRFS_DEV_REPLACE_DEVID without the device replace item, then it means the filesystem is in an inconsistent state. This is either corruption or a crafted image. Fail the mount as this needs a closer look at what is actually wrong. As of now if BTRFS_DEV_REPLACE_DEVID is present without the replace item, in __btrfs_free_extra_devids() we determine that there is an extra device, and free those extra devices but continue to mount the device. However, we were wrong in keeping track of the rw_devices so the syzbot testcase failed: WARNING: CPU: 1 PID: 3612 at fs/btrfs/volumes.c:1166 close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166 Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 3612 Comm: syz-executor.2 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 panic+0x347/0x7c0 kernel/panic.c:231 __warn.cold+0x20/0x46 kernel/panic.c:600 report_bug+0x1bd/0x210 lib/bug.c:198 handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234 exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536 RIP: 0010:close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166 RSP: 0018:ffffc900091777e0 EFLAGS: 00010246 RAX: 0000000000040000 RBX: ffffffffffffffff RCX: ffffc9000c8b7000 RDX: 0000000000040000 RSI: ffffffff83097f47 RDI: 0000000000000007 RBP: dffffc0000000000 R08: 0000000000000001 R09: ffff8880988a187f R10: 0000000000000000 R11: 0000000000000001 R12: ffff88809593a130 R13: ffff88809593a1ec R14: ffff8880988a1908 R15: ffff88809593a050 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 open_ctree+0x4984/0x4a2d fs/btrfs/disk-io.c:3434 btrfs_fill_super fs/btrfs/super.c:1316 [inline] btrfs_mount_root.cold+0x14/0x165 fs/btrfs/super.c:1672 The fix here is, when we determine that there isn't a replace item then fail the mount if there is a replace target device (devid 0). CC: stable@vger.kernel.org # 4.19+ Reported-by: syzbot+4cfe71a4da060be47502@syzkaller.appspotmail.com Signed-off-by: Anand Jain Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/dev-replace.c | 26 ++++++++++++++++++++++++-- fs/btrfs/volumes.c | 26 +++++++------------------- 2 files changed, 31 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 96843934dcbb..1cb7f5d79765 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -55,6 +55,17 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info) ret = btrfs_search_slot(NULL, dev_root, &key, path, 0, 0); if (ret) { no_valid_dev_replace_entry_found: + /* + * We don't have a replace item or it's corrupted. If there is + * a replace target, fail the mount. + */ + if (btrfs_find_device(fs_info->fs_devices, + BTRFS_DEV_REPLACE_DEVID, NULL, NULL, false)) { + btrfs_err(fs_info, + "found replace target device without a valid replace item"); + ret = -EUCLEAN; + goto out; + } ret = 0; dev_replace->replace_state = BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED; @@ -107,8 +118,19 @@ no_valid_dev_replace_entry_found: case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED: case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED: case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED: - dev_replace->srcdev = NULL; - dev_replace->tgtdev = NULL; + /* + * We don't have an active replace item but if there is a + * replace target, fail the mount. 
+ */ + if (btrfs_find_device(fs_info->fs_devices, + BTRFS_DEV_REPLACE_DEVID, NULL, NULL, false)) { + btrfs_err(fs_info, + "replace devid present without an active replace item"); + ret = -EUCLEAN; + } else { + dev_replace->srcdev = NULL; + dev_replace->tgtdev = NULL; + } break; case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED: case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED: diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 00e7816dc9a1..808c5985904e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1245,22 +1245,13 @@ again: continue; } - if (device->devid == BTRFS_DEV_REPLACE_DEVID) { - /* - * In the first step, keep the device which has - * the correct fsid and the devid that is used - * for the dev_replace procedure. - * In the second step, the dev_replace state is - * read from the device tree and it is known - * whether the procedure is really active or - * not, which means whether this device is - * used or whether it should be removed. - */ - if (step == 0 || test_bit(BTRFS_DEV_STATE_REPLACE_TGT, - &device->dev_state)) { - continue; - } - } + /* + * We have already validated the presence of BTRFS_DEV_REPLACE_DEVID, + * in btrfs_init_dev_replace() so just continue. + */ + if (device->devid == BTRFS_DEV_REPLACE_DEVID) + continue; + if (device->bdev) { blkdev_put(device->bdev, device->mode); device->bdev = NULL; @@ -1269,9 +1260,6 @@ again: if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) { list_del_init(&device->dev_alloc_list); clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); - if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT, - &device->dev_state)) - fs_devices->rw_devices--; } list_del_init(&device->dev_list); fs_devices->num_devices--; From 11c14da8d005ad7d5a52fc1470b2a03febe1e4fd Mon Sep 17 00:00:00 2001 From: Andrew Jones Date: Thu, 5 Nov 2020 10:10:19 +0100 Subject: [PATCH 111/151] KVM: arm64: Don't hide ID registers from userspace MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit f81cb2c3ad41ac6d8cb2650e3d72d5f67db1aa28 upstream. ID registers are RAZ until they've been allocated a purpose, but that doesn't mean they should be removed from the KVM_GET_REG_LIST list. So far we only have one register, SYS_ID_AA64ZFR0_EL1, that is hidden from userspace when its function, SVE, is not present. Expose SYS_ID_AA64ZFR0_EL1 to userspace as RAZ when SVE is not implemented. Removing the userspace visibility checks is enough to reexpose it, as it will already return zero to userspace when SVE is not present. The register already behaves as RAZ for the guest when SVE is not present. 
Fixes: 73433762fcae ("KVM: arm64/sve: System register context switch and access support") Reported-by: 张东旭 Signed-off-by: Andrew Jones Signed-off-by: Marc Zyngier Cc: stable@vger.kernel.org#v5.2+ Link: https://lore.kernel.org/r/20201105091022.15373-2-drjones@redhat.com Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kvm/sys_regs.c | 18 +----------------- 1 file changed, 1 insertion(+), 17 deletions(-) diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 0ed7598dfa6a..f1f4f42e8ef4 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1132,16 +1132,6 @@ static unsigned int sve_visibility(const struct kvm_vcpu *vcpu, return REG_HIDDEN_USER | REG_HIDDEN_GUEST; } -/* Visibility overrides for SVE-specific ID registers */ -static unsigned int sve_id_visibility(const struct kvm_vcpu *vcpu, - const struct sys_reg_desc *rd) -{ - if (vcpu_has_sve(vcpu)) - return 0; - - return REG_HIDDEN_USER; -} - /* Generate the emulated ID_AA64ZFR0_EL1 value exposed to the guest */ static u64 guest_id_aa64zfr0_el1(const struct kvm_vcpu *vcpu) { @@ -1168,9 +1158,6 @@ static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, { u64 val; - if (WARN_ON(!vcpu_has_sve(vcpu))) - return -ENOENT; - val = guest_id_aa64zfr0_el1(vcpu); return reg_to_user(uaddr, &val, reg->id); } @@ -1183,9 +1170,6 @@ static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, int err; u64 val; - if (WARN_ON(!vcpu_has_sve(vcpu))) - return -ENOENT; - err = reg_from_user(&val, uaddr, id); if (err) return err; @@ -1448,7 +1432,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { ID_SANITISED(ID_AA64PFR1_EL1), ID_UNALLOCATED(4,2), ID_UNALLOCATED(4,3), - { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, .visibility = sve_id_visibility }, + { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, }, ID_UNALLOCATED(4,5), ID_UNALLOCATED(4,6), ID_UNALLOCATED(4,7), From 06c1895fe71b14354deb79366cb0dd6fa18a88a5 Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Wed, 7 Oct 2020 17:06:17 +0300 Subject: [PATCH 112/151] thunderbolt: Fix memory leak if ida_simple_get() fails in enumerate_services() commit a663e0df4a374b8537562a44d1cecafb472cd65b upstream. The svc->key field is not released as it should be if ida_simple_get() fails so fix that. Fixes: 9aabb68568b4 ("thunderbolt: Fix to check return value of ida_simple_get") Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg Signed-off-by: Greg Kroah-Hartman --- drivers/thunderbolt/xdomain.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c index 4e17a7c7bf0a..9e9bf8771345 100644 --- a/drivers/thunderbolt/xdomain.c +++ b/drivers/thunderbolt/xdomain.c @@ -830,6 +830,7 @@ static void enumerate_services(struct tb_xdomain *xd) id = ida_simple_get(&xd->service_ids, 0, 0, GFP_KERNEL); if (id < 0) { + kfree(svc->key); kfree(svc); break; } From 1654bf2d9f0e898c43848d2e7e8c9fad47a4614e Mon Sep 17 00:00:00 2001 From: Jing Xiangfeng Date: Thu, 15 Oct 2020 16:40:53 +0800 Subject: [PATCH 113/151] thunderbolt: Add the missed ida_simple_remove() in ring_request_msix() commit 7342ca34d931a357d408aaa25fadd031e46af137 upstream. ring_request_msix() misses to call ida_simple_remove() in an error path. Add a label 'err_ida_remove' and jump to it. 
Fixes: 046bee1f9ab8 ("thunderbolt: Add MSI-X support") Cc: stable@vger.kernel.org Signed-off-by: Jing Xiangfeng Reviewed-by: Andy Shevchenko Signed-off-by: Mika Westerberg Signed-off-by: Greg Kroah-Hartman --- drivers/thunderbolt/nhi.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c index 641b21b54460..73a698eec743 100644 --- a/drivers/thunderbolt/nhi.c +++ b/drivers/thunderbolt/nhi.c @@ -410,12 +410,23 @@ static int ring_request_msix(struct tb_ring *ring, bool no_suspend) ring->vector = ret; - ring->irq = pci_irq_vector(ring->nhi->pdev, ring->vector); - if (ring->irq < 0) - return ring->irq; + ret = pci_irq_vector(ring->nhi->pdev, ring->vector); + if (ret < 0) + goto err_ida_remove; + + ring->irq = ret; irqflags = no_suspend ? IRQF_NO_SUSPEND : 0; - return request_irq(ring->irq, ring_msix, irqflags, "thunderbolt", ring); + ret = request_irq(ring->irq, ring_msix, irqflags, "thunderbolt", ring); + if (ret) + goto err_ida_remove; + + return 0; + +err_ida_remove: + ida_simple_remove(&nhi->msix_ida, ring->vector); + + return ret; } static void ring_release_msix(struct tb_ring *ring) From f988e9c85cfb80a1887c6d3bdf2d9057dd28c685 Mon Sep 17 00:00:00 2001 From: Shin'ichiro Kawasaki Date: Mon, 2 Nov 2020 21:28:19 +0900 Subject: [PATCH 114/151] uio: Fix use-after-free in uio_unregister_device() commit 092561f06702dd4fdd7fb74dd3a838f1818529b7 upstream. Commit 8fd0e2a6df26 ("uio: free uio id after uio file node is freed") triggered KASAN use-after-free failure at deletion of TCM-user backstores [1]. In uio_unregister_device(), struct uio_device *idev is passed to uio_free_minor() to refer idev->minor. However, before uio_free_minor() call, idev is already freed by uio_device_release() during call to device_unregister(). To avoid reference to idev->minor after idev free, keep idev->minor value in a local variable. Also modify uio_free_minor() argument to receive the value. [1] BUG: KASAN: use-after-free in uio_unregister_device+0x166/0x190 Read of size 4 at addr ffff888105196508 by task targetcli/49158 CPU: 3 PID: 49158 Comm: targetcli Not tainted 5.10.0-rc1 #1 Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015 Call Trace: dump_stack+0xae/0xe5 ? uio_unregister_device+0x166/0x190 print_address_description.constprop.0+0x1c/0x210 ? uio_unregister_device+0x166/0x190 ? uio_unregister_device+0x166/0x190 kasan_report.cold+0x37/0x7c ? kobject_put+0x80/0x410 ? uio_unregister_device+0x166/0x190 uio_unregister_device+0x166/0x190 tcmu_destroy_device+0x1c4/0x280 [target_core_user] ? tcmu_release+0x90/0x90 [target_core_user] ? __mutex_unlock_slowpath+0xd6/0x5d0 target_free_device+0xf3/0x2e0 [target_core_mod] config_item_cleanup+0xea/0x210 configfs_rmdir+0x651/0x860 ? detach_groups.isra.0+0x380/0x380 vfs_rmdir.part.0+0xec/0x3a0 ? __lookup_hash+0x20/0x150 do_rmdir+0x252/0x320 ? do_file_open_root+0x420/0x420 ? strncpy_from_user+0xbc/0x2f0 ? 
getname_flags.part.0+0x8e/0x450 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f9e2bfc91fb Code: 73 01 c3 48 8b 0d 9d ec 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 54 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d ec 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffdd2baafe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 RAX: ffffffffffffffda RBX: 00007f9e2beb44a0 RCX: 00007f9e2bfc91fb RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f9e1c20be90 RBP: 00007ffdd2bab000 R08: 0000000000000000 R09: 00007f9e2bdf2440 R10: 00007ffdd2baaf37 R11: 0000000000000246 R12: 00000000ffffff9c R13: 000055f9abb7e390 R14: 000055f9abcf9558 R15: 00007f9e2be7a780 Allocated by task 34735: kasan_save_stack+0x1b/0x40 __kasan_kmalloc.constprop.0+0xc2/0xd0 __uio_register_device+0xeb/0xd40 tcmu_configure_device+0x5a0/0xbc0 [target_core_user] target_configure_device+0x12f/0x760 [target_core_mod] target_dev_enable_store+0x32/0x50 [target_core_mod] configfs_write_file+0x2bb/0x450 vfs_write+0x1ce/0x610 ksys_write+0xe9/0x1b0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 49158: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x1b/0x30 __kasan_slab_free+0x110/0x150 slab_free_freelist_hook+0x5a/0x170 kfree+0xc6/0x560 device_release+0x9b/0x210 kobject_put+0x13e/0x410 uio_unregister_device+0xf9/0x190 tcmu_destroy_device+0x1c4/0x280 [target_core_user] target_free_device+0xf3/0x2e0 [target_core_mod] config_item_cleanup+0xea/0x210 configfs_rmdir+0x651/0x860 vfs_rmdir.part.0+0xec/0x3a0 do_rmdir+0x252/0x320 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff888105196000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 1288 bytes inside of 2048-byte region [ffff888105196000, ffff888105196800) The buggy address belongs to the page: page:0000000098e6ca81 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x105190 head:0000000098e6ca81 order:3 compound_mapcount:0 compound_pincount:0 flags: 0x17ffffc0010200(slab|head) raw: 0017ffffc0010200 dead000000000100 dead000000000122 ffff888100043040 raw: 0000000000000000 0000000000080008 00000001ffffffff ffff88810eb55c01 page dumped because: kasan: bad access detected page->mem_cgroup:ffff88810eb55c01 Memory state around the buggy address: ffff888105196400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888105196480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888105196500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888105196580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888105196600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 8fd0e2a6df26 ("uio: free uio id after uio file node is freed") Cc: stable Signed-off-by: Shin'ichiro Kawasaki Link: https://lore.kernel.org/r/20201102122819.2346270-1-shinichiro.kawasaki@wdc.com Signed-off-by: Greg Kroah-Hartman --- drivers/uio/uio.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c index 8313f81968d5..d91e051d1367 100644 --- a/drivers/uio/uio.c +++ b/drivers/uio/uio.c @@ -413,10 +413,10 @@ static int uio_get_minor(struct uio_device *idev) return retval; } -static void uio_free_minor(struct uio_device *idev) +static void uio_free_minor(unsigned long minor) { mutex_lock(&minor_lock); - idr_remove(&uio_idr, idev->minor); + idr_remove(&uio_idr, minor); mutex_unlock(&minor_lock); } @@ -990,7 +990,7 @@ err_request_irq: 
err_uio_dev_add_attributes: device_del(&idev->dev); err_device_create: - uio_free_minor(idev); + uio_free_minor(idev->minor); put_device(&idev->dev); return ret; } @@ -1004,11 +1004,13 @@ EXPORT_SYMBOL_GPL(__uio_register_device); void uio_unregister_device(struct uio_info *info) { struct uio_device *idev; + unsigned long minor; if (!info || !info->uio_dev) return; idev = info->uio_dev; + minor = idev->minor; mutex_lock(&idev->info_lock); uio_dev_del_attributes(idev); @@ -1024,7 +1026,7 @@ void uio_unregister_device(struct uio_info *info) device_unregister(&idev->dev); - uio_free_minor(idev); + uio_free_minor(minor); return; } From cbad9668929cdd15d6b005f058b8d9df1bdb7f9f Mon Sep 17 00:00:00 2001 From: Chris Brandt Date: Wed, 11 Nov 2020 08:12:09 -0500 Subject: [PATCH 115/151] usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode commit 6d853c9e4104b4fc8d55dc9cd3b99712aa347174 upstream. Renesas R-Car and RZ/G SoCs have a firmware download mode over USB. However, on reset a banner string is transmitted out which is not expected to be echoed back and will corrupt the protocol. Cc: stable Acked-by: Oliver Neukum Signed-off-by: Chris Brandt Link: https://lore.kernel.org/r/20201111131209.3977903-1-chris.brandt@renesas.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/cdc-acm.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index ed99d98172f4..16c98e718001 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1706,6 +1706,15 @@ static const struct usb_device_id acm_ids[] = { { USB_DEVICE(0x0870, 0x0001), /* Metricom GS Modem */ .driver_info = NO_UNION_NORMAL, /* has no union descriptor */ }, + { USB_DEVICE(0x045b, 0x023c), /* Renesas USB Download mode */ + .driver_info = DISABLE_ECHO, /* Don't echo banner */ + }, + { USB_DEVICE(0x045b, 0x0248), /* Renesas USB Download mode */ + .driver_info = DISABLE_ECHO, /* Don't echo banner */ + }, + { USB_DEVICE(0x045b, 0x024D), /* Renesas USB Download mode */ + .driver_info = DISABLE_ECHO, /* Don't echo banner */ + }, { USB_DEVICE(0x0e8d, 0x0003), /* FIREFLY, MediaTek Inc; andrey.arapov@gmail.com */ .driver_info = NO_UNION_NORMAL, /* has no union descriptor */ }, From 57626d77ef1e788e7f78e5da4a755ecc2d0d3a09 Mon Sep 17 00:00:00 2001 From: Zhang Qilong Date: Fri, 6 Nov 2020 20:22:21 +0800 Subject: [PATCH 116/151] xhci: hisilicon: fix reference leak in xhci_histb_probe commit 76255470ffa2795a44032e8b3c1ced11d81aa2db upstream. pm_runtime_get_sync() will increment the pm usage count at first and it will resume the device later. We should decrease the usage count whether it succeeded or failed (maybe the runtime PM of the device has an error, the device is in an inaccessible state, or some other error state). If we do not call a put operation to decrease the reference count, it will result in a reference leak in xhci_histb_probe. Moreover, this device cannot enter the idle state and will always stay busy or in some other non-idle state later. So we fix it by jumping to the error handling branch.
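As a rough illustration of the pattern described above (a minimal sketch with hypothetical function names, not the actual xhci-histb code), the point is that every failure after pm_runtime_get_sync() has to drop the usage count it took:

  static int example_probe(struct platform_device *pdev)
  {
          struct device *dev = &pdev->dev;
          int ret;

          pm_runtime_enable(dev);
          ret = pm_runtime_get_sync(dev);   /* bumps the usage count even if it fails */
          if (ret < 0)
                  goto disable_pm;

          ret = example_more_setup(dev);    /* hypothetical later step; must unwind too */
          if (ret)
                  goto disable_pm;

          return 0;

  disable_pm:
          pm_runtime_put_noidle(dev);       /* drop the count taken by get_sync() */
          pm_runtime_disable(dev);
          return ret;
  }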
Fixes: c508f41da0788 ("xhci: hisilicon: support HiSilicon STB xHCI host controller") Signed-off-by: Zhang Qilong Link: https://lore.kernel.org/r/20201106122221.2304528-1-zhangqilong3@huawei.com Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-histb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci-histb.c b/drivers/usb/host/xhci-histb.c index 3c4abb5a1c3f..73aba464b66a 100644 --- a/drivers/usb/host/xhci-histb.c +++ b/drivers/usb/host/xhci-histb.c @@ -241,7 +241,7 @@ static int xhci_histb_probe(struct platform_device *pdev) /* Initialize dma_mask and coherent_dma_mask to 32-bits */ ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)); if (ret) - return ret; + goto disable_pm; hcd = usb_create_hcd(driver, dev, dev_name(dev)); if (!hcd) { From e2b2c390ec9eccc3f5b88f3178cc9df1d44fff13 Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Wed, 4 Nov 2020 15:31:36 +0000 Subject: [PATCH 117/151] virtio: virtio_console: fix DMA memory allocation for rproc serial commit 9d516aa82b7d4fbe7f6303348697960ba03a530b upstream. Since commit 086d08725d34 ("remoteproc: create vdev subdevice with specific dma memory pool"), every remoteproc has a DMA subdevice ("remoteprocX#vdevYbuffer") for each virtio device, which inherits DMA capabilities from the corresponding platform device. This allowed to associate different DMA pools with each vdev, and required from virtio drivers to perform DMA operations with the parent device (vdev->dev.parent) instead of grandparent (vdev->dev.parent->parent). virtio_rpmsg_bus was already changed in the same merge cycle with commit d999b622fcfb ("rpmsg: virtio: allocate buffer from parent"), but virtio_console did not. In fact, operations using the grandparent worked fine while the grandparent was the platform device, but since commit c774ad010873 ("remoteproc: Fix and restore the parenting hierarchy for vdev") this was changed, and now the grandparent device is the remoteproc device without any DMA capabilities. So, starting v5.8-rc1 the following warning is observed: [ 2.483925] ------------[ cut here ]------------ [ 2.489148] WARNING: CPU: 3 PID: 101 at kernel/dma/mapping.c:427 0x80e7eee8 [ 2.489152] Modules linked in: virtio_console(+) [ 2.503737] virtio_rpmsg_bus rpmsg_core [ 2.508903] [ 2.528898] [ 2.913043] [ 2.914907] ---[ end trace 93ac8746beab612c ]--- [ 2.920102] virtio-ports vport1p0: Error allocating inbufs kernel/dma/mapping.c:427 is: WARN_ON_ONCE(!dev->coherent_dma_mask); obviously because the grandparent now is remoteproc dev without any DMA caps: [ 3.104943] Parent: remoteproc0#vdev1buffer, grandparent: remoteproc0 Fix this the same way as it was for virtio_rpmsg_bus, using just the parent device (vdev->dev.parent, "remoteprocX#vdevYbuffer") for DMA operations. This also allows now to reserve DMA pools/buffers for rproc serial via Device Tree. 
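Sketched in code, the allocation the message describes ends up looking roughly like the following (a simplified sketch with a hypothetical helper name, not the actual virtio_console code):

  static void *example_alloc_dma_buf(struct virtio_device *vdev, size_t size,
                                     dma_addr_t *dma)
  {
          /*
           * vdev->dev.parent is the "remoteprocX#vdevYbuffer" subdevice that
           * carries the DMA pool; vdev->dev.parent->parent is now the bare
           * remoteproc device with no DMA capabilities, so it must not be used.
           */
          struct device *dma_dev = vdev->dev.parent;

          if (!dma_dev)
                  return NULL;

          return dma_alloc_coherent(dma_dev, size, dma, GFP_KERNEL);
  }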
Fixes: c774ad010873 ("remoteproc: Fix and restore the parenting hierarchy for vdev") Cc: stable@vger.kernel.org # 5.1+ Reviewed-by: Mathieu Poirier Acked-by: Jason Wang Signed-off-by: Alexander Lobakin Date: Thu, 5 Nov 2020 11:10:24 +0800 Link: https://lore.kernel.org/r/AOKowLclCbOCKxyiJ71WeNyuAAj2q8EUtxrXbyky5E@cp7-web-042.plabs.ch Signed-off-by: Greg Kroah-Hartman --- drivers/char/virtio_console.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c index 9ebce2c12c43..5eabbf73fdef 100644 --- a/drivers/char/virtio_console.c +++ b/drivers/char/virtio_console.c @@ -435,12 +435,12 @@ static struct port_buffer *alloc_buf(struct virtio_device *vdev, size_t buf_size /* * Allocate DMA memory from ancestor. When a virtio * device is created by remoteproc, the DMA memory is - * associated with the grandparent device: - * vdev => rproc => platform-dev. + * associated with the parent device: + * virtioY => remoteprocX#vdevYbuffer. */ - if (!vdev->dev.parent || !vdev->dev.parent->parent) + buf->dev = vdev->dev.parent; + if (!buf->dev) goto free_buf; - buf->dev = vdev->dev.parent->parent; /* Increase device refcnt to avoid freeing it */ get_device(buf->dev); From 761fb6829238d98117601b3fab40f777fea2455d Mon Sep 17 00:00:00 2001 From: Alexander Usyskin Date: Thu, 29 Oct 2020 11:54:42 +0200 Subject: [PATCH 118/151] mei: protect mei_cl_mtu from null dereference commit bcbc0b2e275f0a797de11a10eff495b4571863fc upstream. A receive callback is queued while the client is still connected but can still be called after the client was disconnected. Upon disconnect cl->me_cl is set to NULL, hence we need to check that ME client is not-NULL in mei_cl_mtu to avoid null dereference. Cc: Signed-off-by: Alexander Usyskin Signed-off-by: Tomas Winkler Link: https://lore.kernel.org/r/20201029095444.957924-2-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/misc/mei/client.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/mei/client.h b/drivers/misc/mei/client.h index c1f9e810cf81..030d0e7b148b 100644 --- a/drivers/misc/mei/client.h +++ b/drivers/misc/mei/client.h @@ -128,11 +128,11 @@ static inline u8 mei_cl_me_id(const struct mei_cl *cl) * * @cl: host client * - * Return: mtu + * Return: mtu or 0 if client is not connected */ static inline size_t mei_cl_mtu(const struct mei_cl *cl) { - return cl->me_cl->props.max_msg_length; + return cl->me_cl ? cl->me_cl->props.max_msg_length : 0; } /** From 2192d905df0d540f6f3240046bcb06c53bcf5016 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 6 Nov 2020 11:52:05 +0300 Subject: [PATCH 119/151] futex: Don't enable IRQs unconditionally in put_pi_state() commit 1e106aa3509b86738769775969822ffc1ec21bf4 upstream. The exit_pi_state_list() function calls put_pi_state() with IRQs disabled and is not expecting that IRQs will be enabled inside the function. Use the _irqsave() variant so that IRQs are restored to the original state instead of being enabled unconditionally. Fixes: 153fbd1226fb ("futex: Fix more put_pi_state() vs. 
exit_pi_state_list() races") Signed-off-by: Dan Carpenter Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201106085205.GA1159983@mwanda Signed-off-by: Greg Kroah-Hartman --- kernel/futex.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 9c4f9b868a49..b6dec5f79370 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -880,8 +880,9 @@ static void put_pi_state(struct futex_pi_state *pi_state) */ if (pi_state->owner) { struct task_struct *owner; + unsigned long flags; - raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); + raw_spin_lock_irqsave(&pi_state->pi_mutex.wait_lock, flags); owner = pi_state->owner; if (owner) { raw_spin_lock(&owner->pi_lock); @@ -889,7 +890,7 @@ static void put_pi_state(struct futex_pi_state *pi_state) raw_spin_unlock(&owner->pi_lock); } rt_mutex_proxy_unlock(&pi_state->pi_mutex, owner); - raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); + raw_spin_unlock_irqrestore(&pi_state->pi_mutex.wait_lock, flags); } if (current->pi_state_cache) { From 84778a43ae59ba21d13e5dc83484ea841b6b6339 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sat, 7 Nov 2020 00:00:49 -0500 Subject: [PATCH 120/151] jbd2: fix up sparse warnings in checkpoint code commit 05d5233df85e9621597c5838e95235107eb624a2 upstream. Add missing __acquires() and __releases() annotations. Also, in an "this should never happen" WARN_ON check, if it *does* actually happen, we need to release j_state_lock since this function is always supposed to release that lock. Otherwise, things will quickly grind to a halt after the WARN_ON trips. Fixes: 96f1e0974575 ("jbd2: avoid long hold times of j_state_lock...") Cc: stable@kernel.org Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- fs/jbd2/checkpoint.c | 2 ++ fs/jbd2/transaction.c | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 62cf497f18eb..5ef99b9ec8be 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -106,6 +106,8 @@ static int __try_to_free_cp_buf(struct journal_head *jh) * for a checkpoint to free up some space in the log. */ void __jbd2_log_wait_for_space(journal_t *journal) +__acquires(&journal->j_state_lock) +__releases(&journal->j_state_lock) { int nblocks, space_left; /* assert_spin_locked(&journal->j_state_lock); */ diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 90453309345d..be05fb96757c 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -171,8 +171,10 @@ static void wait_transaction_switching(journal_t *journal) DEFINE_WAIT(wait); if (WARN_ON(!journal->j_running_transaction || - journal->j_running_transaction->t_state != T_SWITCH)) + journal->j_running_transaction->t_state != T_SWITCH)) { + read_unlock(&journal->j_state_lock); return; + } prepare_to_wait(&journal->j_wait_transaction_locked, &wait, TASK_UNINTERRUPTIBLE); read_unlock(&journal->j_state_lock); From bd4d106f312209f3b4bb159f2e0efa86f2072688 Mon Sep 17 00:00:00 2001 From: Laurent Dufour Date: Fri, 13 Nov 2020 22:51:53 -0800 Subject: [PATCH 121/151] mm/slub: fix panic in slab_alloc_node() commit 22e4663e916321b72972c69ca0c6b962f529bd78 upstream. 
While doing memory hot-unplug operation on a PowerPC VM running 1024 CPUs with 11TB of ram, I hit the following panic: BUG: Kernel NULL pointer dereference on read at 0x00000007 Faulting instruction address: 0xc000000000456048 Oops: Kernel access of bad area, sig: 11 [#2] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS= 2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp CPU: 160 PID: 1 Comm: systemd Tainted: G D 5.9.0 #1 NIP: c000000000456048 LR: c000000000455fd4 CTR: c00000000047b350 REGS: c00006028d1b77a0 TRAP: 0300 Tainted: G D (5.9.0) MSR: 8000000000009033 CR: 24004228 XER: 00000000 CFAR: c00000000000f1b0 DAR: 0000000000000007 DSISR: 40000000 IRQMASK: 0 GPR00: c000000000455fd4 c00006028d1b7a30 c000000001bec800 0000000000000000 GPR04: 0000000000000dc0 0000000000000000 00000000000374ef c00007c53df99320 GPR08: 000007c53c980000 0000000000000000 000007c53c980000 0000000000000000 GPR12: 0000000000004400 c00000001e8e4400 0000000000000000 0000000000000f6a GPR16: 0000000000000000 c000000001c25930 c000000001d62528 00000000000000c1 GPR20: c000000001d62538 c00006be469e9000 0000000fffffffe0 c0000000003c0ff8 GPR24: 0000000000000018 0000000000000000 0000000000000dc0 0000000000000000 GPR28: c00007c513755700 c000000001c236a4 c00007bc4001f800 0000000000000001 NIP [c000000000456048] __kmalloc_node+0x108/0x790 LR [c000000000455fd4] __kmalloc_node+0x94/0x790 Call Trace: kvmalloc_node+0x58/0x110 mem_cgroup_css_online+0x10c/0x270 online_css+0x48/0xd0 cgroup_apply_control_enable+0x2c4/0x470 cgroup_mkdir+0x408/0x5f0 kernfs_iop_mkdir+0x90/0x100 vfs_mkdir+0x138/0x250 do_mkdirat+0x154/0x1c0 system_call_exception+0xf8/0x200 system_call_common+0xf0/0x27c Instruction dump: e93e0000 e90d0030 39290008 7cc9402a e94d0030 e93e0000 7ce95214 7f89502a 2fbc0000 419e0018 41920230 e9270010 <89290007> 7f994800 419e0220 7ee6bb78 This pointing to the following code: mm/slub.c:2851 if (unlikely(!object || !node_match(page, node))) { c000000000456038: 00 00 bc 2f cmpdi cr7,r28,0 c00000000045603c: 18 00 9e 41 beq cr7,c000000000456054 <__kmalloc_node+0x114> node_match(): mm/slub.c:2491 if (node != NUMA_NO_NODE && page_to_nid(page) != node) c000000000456040: 30 02 92 41 beq cr4,c000000000456270 <__kmalloc_node+0x330> page_to_nid(): include/linux/mm.h:1294 c000000000456044: 10 00 27 e9 ld r9,16(r7) c000000000456048: 07 00 29 89 lbz r9,7(r9) <<<< r9 = NULL node_match(): mm/slub.c:2491 c00000000045604c: 00 48 99 7f cmpw cr7,r25,r9 c000000000456050: 20 02 9e 41 beq cr7,c000000000456270 <__kmalloc_node+0x330> The panic occurred in slab_alloc_node() when checking for the page's node: object = c->freelist; page = c->page; if (unlikely(!object || !node_match(page, node))) { object = __slab_alloc(s, gfpflags, node, addr, c); stat(s, ALLOC_SLOWPATH); The issue is that object is not NULL while page is NULL which is odd but may happen if the cache flush happened after loading object but before loading page. Thus checking for the page pointer is required too. The cache flush is done through an inter processor interrupt when a piece of memory is off-lined. That interrupt is triggered when a memory hot-unplug operation is initiated and offline_pages() is calling the slub's MEM_GOING_OFFLINE callback slab_mem_going_offline_callback() which is calling flush_cpu_slab(). If that interrupt is caught between the reading of c->freelist and the reading of c->page, this could lead to such a situation. That situation is expected and the later call to this_cpu_cmpxchg_double() will detect the change to c->freelist and redo the whole operation. 
In commit 6159d0f5c03e ("mm/slub.c: page is always non-NULL in node_match()") check on the page pointer has been removed assuming that page is always valid when it is called. It happens that this is not true in that particular case, so check for page before calling node_match() here. Fixes: 6159d0f5c03e ("mm/slub.c: page is always non-NULL in node_match()") Signed-off-by: Laurent Dufour Signed-off-by: Andrew Morton Acked-by: Vlastimil Babka Acked-by: Christoph Lameter Cc: Wei Yang Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Nathan Lynch Cc: Scott Cheloha Cc: Michal Hocko Cc: Link: https://lkml.kernel.org/r/20201027190406.33283-1-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/slub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slub.c b/mm/slub.c index d69934eac9e9..f41414571c9e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2763,7 +2763,7 @@ redo: object = c->freelist; page = c->page; - if (unlikely(!object || !node_match(page, node))) { + if (unlikely(!object || !page || !node_match(page, node))) { object = __slab_alloc(s, gfpflags, node, addr, c); stat(s, ALLOC_SLOWPATH); } else { From fa6265f8fb9e6981842b94efb715740d1b469b21 Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Fri, 13 Nov 2020 22:52:02 -0800 Subject: [PATCH 122/151] Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint" commit 8b92c4ff4423aa9900cf838d3294fcade4dbda35 upstream. Patch series "fix parsing of reboot= cmdline", v3. The parsing of the reboot= cmdline has two major errors: - a missing bound check can crash the system on reboot - parsing of the cpu number only works if specified last Fix both. This patch (of 2): This reverts commit 616feab753972b97. kstrtoint() and simple_strtoul() have a subtle difference which makes them non interchangeable: if a non digit character is found amid the parsing, the former will return an error, while the latter will just stop parsing, e.g. simple_strtoul("123xyx") = 123. The kernel cmdline reboot= argument allows to specify the CPU used for rebooting, with the syntax `s####` among the other flags, e.g. "reboot=warm,s31,force", so if this flag is not the last given, it's silently ignored as well as the subsequent ones. 
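The difference is easy to demonstrate with plain userspace strtoul(), which stops at the first non-digit just like simple_strtoul(); a strict parser in the spirit of kstrtoint() would reject the same input outright (stand-alone illustration, not kernel code):

  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
          /* with "reboot=warm,s31,force" the 's' handler effectively sees "31,force" */
          const char *arg = "31,force";
          char *end;

          unsigned long cpu = strtoul(arg, &end, 0);              /* lenient: stops at ',' */
          printf("cpu=%lu, unparsed tail=\"%s\"\n", cpu, end);    /* cpu=31, tail=",force" */

          /* a strict kstrtoint()-style parser treats the ",force" tail as an
           * error, so the whole reboot= option would be dropped */
          return 0;
  }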
Fixes: 616feab75397 ("kernel/reboot.c: convert simple_strtoul to kstrtoint") Signed-off-by: Matteo Croce Signed-off-by: Andrew Morton Cc: Guenter Roeck Cc: Petr Mladek Cc: Arnd Bergmann Cc: Mike Rapoport Cc: Kees Cook Cc: Pavel Tatashin Cc: Robin Holt Cc: Fabian Frederick Cc: Greg Kroah-Hartman Cc: Link: https://lkml.kernel.org/r/20201103214025.116799-2-mcroce@linux.microsoft.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/reboot.c | 21 +++++++-------------- 1 file changed, 7 insertions(+), 14 deletions(-) diff --git a/kernel/reboot.c b/kernel/reboot.c index c4d472b7f1b4..4fd4b93b7a61 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -551,22 +551,15 @@ static int __init reboot_setup(char *str) break; case 's': - { - int rc; - - if (isdigit(*(str+1))) { - rc = kstrtoint(str+1, 0, &reboot_cpu); - if (rc) - return rc; - } else if (str[1] == 'm' && str[2] == 'p' && - isdigit(*(str+3))) { - rc = kstrtoint(str+3, 0, &reboot_cpu); - if (rc) - return rc; - } else + if (isdigit(*(str+1))) + reboot_cpu = simple_strtoul(str+1, NULL, 0); + else if (str[1] == 'm' && str[2] == 'p' && + isdigit(*(str+3))) + reboot_cpu = simple_strtoul(str+3, NULL, 0); + else *mode = REBOOT_SOFT; break; - } + case 'g': *mode = REBOOT_GPIO; break; From ac18b128cfd6965306666efd9084d65b8192c00d Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Fri, 13 Nov 2020 22:52:07 -0800 Subject: [PATCH 123/151] reboot: fix overflow parsing reboot cpu number commit df5b0ab3e08a156701b537809914b339b0daa526 upstream. Limit the CPU number to num_possible_cpus(), because setting it to a value lower than INT_MAX but higher than NR_CPUS produces the following error on reboot and shutdown: BUG: unable to handle page fault for address: ffffffff90ab1bb0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1c09067 P4D 1c09067 PUD 1c0a063 PMD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.9.0-rc8-kvm #110 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:migrate_to_reboot_cpu+0xe/0x60 Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c RSP: 0018:ffffc90000013e08 EFLAGS: 00010246 RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000 RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0 RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __do_sys_reboot.cold+0x34/0x5b do_syscall_64+0x2d/0x40 Fixes: 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") Signed-off-by: Matteo Croce Signed-off-by: Andrew Morton Cc: Arnd Bergmann Cc: Fabian Frederick Cc: Greg Kroah-Hartman Cc: Guenter Roeck Cc: Kees Cook Cc: Mike Rapoport Cc: Pavel Tatashin Cc: Petr Mladek Cc: Robin Holt Cc: Link: https://lkml.kernel.org/r/20201103214025.116799-3-mcroce@linux.microsoft.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/reboot.c | 7 +++++++ 1 file 
changed, 7 insertions(+) diff --git a/kernel/reboot.c b/kernel/reboot.c index 4fd4b93b7a61..ac19159d7158 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -558,6 +558,13 @@ static int __init reboot_setup(char *str) reboot_cpu = simple_strtoul(str+3, NULL, 0); else *mode = REBOOT_SOFT; + if (reboot_cpu >= num_possible_cpus()) { + pr_err("Ignoring the CPU number in reboot= option. " + "CPU %d exceeds possible cpu number %d\n", + reboot_cpu, num_possible_cpus()); + reboot_cpu = 0; + break; + } break; case 'g': From 9de4ffb701505624b9da5e03d3662f9dee6d28b7 Mon Sep 17 00:00:00 2001 From: Wengang Wang Date: Fri, 13 Nov 2020 22:52:23 -0800 Subject: [PATCH 124/151] ocfs2: initialize ip_next_orphan commit f5785283dd64867a711ca1fb1f5bb172f252ecdf upstream. Though the problem was found on an older 4.1.12 kernel, I think upstream has the same issue. In one node in the cluster, there is the following callback trace: # cat /proc/21473/stack __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2] ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2] ocfs2_evict_inode+0x152/0x820 [ocfs2] evict+0xae/0x1a0 iput+0x1c6/0x230 ocfs2_orphan_filldir+0x5d/0x100 [ocfs2] ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2] ocfs2_dir_foreach+0x29/0x30 [ocfs2] ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2] ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2] process_one_work+0x169/0x4a0 worker_thread+0x5b/0x560 kthread+0xcb/0xf0 ret_from_fork+0x61/0x90 The above stack is not reasonable; the final iput shouldn't happen in the ocfs2_orphan_filldir() function. Looking at the code, 2067 /* Skip inodes which are already added to recover list, since dio may 2068 * happen concurrently with unlink/rename */ 2069 if (OCFS2_I(iter)->ip_next_orphan) { 2070 iput(iter); 2071 return 0; 2072 } 2073 The logic thinks the inode is already in the recover list on seeing ip_next_orphan is non-NULL, so it skips this inode after dropping a reference which was incremented in ocfs2_iget(). While, if the inode is already in the recover list, it should have another reference and the iput() at line 2070 should not be the final iput (dropping the last reference). So I don't think the inode is really in the recover list (no vmcore to confirm). Note that ocfs2_queue_orphans(), though not shown in the call back trace, is holding the cluster lock on the orphan directory when looking up unlinked inodes. The on disk inode eviction could involve a lot of IOs which may need a long time to finish. That means this node could hold the cluster lock for a very long time, which can lead to the lock requests (from other nodes) to the orphan directory hanging for a long time. Looking more at ip_next_orphan, I found it's not initialized when allocating a new ocfs2_inode_info structure. This causes the reflink operations from some nodes to hang for a very long time waiting for the cluster lock on the orphan directory. Fix: initialize ip_next_orphan as NULL.
Signed-off-by: Wengang Wang Signed-off-by: Andrew Morton Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Cc: Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/ocfs2/super.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 70d8857b161d..60b4b6df1ed3 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -1692,6 +1692,7 @@ static void ocfs2_inode_init_once(void *data) oi->ip_blkno = 0ULL; oi->ip_clusters = 0; + oi->ip_next_orphan = NULL; ocfs2_resv_init_once(&oi->ip_la_data_resv); From 33e53f2cac19cc6b2f2eaebc27ff70c813a04998 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 4 Oct 2020 19:04:26 +0100 Subject: [PATCH 125/151] btrfs: fix potential overflow in cluster_pages_for_defrag on 32bit arch commit a1fbc6750e212c5675a4e48d7f51d44607eb8756 upstream. On 32-bit systems, this shift will overflow for files larger than 4GB as start_index is unsigned long while the calls to btrfs_delalloc_*_space expect u64. CC: stable@vger.kernel.org # 4.4+ Fixes: df480633b891 ("btrfs: extent-tree: Switch to new delalloc space reserve and release") Reviewed-by: Josef Bacik Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: David Sterba [ define the variable instead of repeating the shift ] Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/ioctl.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 3fd6c9aed719..f58e03d1775d 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1255,6 +1255,7 @@ static int cluster_pages_for_defrag(struct inode *inode, u64 page_start; u64 page_end; u64 page_cnt; + u64 start = (u64)start_index << PAGE_SHIFT; int ret; int i; int i_done; @@ -1271,8 +1272,7 @@ static int cluster_pages_for_defrag(struct inode *inode, page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1); ret = btrfs_delalloc_reserve_space(inode, &data_reserved, - start_index << PAGE_SHIFT, - page_cnt << PAGE_SHIFT); + start, page_cnt << PAGE_SHIFT); if (ret) return ret; i_done = 0; @@ -1361,8 +1361,7 @@ again: btrfs_mod_outstanding_extents(BTRFS_I(inode), 1); spin_unlock(&BTRFS_I(inode)->lock); btrfs_delalloc_release_space(inode, data_reserved, - start_index << PAGE_SHIFT, - (page_cnt - i_done) << PAGE_SHIFT, true); + start, (page_cnt - i_done) << PAGE_SHIFT, true); } @@ -1389,8 +1388,7 @@ out: put_page(pages[i]); } btrfs_delalloc_release_space(inode, data_reserved, - start_index << PAGE_SHIFT, - page_cnt << PAGE_SHIFT, true); + start, page_cnt << PAGE_SHIFT, true); btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT); extent_changeset_free(data_reserved); return ret; From 68dae71b7cde628a082e982c0abbf6e4de7b7db5 Mon Sep 17 00:00:00 2001 From: Chen Zhou Date: Thu, 12 Nov 2020 21:53:32 +0800 Subject: [PATCH 126/151] selinux: Fix error return code in sel_ib_pkey_sid_slow() commit c350f8bea271782e2733419bd2ab9bf4ec2051ef upstream. Fix to return a negative error code from the error handling case instead of 0 in function sel_ib_pkey_sid_slow(), as done elsewhere in this function. 
Cc: stable@vger.kernel.org Fixes: 409dcf31538a ("selinux: Add a cache for quicker retreival of PKey SIDs") Reported-by: Hulk Robot Signed-off-by: Chen Zhou Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman --- security/selinux/ibpkey.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/security/selinux/ibpkey.c b/security/selinux/ibpkey.c index de92365e4324..5887bff50560 100644 --- a/security/selinux/ibpkey.c +++ b/security/selinux/ibpkey.c @@ -151,8 +151,10 @@ static int sel_ib_pkey_sid_slow(u64 subnet_prefix, u16 pkey_num, u32 *sid) * is valid, it just won't be added to the cache. */ new = kzalloc(sizeof(*new), GFP_ATOMIC); - if (!new) + if (!new) { + ret = -ENOMEM; goto out; + } new->psec.subnet_prefix = subnet_prefix; new->psec.pkey = pkey_num; From 819bf3b0d969ea91dd33e06271df2e3349923f20 Mon Sep 17 00:00:00 2001 From: Arnaud de Turckheim Date: Wed, 4 Nov 2020 16:24:53 +0100 Subject: [PATCH 127/151] gpio: pcie-idio-24: Fix irq mask when masking commit d8f270efeac850c569c305dc0baa42ac3d607988 upstream. Fix the bitwise operation to remove only the corresponding bit from the mask. Fixes: 585562046628 ("gpio: Add GPIO support for the ACCES PCIe-IDIO-24 family") Cc: stable@vger.kernel.org Signed-off-by: Arnaud de Turckheim Reviewed-by: William Breathitt Gray Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman --- drivers/gpio/gpio-pcie-idio-24.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpio/gpio-pcie-idio-24.c b/drivers/gpio/gpio-pcie-idio-24.c index 52f1647a46fd..c69f93dcbb58 100644 --- a/drivers/gpio/gpio-pcie-idio-24.c +++ b/drivers/gpio/gpio-pcie-idio-24.c @@ -365,7 +365,7 @@ static void idio_24_irq_mask(struct irq_data *data) raw_spin_lock_irqsave(&idio24gpio->lock, flags); - idio24gpio->irq_mask &= BIT(bit_offset); + idio24gpio->irq_mask &= ~BIT(bit_offset); new_irq_mask = idio24gpio->irq_mask >> bank_offset; if (!new_irq_mask) { From 7b6790ae3a94654565695c76aab65f3f5db67639 Mon Sep 17 00:00:00 2001 From: Arnaud de Turckheim Date: Wed, 4 Nov 2020 16:24:54 +0100 Subject: [PATCH 128/151] gpio: pcie-idio-24: Fix IRQ Enable Register value commit 23a7fdc06ebcc334fa667f0550676b035510b70b upstream. This fixes the COS Enable Register value for enabling/disabling the corresponding IRQs bank. 
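The subtle point in the diff below is that the old expression computed a shift count where the surrounding code wants a plain bank index; a tiny stand-alone illustration (hypothetical bit number, not driver code):

  #include <stdio.h>

  int main(void)
  {
          /* hypothetical change-of-state line in the third bank (bits 16-23) */
          unsigned long bit_offset = 20;

          unsigned long old_bank_offset = bit_offset / 8 * 8;  /* 16: really a shift count */
          unsigned long bank            = bit_offset / 8;      /* 2: the bank index */

          /* shifting the IRQ mask down to the bank's byte works the same either way ... */
          printf("shift amounts: old=%lu new=%lu\n", old_bank_offset, bank * 8);
          /* ... but anything that needs the bank number itself (such as picking
           * the enable bit for that bank) must use 2, not 16 */
          printf("bank index: %lu\n", bank);
          return 0;
  }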
Fixes: 585562046628 ("gpio: Add GPIO support for the ACCES PCIe-IDIO-24 family") Cc: stable@vger.kernel.org Signed-off-by: Arnaud de Turckheim Reviewed-by: William Breathitt Gray Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman --- drivers/gpio/gpio-pcie-idio-24.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpio/gpio-pcie-idio-24.c b/drivers/gpio/gpio-pcie-idio-24.c index c69f93dcbb58..9dbc63f1f358 100644 --- a/drivers/gpio/gpio-pcie-idio-24.c +++ b/drivers/gpio/gpio-pcie-idio-24.c @@ -360,13 +360,13 @@ static void idio_24_irq_mask(struct irq_data *data) unsigned long flags; const unsigned long bit_offset = irqd_to_hwirq(data) - 24; unsigned char new_irq_mask; - const unsigned long bank_offset = bit_offset/8 * 8; + const unsigned long bank_offset = bit_offset / 8; unsigned char cos_enable_state; raw_spin_lock_irqsave(&idio24gpio->lock, flags); idio24gpio->irq_mask &= ~BIT(bit_offset); - new_irq_mask = idio24gpio->irq_mask >> bank_offset; + new_irq_mask = idio24gpio->irq_mask >> bank_offset * 8; if (!new_irq_mask) { cos_enable_state = ioread8(&idio24gpio->reg->cos_enable); @@ -389,12 +389,12 @@ static void idio_24_irq_unmask(struct irq_data *data) unsigned long flags; unsigned char prev_irq_mask; const unsigned long bit_offset = irqd_to_hwirq(data) - 24; - const unsigned long bank_offset = bit_offset/8 * 8; + const unsigned long bank_offset = bit_offset / 8; unsigned char cos_enable_state; raw_spin_lock_irqsave(&idio24gpio->lock, flags); - prev_irq_mask = idio24gpio->irq_mask >> bank_offset; + prev_irq_mask = idio24gpio->irq_mask >> bank_offset * 8; idio24gpio->irq_mask |= BIT(bit_offset); if (!prev_irq_mask) { From 2a6cba6d3d72e9c4c4e5303715f49f8f03189137 Mon Sep 17 00:00:00 2001 From: Arnaud de Turckheim Date: Wed, 4 Nov 2020 16:24:55 +0100 Subject: [PATCH 129/151] gpio: pcie-idio-24: Enable PEX8311 interrupts commit 10a2f11d3c9e48363c729419e0f0530dea76e4fe upstream. This enables the PEX8311 internal PCI wire interrupt and the PEX8311 local interrupt input so the local interrupts are forwarded to the PCI. 
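As a small arithmetic illustration of the enable value involved (assuming only the INTCSR bit layout quoted in the driver comment added below): both enable bits live in the second byte of the 32-bit register, which is why the value is shifted down and written as a single byte at offset INTCSR + 1.

  #include <stdio.h>

  #define BIT(n) (1UL << (n))

  int main(void)
  {
          unsigned long internal_pci_wire_en = BIT(8);   /* Internal PCI Wire Interrupt Enable */
          unsigned long local_input_en       = BIT(11);  /* Local Interrupt Input Enable */

          /* a byte-wide write at INTCSR + 1 covers bits 8-15 of the register */
          unsigned char byte1 = (internal_pci_wire_en | local_input_en) >> 8;

          printf("byte written at INTCSR+1: 0x%02x\n", byte1);  /* prints 0x09 */
          return 0;
  }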
Fixes: 585562046628 ("gpio: Add GPIO support for the ACCES PCIe-IDIO-24 family") Cc: stable@vger.kernel.org Signed-off-by: Arnaud de Turckheim Reviewed-by: William Breathitt Gray Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman --- drivers/gpio/gpio-pcie-idio-24.c | 52 +++++++++++++++++++++++++++++++- 1 file changed, 51 insertions(+), 1 deletion(-) diff --git a/drivers/gpio/gpio-pcie-idio-24.c b/drivers/gpio/gpio-pcie-idio-24.c index 9dbc63f1f358..0cfeccb4ffe2 100644 --- a/drivers/gpio/gpio-pcie-idio-24.c +++ b/drivers/gpio/gpio-pcie-idio-24.c @@ -28,6 +28,47 @@ #include #include +/* + * PLX PEX8311 PCI LCS_INTCSR Interrupt Control/Status + * + * Bit: Description + * 0: Enable Interrupt Sources (Bit 0) + * 1: Enable Interrupt Sources (Bit 1) + * 2: Generate Internal PCI Bus Internal SERR# Interrupt + * 3: Mailbox Interrupt Enable + * 4: Power Management Interrupt Enable + * 5: Power Management Interrupt + * 6: Slave Read Local Data Parity Check Error Enable + * 7: Slave Read Local Data Parity Check Error Status + * 8: Internal PCI Wire Interrupt Enable + * 9: PCI Express Doorbell Interrupt Enable + * 10: PCI Abort Interrupt Enable + * 11: Local Interrupt Input Enable + * 12: Retry Abort Enable + * 13: PCI Express Doorbell Interrupt Active + * 14: PCI Abort Interrupt Active + * 15: Local Interrupt Input Active + * 16: Local Interrupt Output Enable + * 17: Local Doorbell Interrupt Enable + * 18: DMA Channel 0 Interrupt Enable + * 19: DMA Channel 1 Interrupt Enable + * 20: Local Doorbell Interrupt Active + * 21: DMA Channel 0 Interrupt Active + * 22: DMA Channel 1 Interrupt Active + * 23: Built-In Self-Test (BIST) Interrupt Active + * 24: Direct Master was the Bus Master during a Master or Target Abort + * 25: DMA Channel 0 was the Bus Master during a Master or Target Abort + * 26: DMA Channel 1 was the Bus Master during a Master or Target Abort + * 27: Target Abort after internal 256 consecutive Master Retrys + * 28: PCI Bus wrote data to LCS_MBOX0 + * 29: PCI Bus wrote data to LCS_MBOX1 + * 30: PCI Bus wrote data to LCS_MBOX2 + * 31: PCI Bus wrote data to LCS_MBOX3 + */ +#define PLX_PEX8311_PCI_LCS_INTCSR 0x68 +#define INTCSR_INTERNAL_PCI_WIRE BIT(8) +#define INTCSR_LOCAL_INPUT BIT(11) + /** * struct idio_24_gpio_reg - GPIO device registers structure * @out0_7: Read: FET Outputs 0-7 @@ -92,6 +133,7 @@ struct idio_24_gpio_reg { struct idio_24_gpio { struct gpio_chip chip; raw_spinlock_t lock; + __u8 __iomem *plx; struct idio_24_gpio_reg __iomem *reg; unsigned long irq_mask; }; @@ -481,6 +523,7 @@ static int idio_24_probe(struct pci_dev *pdev, const struct pci_device_id *id) struct device *const dev = &pdev->dev; struct idio_24_gpio *idio24gpio; int err; + const size_t pci_plx_bar_index = 1; const size_t pci_bar_index = 2; const char *const name = pci_name(pdev); @@ -494,12 +537,13 @@ static int idio_24_probe(struct pci_dev *pdev, const struct pci_device_id *id) return err; } - err = pcim_iomap_regions(pdev, BIT(pci_bar_index), name); + err = pcim_iomap_regions(pdev, BIT(pci_plx_bar_index) | BIT(pci_bar_index), name); if (err) { dev_err(dev, "Unable to map PCI I/O addresses (%d)\n", err); return err; } + idio24gpio->plx = pcim_iomap_table(pdev)[pci_plx_bar_index]; idio24gpio->reg = pcim_iomap_table(pdev)[pci_bar_index]; idio24gpio->chip.label = name; @@ -520,6 +564,12 @@ static int idio_24_probe(struct pci_dev *pdev, const struct pci_device_id *id) /* Software board reset */ iowrite8(0, &idio24gpio->reg->soft_reset); + /* + * enable PLX PEX8311 internal PCI wire interrupt 
and local interrupt + * input + */ + iowrite8((INTCSR_INTERNAL_PCI_WIRE | INTCSR_LOCAL_INPUT) >> 8, + idio24gpio->plx + PLX_PEX8311_PCI_LCS_INTCSR + 1); err = devm_gpiochip_add_data(dev, &idio24gpio->chip, idio24gpio); if (err) { From e1d706eeeaf760d53dcf08515a85251181b3ea32 Mon Sep 17 00:00:00 2001 From: Yangbo Lu Date: Tue, 10 Nov 2020 15:13:14 +0800 Subject: [PATCH 130/151] mmc: sdhci-of-esdhc: Handle pulse width detection erratum for more SoCs commit 71b053276a87ddfa40c8f236315d81543219bfb9 upstream. Apply erratum workaround of unreliable pulse width detection to more affected platforms (LX2160A Rev2.0 and LS1028A Rev1.0). Signed-off-by: Yangbo Lu Fixes: 48e304cc1970 ("mmc: sdhci-of-esdhc: workaround for unreliable pulse width detection") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201110071314.3868-1-yangbo.lu@nxp.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci-of-esdhc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c index 64196c1b1c8f..5922ae021d86 100644 --- a/drivers/mmc/host/sdhci-of-esdhc.c +++ b/drivers/mmc/host/sdhci-of-esdhc.c @@ -1212,6 +1212,8 @@ static struct soc_device_attribute soc_fixup_sdhc_clkdivs[] = { static struct soc_device_attribute soc_unreliable_pulse_detection[] = { { .family = "QorIQ LX2160A", .revision = "1.0", }, + { .family = "QorIQ LX2160A", .revision = "2.0", }, + { .family = "QorIQ LS1028A", .revision = "1.0", }, { }, }; From 039c8dcd2b150205ed6cb02d03a45468ba8fc618 Mon Sep 17 00:00:00 2001 From: Yoshihiro Shimoda Date: Fri, 6 Nov 2020 18:25:30 +0900 Subject: [PATCH 131/151] mmc: renesas_sdhi_core: Add missing tmio_mmc_host_free() at remove MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit e8973201d9b281375b5a8c66093de5679423021a upstream. The commit 94b110aff867 ("mmc: tmio: add tmio_mmc_host_alloc/free()") added tmio_mmc_host_free(), but missed the function calling in the sh_mobile_sdhi_remove() at that time. So, fix it. Otherwise, we cannot rebind the sdhi/mmc devices when we use aliases of mmc. Fixes: 94b110aff867 ("mmc: tmio: add tmio_mmc_host_alloc/free()") Signed-off-by: Yoshihiro Shimoda Reviewed-by: Wolfram Sang Tested-by: Wolfram Sang Reviewed-by: Niklas Söderlund Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1604654730-29914-1-git-send-email-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/renesas_sdhi_core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c index 234551a68739..689eb119d44f 100644 --- a/drivers/mmc/host/renesas_sdhi_core.c +++ b/drivers/mmc/host/renesas_sdhi_core.c @@ -874,6 +874,7 @@ int renesas_sdhi_remove(struct platform_device *pdev) tmio_mmc_host_remove(host); renesas_sdhi_clk_disable(host); + tmio_mmc_host_free(host); return 0; } From 974e3a7002a0c4adb921b8b92656463ae795b61c Mon Sep 17 00:00:00 2001 From: Al Viro Date: Wed, 28 Oct 2020 16:39:49 -0400 Subject: [PATCH 132/151] don't dump the threads that had been already exiting when zapped. commit 77f6ab8b7768cf5e6bdd0e72499270a0671506ee upstream. Coredump logics needs to report not only the registers of the dumping thread, but (since 2.5.43) those of other threads getting killed. 
Doing that might require extra state saved on the stack in asm glue at kernel entry; signal delivery logics does that (we need to be able to save sigcontext there, at the very least) and so does seccomp. That covers all callers of do_coredump(). Secondary threads get hit with SIGKILL and caught as soon as they reach exit_mm(), which normally happens in signal delivery, so those are also fine most of the time. Unfortunately, it is possible to end up with secondary zapped when it has already entered exit(2) (or, worse yet, is oopsing). In those cases we reach exit_mm() when mm->core_state is already set, but the stack contents is not what we would have in signal delivery. At least on two architectures (alpha and m68k) it leads to infoleaks - we end up with a chunk of kernel stack written into coredump, with the contents consisting of normal C stack frames of the call chain leading to exit_mm() instead of the expected copy of userland registers. In case of alpha we leak 312 bytes of stack. Other architectures (including the regset-using ones) might have similar problems - the normal user of regsets is ptrace and the state of tracee at the time of such calls is special in the same way signal delivery is. Note that had the zapper gotten to the exiting thread slightly later, it wouldn't have been included into coredump anyway - we skip the threads that have already cleared their ->mm. So let's pretend that zapper always loses the race. IOW, have exit_mm() only insert into the dumper list if we'd gotten there from handling a fatal signal[*] As the result, the callers of do_exit() that have *not* gone through get_signal() are not seen by coredump logics as secondary threads. Which excludes voluntary exit()/oopsen/traps/etc. The dumper thread itself is unaffected by that, so seccomp is fine. [*] originally I intended to add a new flag in tsk->flags, but ebiederman pointed out that PF_SIGNALED is already doing just what we need. Cc: stable@vger.kernel.org Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Acked-by: "Eric W. Biederman" Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- kernel/exit.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/exit.c b/kernel/exit.c index fa46977b9c07..ece64771a31f 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -456,7 +456,10 @@ static void exit_mm(void) up_read(&mm->mmap_sem); self.task = current; - self.next = xchg(&core_state->dumper.next, &self); + if (self.task->flags & PF_SIGNALED) + self.next = xchg(&core_state->dumper.next, &self); + else + self.task = NULL; /* * Implies mb(), the result of xchg() must be visible * to core_state->dumper. From c6a6168a31e100baf2e2b54da87d89b2ff384c1d Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Thu, 5 Nov 2020 20:02:56 +0100 Subject: [PATCH 133/151] drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[] commit 06ad8d339524bf94b89859047822c31df6ace239 upstream. The gma500 driver expects 3 pipelines in several it's IRQ functions. Accessing struct drm_device.vblank[], this fails with devices that only have 2 pipelines. An example KASAN report is shown below. 
[ 62.267688] ================================================================== [ 62.268856] BUG: KASAN: slab-out-of-bounds in psb_irq_postinstall+0x250/0x3c0 [gma500_gfx] [ 62.269450] Read of size 1 at addr ffff8880012bc6d0 by task systemd-udevd/285 [ 62.269949] [ 62.270192] CPU: 0 PID: 285 Comm: systemd-udevd Tainted: G E 5.10.0-rc1-1-default+ #572 [ 62.270807] Hardware name: /DN2800MT, BIOS MTCDT10N.86A.0164.2012.1213.1024 12/13/2012 [ 62.271366] Call Trace: [ 62.271705] dump_stack+0xae/0xe5 [ 62.272180] print_address_description.constprop.0+0x17/0xf0 [ 62.272987] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx] [ 62.273474] __kasan_report.cold+0x20/0x38 [ 62.273989] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx] [ 62.274460] kasan_report+0x3a/0x50 [ 62.274891] psb_irq_postinstall+0x250/0x3c0 [gma500_gfx] [ 62.275380] drm_irq_install+0x131/0x1f0 <...> [ 62.300751] Allocated by task 285: [ 62.301223] kasan_save_stack+0x1b/0x40 [ 62.301731] __kasan_kmalloc.constprop.0+0xbf/0xd0 [ 62.302293] drmm_kmalloc+0x55/0x100 [ 62.302773] drm_vblank_init+0x77/0x210 Resolve the issue by only handling vblank entries up to the number of CRTCs. I'm adding a Fixes tag for reference, although the bug has been present since the driver's initial commit. Signed-off-by: Thomas Zimmermann Reviewed-by: Daniel Vetter Fixes: 5c49fd3aa0ab ("gma500: Add the core DRM files and headers") Cc: Alan Cox Cc: Dave Airlie Cc: Patrik Jakobsson Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org#v3.3+ Link: https://patchwork.freedesktop.org/patch/msgid/20201105190256.3893-1-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/gma500/psb_irq.c | 34 +++++++++++--------------------- 1 file changed, 12 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/gma500/psb_irq.c b/drivers/gpu/drm/gma500/psb_irq.c index e6265fb85626..56bb34d04332 100644 --- a/drivers/gpu/drm/gma500/psb_irq.c +++ b/drivers/gpu/drm/gma500/psb_irq.c @@ -337,6 +337,7 @@ int psb_irq_postinstall(struct drm_device *dev) { struct drm_psb_private *dev_priv = dev->dev_private; unsigned long irqflags; + unsigned int i; spin_lock_irqsave(&dev_priv->irqmask_lock, irqflags); @@ -349,20 +350,12 @@ int psb_irq_postinstall(struct drm_device *dev) PSB_WVDC32(dev_priv->vdc_irq_mask, PSB_INT_ENABLE_R); PSB_WVDC32(0xFFFFFFFF, PSB_HWSTAM); - if (dev->vblank[0].enabled) - psb_enable_pipestat(dev_priv, 0, PIPE_VBLANK_INTERRUPT_ENABLE); - else - psb_disable_pipestat(dev_priv, 0, PIPE_VBLANK_INTERRUPT_ENABLE); - - if (dev->vblank[1].enabled) - psb_enable_pipestat(dev_priv, 1, PIPE_VBLANK_INTERRUPT_ENABLE); - else - psb_disable_pipestat(dev_priv, 1, PIPE_VBLANK_INTERRUPT_ENABLE); - - if (dev->vblank[2].enabled) - psb_enable_pipestat(dev_priv, 2, PIPE_VBLANK_INTERRUPT_ENABLE); - else - psb_disable_pipestat(dev_priv, 2, PIPE_VBLANK_INTERRUPT_ENABLE); + for (i = 0; i < dev->num_crtcs; ++i) { + if (dev->vblank[i].enabled) + psb_enable_pipestat(dev_priv, i, PIPE_VBLANK_INTERRUPT_ENABLE); + else + psb_disable_pipestat(dev_priv, i, PIPE_VBLANK_INTERRUPT_ENABLE); + } if (dev_priv->ops->hotplug_enable) dev_priv->ops->hotplug_enable(dev, true); @@ -375,6 +368,7 @@ void psb_irq_uninstall(struct drm_device *dev) { struct drm_psb_private *dev_priv = dev->dev_private; unsigned long irqflags; + unsigned int i; spin_lock_irqsave(&dev_priv->irqmask_lock, irqflags); @@ -383,14 +377,10 @@ void psb_irq_uninstall(struct drm_device *dev) PSB_WVDC32(0xFFFFFFFF, PSB_HWSTAM); - if (dev->vblank[0].enabled) - psb_disable_pipestat(dev_priv, 0, 
PIPE_VBLANK_INTERRUPT_ENABLE); - - if (dev->vblank[1].enabled) - psb_disable_pipestat(dev_priv, 1, PIPE_VBLANK_INTERRUPT_ENABLE); - - if (dev->vblank[2].enabled) - psb_disable_pipestat(dev_priv, 2, PIPE_VBLANK_INTERRUPT_ENABLE); + for (i = 0; i < dev->num_crtcs; ++i) { + if (dev->vblank[i].enabled) + psb_disable_pipestat(dev_priv, i, PIPE_VBLANK_INTERRUPT_ENABLE); + } dev_priv->vdc_irq_mask &= _PSB_IRQ_SGX_FLAG | _PSB_IRQ_MSVDX_FLAG | From fa76dd3c1df3f32511f5fa7789e2f8301936eb3e Mon Sep 17 00:00:00 2001 From: Coiby Xu Date: Fri, 6 Nov 2020 07:19:10 +0800 Subject: [PATCH 134/151] pinctrl: amd: use higher precision for 512 RtcClk commit c64a6a0d4a928c63e5bc3b485552a8903a506c36 upstream. RTC is 32.768kHz thus 512 RtcClk equals 15625 usec. The documentation likely has dropped precision and that's why the driver mistakenly took the slightly deviated value. Cc: stable@vger.kernel.org Reported-by: Andy Shevchenko Suggested-by: Andy Shevchenko Suggested-by: Hans de Goede Signed-off-by: Coiby Xu Reviewed-by: Andy Shevchenko Reviewed-by: Hans de Goede Link: https://lore.kernel.org/linux-gpio/2f4706a1-502f-75f0-9596-cc25b4933b6c@redhat.com/ Link: https://lore.kernel.org/r/20201105231912.69527-3-coiby.xu@gmail.com Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/pinctrl-amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c index eab078244a4c..3918f4fb3962 100644 --- a/drivers/pinctrl/pinctrl-amd.c +++ b/drivers/pinctrl/pinctrl-amd.c @@ -153,7 +153,7 @@ static int amd_gpio_set_debounce(struct gpio_chip *gc, unsigned offset, pin_reg |= BIT(DB_TMR_OUT_UNIT_OFF); pin_reg &= ~BIT(DB_TMR_LARGE_OFF); } else if (debounce < 250000) { - time = debounce / 15600; + time = debounce / 15625; pin_reg |= time & DB_TMR_OUT_MASK; pin_reg &= ~BIT(DB_TMR_OUT_UNIT_OFF); pin_reg |= BIT(DB_TMR_LARGE_OFF); From 2cd21fe5bcc4137912b30033610feca8d40e9bed Mon Sep 17 00:00:00 2001 From: Coiby Xu Date: Fri, 6 Nov 2020 07:19:09 +0800 Subject: [PATCH 135/151] pinctrl: amd: fix incorrect way to disable debounce filter commit 06abe8291bc31839950f7d0362d9979edc88a666 upstream. The correct way to disable debounce filter is to clear bit 5 and 6 of the register. 
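In user-space terms the mask arithmetic looks roughly like the sketch below; the two-bit field at bit offset 5 is taken from the description above, and the macro names are illustrative rather than the driver's own:

  #include <stdio.h>

  #define DB_CNTRL_OFF   5UL       /* debounce-control field starts at bit 5 */
  #define DB_CNTRL_MASK  0x3UL     /* two-bit field -> bits 5 and 6 */

  int main(void)
  {
      unsigned long pin_reg = ~0UL;                                      /* filter armed */
      unsigned long wrong = pin_reg & ~DB_CNTRL_MASK;                    /* clears bits 0-1 */
      unsigned long right = pin_reg & ~(DB_CNTRL_MASK << DB_CNTRL_OFF);  /* clears bits 5-6 */

      printf("unshifted mask leaves the field at %#lx\n",
             (wrong >> DB_CNTRL_OFF) & DB_CNTRL_MASK);                   /* still 0x3 */
      printf("shifted mask leaves the field at   %#lx\n",
             (right >> DB_CNTRL_OFF) & DB_CNTRL_MASK);                   /* now 0x0 */
      return 0;
  }

Without the shift the write clears bits 0 and 1 instead, so the debounce filter is never actually switched off; the hunks below apply the shifted mask in both places.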
Cc: stable@vger.kerne.org Signed-off-by: Coiby Xu Reviewed-by: Hans de Goede Cc: Hans de Goede Link: https://lore.kernel.org/linux-gpio/df2c008b-e7b5-4fdd-42ea-4d1c62b52139@redhat.com/ Link: https://lore.kernel.org/r/20201105231912.69527-2-coiby.xu@gmail.com Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/pinctrl-amd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c index 3918f4fb3962..12b2707296b6 100644 --- a/drivers/pinctrl/pinctrl-amd.c +++ b/drivers/pinctrl/pinctrl-amd.c @@ -163,14 +163,14 @@ static int amd_gpio_set_debounce(struct gpio_chip *gc, unsigned offset, pin_reg |= BIT(DB_TMR_OUT_UNIT_OFF); pin_reg |= BIT(DB_TMR_LARGE_OFF); } else { - pin_reg &= ~DB_CNTRl_MASK; + pin_reg &= ~(DB_CNTRl_MASK << DB_CNTRL_OFF); ret = -EINVAL; } } else { pin_reg &= ~BIT(DB_TMR_OUT_UNIT_OFF); pin_reg &= ~BIT(DB_TMR_LARGE_OFF); pin_reg &= ~DB_TMR_OUT_MASK; - pin_reg &= ~DB_CNTRl_MASK; + pin_reg &= ~(DB_CNTRl_MASK << DB_CNTRL_OFF); } writel(pin_reg, gpio_dev->base + offset * 4); raw_spin_unlock_irqrestore(&gpio_dev->lock, flags); From 98901bff58d93e63980abe0c461633004f7954f8 Mon Sep 17 00:00:00 2001 From: Stefano Stabellini Date: Mon, 26 Oct 2020 17:02:14 -0700 Subject: [PATCH 136/151] swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb" commit e9696d259d0fb5d239e8c28ca41089838ea76d13 upstream. kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to allocate a buffer for the swiotlb. It does so by calling memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE); If the allocation must fail, no_iotlb_memory is set. Later during initialization swiotlb-xen comes in (drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start is != 0, it thinks the memory is ready to use when actually it is not. When the swiotlb is actually needed, swiotlb_tbl_map_single gets called and since no_iotlb_memory is set the kernel panics. Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been initialized, it would do the initialization itself, which might still succeed. Fix the panic by setting io_tlb_start to 0 on swiotlb initialization failure, and also by setting no_iotlb_memory to false on swiotlb initialization success. 
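A stand-alone sketch of the invariant the fix restores is below; io_tlb_start and no_iotlb_memory are illustrative stand-ins for the kernel globals and the allocation is faked:

  #include <stdbool.h>
  #include <stdio.h>

  static unsigned long io_tlb_start;   /* 0 means "no bounce buffer" */
  static bool no_iotlb_memory;

  static void early_swiotlb_init(bool alloc_ok)
  {
      unsigned long buf = alloc_ok ? 0x1000 : 0;   /* pretend memblock alloc */

      if (!buf) {
          io_tlb_start = 0;            /* never leave a stale, unusable base */
          no_iotlb_memory = true;
          return;
      }
      io_tlb_start = buf;
      no_iotlb_memory = false;         /* success must clear the flag as well */
  }

  int main(void)
  {
      early_swiotlb_init(false);
      /* xen-swiotlb style probe: only trust a non-zero start address */
      printf("bounce buffer usable: %s\n",
             io_tlb_start && !no_iotlb_memory ? "yes" : "no");
      return 0;
  }

With both fields kept consistent, a later consumer such as xen-swiotlb can detect the failed early init and redo the allocation itself.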
Fixes: ac2cbab21f31 ("x86: Don't panic if can not alloc buffer for swiotlb") Reported-by: Elliott Mitchell Tested-by: Elliott Mitchell Signed-off-by: Stefano Stabellini Reviewed-by: Christoph Hellwig Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman --- kernel/dma/swiotlb.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 673a2cdb2656..f99b79d7e123 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -230,6 +230,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) io_tlb_orig_addr[i] = INVALID_PHYS_ADDR; } io_tlb_index = 0; + no_iotlb_memory = false; if (verbose) swiotlb_print_info(); @@ -261,9 +262,11 @@ swiotlb_init(int verbose) if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose)) return; - if (io_tlb_start) + if (io_tlb_start) { memblock_free_early(io_tlb_start, PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT)); + io_tlb_start = 0; + } pr_warn("Cannot allocate buffer"); no_iotlb_memory = true; } @@ -361,6 +364,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs) io_tlb_orig_addr[i] = INVALID_PHYS_ADDR; } io_tlb_index = 0; + no_iotlb_memory = false; swiotlb_print_info(); From 22ee23fe1cc9fa4e6c4208b803c762d16f77d5c5 Mon Sep 17 00:00:00 2001 From: Oliver Herms Date: Tue, 3 Nov 2020 11:41:33 +0100 Subject: [PATCH 137/151] IPv6: Set SIT tunnel hard_header_len to zero [ Upstream commit 8ef9ba4d666614497a057d09b0a6eafc1e34eadf ] Due to the legacy usage of hard_header_len for SIT tunnels while already using infrastructure from net/ipv4/ip_tunnel.c the calculation of the path MTU in tnl_update_pmtu is incorrect. This leads to unnecessary creation of MTU exceptions for any flow going over a SIT tunnel. As SIT tunnels do not have a header themsevles other than their transport (L3, L2) headers we're leaving hard_header_len set to zero as tnl_update_pmtu is already taking care of the transport headers sizes. This will also help avoiding unnecessary IPv6 GC runs and spinlock contention seen when using SIT tunnels and for more than net.ipv6.route.gc_thresh flows. 
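A rough, purely numerical model of the double charge is below; the exact header sizes that were previously folded into hard_header_len differ, so the figures are only illustrative:

  #include <stdio.h>

  int main(void)
  {
      const int link_mtu   = 1500;
      const int outer_ipv4 = 20;    /* SIT's own encapsulation overhead */

      /* The ip_tunnel.c path-MTU code already subtracts the tunnel header;
       * a non-zero hard_header_len makes it get subtracted a second time.
       */
      int pmtu_expected   = link_mtu - outer_ipv4;
      int pmtu_overshrunk = link_mtu - outer_ipv4 - outer_ipv4;

      printf("expected tunnel PMTU:   %d\n", pmtu_expected);    /* 1480 */
      printf("overshrunk tunnel PMTU: %d\n", pmtu_overshrunk);  /* 1460 -> needless exceptions */
      return 0;
  }
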
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Oliver Herms Acked-by: Willem de Bruijn Link: https://lore.kernel.org/r/20201103104133.GA1573211@tws Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/ipv6/sit.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 98954830c40b..2872f7a00e86 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -1088,7 +1088,6 @@ static void ipip6_tunnel_bind_dev(struct net_device *dev) if (tdev && !netif_is_l3_master(tdev)) { int t_hlen = tunnel->hlen + sizeof(struct iphdr); - dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr); dev->mtu = tdev->mtu - t_hlen; if (dev->mtu < IPV6_MIN_MTU) dev->mtu = IPV6_MIN_MTU; @@ -1378,7 +1377,6 @@ static void ipip6_tunnel_setup(struct net_device *dev) dev->priv_destructor = ipip6_dev_free; dev->type = ARPHRD_SIT; - dev->hard_header_len = LL_MAX_HEADER + t_hlen; dev->mtu = ETH_DATA_LEN - t_hlen; dev->min_mtu = IPV6_MIN_MTU; dev->max_mtu = IP6_MAX_MTU - t_hlen; From 016e70d176ff947cb313097163f0ac3b15282aed Mon Sep 17 00:00:00 2001 From: Ursula Braun Date: Mon, 9 Nov 2020 08:57:05 +0100 Subject: [PATCH 138/151] net/af_iucv: fix null pointer dereference on shutdown [ Upstream commit 4031eeafa71eaf22ae40a15606a134ae86345daf ] syzbot reported the following KASAN finding: BUG: KASAN: nullptr-dereference in iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385 Read of size 2 at addr 000000000000021e by task syz-executor907/519 CPU: 0 PID: 519 Comm: syz-executor907 Not tainted 5.9.0-syzkaller-07043-gbcf9877ad213 #0 Hardware name: IBM 3906 M04 701 (KVM/Linux) Call Trace: [<00000000c576af60>] unwind_start arch/s390/include/asm/unwind.h:65 [inline] [<00000000c576af60>] show_stack+0x180/0x228 arch/s390/kernel/dumpstack.c:135 [<00000000c9dcd1f8>] __dump_stack lib/dump_stack.c:77 [inline] [<00000000c9dcd1f8>] dump_stack+0x268/0x2f0 lib/dump_stack.c:118 [<00000000c5fed016>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:383 [<00000000c5fec82a>] __kasan_report mm/kasan/report.c:517 [inline] [<00000000c5fec82a>] kasan_report+0x11a/0x168 mm/kasan/report.c:534 [<00000000c98b5b60>] iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385 [<00000000c98b6262>] iucv_sock_shutdown+0x44a/0x4c0 net/iucv/af_iucv.c:1457 [<00000000c89d3a54>] __sys_shutdown+0x12c/0x1c8 net/socket.c:2204 [<00000000c89d3b70>] __do_sys_shutdown net/socket.c:2212 [inline] [<00000000c89d3b70>] __s390x_sys_shutdown+0x38/0x48 net/socket.c:2210 [<00000000c9e36eac>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415 There is nothing to shutdown if a connection has never been established. Besides that iucv->hs_dev is not yet initialized if a socket is in IUCV_OPEN state and iucv->path is not yet initialized if socket is in IUCV_BOUND state. So, just skip the shutdown calls for a socket in these states. 
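Reduced to a sketch, the added guard amounts to the check below (state names follow the description above; this is not a drop-in for af_iucv.c):

  #include <stdio.h>

  enum iucv_state { IUCV_OPEN, IUCV_BOUND, IUCV_CONNECTED };

  /* Only a connected socket has its path / HiperSockets device set up, so
   * only then can a shutdown message be sent to the peer at all.
   */
  static int peer_shutdown_wanted(enum iucv_state state, int shutting_down_send)
  {
      return shutting_down_send && state == IUCV_CONNECTED;
  }

  int main(void)
  {
      printf("open:      %d\n", peer_shutdown_wanted(IUCV_OPEN, 1));      /* 0 */
      printf("bound:     %d\n", peer_shutdown_wanted(IUCV_BOUND, 1));     /* 0 */
      printf("connected: %d\n", peer_shutdown_wanted(IUCV_CONNECTED, 1)); /* 1 */
      return 0;
  }
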
Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support") Fixes: 82492a355fac ("af_iucv: add shutdown for HS transport") Reviewed-by: Vasily Gorbik Signed-off-by: Ursula Braun [jwi: correct one Fixes tag] Signed-off-by: Julian Wiedmann Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/iucv/af_iucv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c index ebb62a4ebe30..be8fd79202b8 100644 --- a/net/iucv/af_iucv.c +++ b/net/iucv/af_iucv.c @@ -1574,7 +1574,8 @@ static int iucv_sock_shutdown(struct socket *sock, int how) break; } - if (how == SEND_SHUTDOWN || how == SHUTDOWN_MASK) { + if ((how == SEND_SHUTDOWN || how == SHUTDOWN_MASK) && + sk->sk_state == IUCV_CONNECTED) { if (iucv->transport == AF_IUCV_TRANS_IUCV) { txmsg.class = 0; txmsg.tag = 0; From 25786fb512f7bf4fea376ce380ea940d0fc53e5e Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Wed, 11 Nov 2020 20:45:25 +0000 Subject: [PATCH 139/151] net: udp: fix UDP header access on Fast/frag0 UDP GRO [ Upstream commit 4b1a86281cc1d0de46df3ad2cb8c1f86ac07681c ] UDP GRO uses udp_hdr(skb) in its .gro_receive() callback. While it's probably OK for non-frag0 paths (when all headers or even the entire frame are already in skb head), this inline points to junk when using Fast GRO (napi_gro_frags() or napi_gro_receive() with only Ethernet header in skb head and all the rest in the frags) and breaks GRO packet compilation and the packet flow itself. To support both modes, skb_gro_header_fast() + skb_gro_header_slow() are typically used. UDP even has an inline helper that makes use of them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr() to get rid of the out-of-order delivers. Present since the introduction of plain UDP GRO in 5.0-rc1. Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Cc: Eric Dumazet Signed-off-by: Alexander Lobakin Acked-by: Willem de Bruijn Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/ipv4/udp_offload.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index a3908e55ed89..d7c64e953e9a 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -349,7 +349,7 @@ out: static struct sk_buff *udp_gro_receive_segment(struct list_head *head, struct sk_buff *skb) { - struct udphdr *uh = udp_hdr(skb); + struct udphdr *uh = udp_gro_udphdr(skb); struct sk_buff *pp = NULL; struct udphdr *uh2; struct sk_buff *p; From 7e332a5c0e2c451fe1488fa48783b671e15c15b7 Mon Sep 17 00:00:00 2001 From: Mao Wenan Date: Tue, 10 Nov 2020 08:16:31 +0800 Subject: [PATCH 140/151] net: Update window_clamp if SOCK_RCVBUF is set [ Upstream commit 909172a149749242990a6e64cb55d55460d4e417 ] When net.ipv4.tcp_syncookies=1 and syn flood is happened, cookie_v4_check or cookie_v6_check tries to redo what tcp_v4_send_synack or tcp_v6_send_synack did, rsk_window_clamp will be changed if SOCK_RCVBUF is set, which will make rcv_wscale is different, the client still operates with initial window scale and can overshot granted window, the client use the initial scale but local server use new scale to advertise window value, and session work abnormally. 
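The clamp that both the IPv4 and IPv6 hunks below add can be sketched on its own as follows; tcp_full_space() is modelled as a plain parameter and the numbers are made up:

  #include <stdio.h>

  static unsigned int clamp_for_synack(unsigned int window_clamp,
                                       unsigned int full_space,
                                       int rcvbuf_locked)
  {
      /* If the application pinned SO_RCVBUF, never advertise a window larger
       * than the buffer can hold; 0 means no clamp has been chosen yet.
       */
      if (rcvbuf_locked && (window_clamp > full_space || window_clamp == 0))
          window_clamp = full_space;
      return window_clamp;
  }

  int main(void)
  {
      printf("rcvbuf not locked: %u\n", clamp_for_synack(0, 65535, 0));      /* 0, pick later */
      printf("locked, unset:     %u\n", clamp_for_synack(0, 65535, 1));      /* 65535 */
      printf("locked, too big:   %u\n", clamp_for_synack(262144, 65535, 1)); /* 65535 */
      return 0;
  }

With the clamp in place the cookie path selects the same rcv_wscale as the original SYN-ACK path did, so client and server agree on the window scale again.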
Fixes: e88c64f0a425 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt") Signed-off-by: Mao Wenan Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/ipv4/syncookies.c | 9 +++++++-- net/ipv6/syncookies.c | 10 ++++++++-- 2 files changed, 15 insertions(+), 4 deletions(-) diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index 535b69326f66..2b45d1455592 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -291,7 +291,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb) __u32 cookie = ntohl(th->ack_seq) - 1; struct sock *ret = sk; struct request_sock *req; - int mss; + int full_space, mss; struct rtable *rt; __u8 rcv_wscale; struct flowi4 fl4; @@ -386,8 +386,13 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb) /* Try to redo what tcp_v4_send_synack did. */ req->rsk_window_clamp = tp->window_clamp ? :dst_metric(&rt->dst, RTAX_WINDOW); + /* limit the window selection if the user enforce a smaller rx buffer */ + full_space = tcp_full_space(sk); + if (sk->sk_userlocks & SOCK_RCVBUF_LOCK && + (req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0)) + req->rsk_window_clamp = full_space; - tcp_select_initial_window(sk, tcp_full_space(sk), req->mss, + tcp_select_initial_window(sk, full_space, req->mss, &req->rsk_rcv_wnd, &req->rsk_window_clamp, ireq->wscale_ok, &rcv_wscale, dst_metric(&rt->dst, RTAX_INITRWND)); diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c index 30915f6f31e3..ec155844012b 100644 --- a/net/ipv6/syncookies.c +++ b/net/ipv6/syncookies.c @@ -136,7 +136,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) __u32 cookie = ntohl(th->ack_seq) - 1; struct sock *ret = sk; struct request_sock *req; - int mss; + int full_space, mss; struct dst_entry *dst; __u8 rcv_wscale; u32 tsoff = 0; @@ -241,7 +241,13 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) } req->rsk_window_clamp = tp->window_clamp ? :dst_metric(dst, RTAX_WINDOW); - tcp_select_initial_window(sk, tcp_full_space(sk), req->mss, + /* limit the window selection if the user enforce a smaller rx buffer */ + full_space = tcp_full_space(sk); + if (sk->sk_userlocks & SOCK_RCVBUF_LOCK && + (req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0)) + req->rsk_window_clamp = full_space; + + tcp_select_initial_window(sk, full_space, req->mss, &req->rsk_rcv_wnd, &req->rsk_window_clamp, ireq->wscale_ok, &rcv_wscale, dst_metric(dst, RTAX_INITRWND)); From c59039a088bd4f33cb2f0cc11103aadb241b6685 Mon Sep 17 00:00:00 2001 From: Martin Schiller Date: Mon, 9 Nov 2020 07:54:49 +0100 Subject: [PATCH 141/151] net/x25: Fix null-ptr-deref in x25_connect [ Upstream commit 361182308766a265b6c521879b34302617a8c209 ] This fixes a regression for blocking connects introduced by commit 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect"). The x25->neighbour is already set to "NULL" by x25_disconnect() now, while a blocking connect is waiting in x25_wait_for_connection_establishment(). Therefore x25->neighbour must not be accessed here again and x25->state is also already set to X25_STATE_0 by x25_disconnect(). 
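In isolation the fix is just a defensive check before the put, along these lines (types heavily trimmed; not the af_x25.c code):

  #include <stdio.h>

  struct neigh    { int refcnt; };
  struct x25_sock { struct neigh *neighbour; };

  static void neigh_put(struct neigh *nb) { nb->refcnt--; }

  /* Error path of a blocking connect: x25_disconnect() may already have
   * dropped the reference and set ->neighbour to NULL, so re-check it here.
   */
  static void connect_error_path(struct x25_sock *x25)
  {
      if (x25->neighbour) {
          neigh_put(x25->neighbour);
          x25->neighbour = NULL;
      }
  }

  int main(void)
  {
      struct x25_sock x25 = { .neighbour = NULL };   /* already disconnected */
      connect_error_path(&x25);                      /* no NULL dereference  */
      printf("ok\n");
      return 0;
  }
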
Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Signed-off-by: Martin Schiller Reviewed-by: Xie He Link: https://lore.kernel.org/r/20201109065449.9014-1-ms@dev.tdt.de Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/x25/af_x25.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c index 256f3e97d1f3..5e8146fcb583 100644 --- a/net/x25/af_x25.c +++ b/net/x25/af_x25.c @@ -819,7 +819,7 @@ static int x25_connect(struct socket *sock, struct sockaddr *uaddr, sock->state = SS_CONNECTED; rc = 0; out_put_neigh: - if (rc) { + if (rc && x25->neighbour) { read_lock_bh(&x25_list_lock); x25_neigh_put(x25->neighbour); x25->neighbour = NULL; From 78f6fac0814e242690d047ee5aacf093709b8797 Mon Sep 17 00:00:00 2001 From: Wang Hai Date: Mon, 9 Nov 2020 22:09:13 +0800 Subject: [PATCH 142/151] tipc: fix memory leak in tipc_topsrv_start() [ Upstream commit fa6882c63621821f73cc806f291208e1c6ea6187 ] kmemleak report a memory leak as follows: unreferenced object 0xffff88810a596800 (size 512): comm "ip", pid 21558, jiffies 4297568990 (age 112.120s) hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... ff ff ff ff ff ff ff ff 00 83 60 b0 ff ff ff ff ..........`..... backtrace: [<0000000022bbe21f>] tipc_topsrv_init_net+0x1f3/0xa70 [<00000000fe15ddf7>] ops_init+0xa8/0x3c0 [<00000000138af6f2>] setup_net+0x2de/0x7e0 [<000000008c6807a3>] copy_net_ns+0x27d/0x530 [<000000006b21adbd>] create_new_namespaces+0x382/0xa30 [<00000000bb169746>] unshare_nsproxy_namespaces+0xa1/0x1d0 [<00000000fe2e42bc>] ksys_unshare+0x39c/0x780 [<0000000009ba3b19>] __x64_sys_unshare+0x2d/0x40 [<00000000614ad866>] do_syscall_64+0x56/0xa0 [<00000000a1b5ca3c>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 'srv' is malloced in tipc_topsrv_start() but not free before leaving from the error handling cases. We need to free it. Fixes: 5c45ab24ac77 ("tipc: make struct tipc_server private for server.c") Reported-by: Hulk Robot Signed-off-by: Wang Hai Link: https://lore.kernel.org/r/20201109140913.47370-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/tipc/topsrv.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c index 931c426673c0..444e1792d02c 100644 --- a/net/tipc/topsrv.c +++ b/net/tipc/topsrv.c @@ -664,12 +664,18 @@ static int tipc_topsrv_start(struct net *net) ret = tipc_topsrv_work_start(srv); if (ret < 0) - return ret; + goto err_start; ret = tipc_topsrv_create_listener(srv); if (ret < 0) - tipc_topsrv_work_stop(srv); + goto err_create; + return 0; + +err_create: + tipc_topsrv_work_stop(srv); +err_start: + kfree(srv); return ret; } From 6fcf4141b9a275fa4aa647974f06c1916904e3f9 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Thu, 5 Nov 2020 15:28:42 +0100 Subject: [PATCH 143/151] r8169: fix potential skb double free in an error path [ Upstream commit cc6528bc9a0c901c83b8220a2e2617f3354d6dd9 ] The caller of rtl8169_tso_csum_v2() frees the skb if false is returned. eth_skb_pad() internally frees the skb on error what would result in a double free. Therefore use __skb_put_padto() directly and instruct it to not free the skb on error. 
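The ownership rule behind the change can be modelled without any networking code; malloc/free stand in for the skb and the helper names below are made up:

  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Behaves like eth_skb_pad(): consumes the buffer itself on failure. */
  static bool pad_and_free_on_error(void **buf, bool fail)
  {
      if (fail) {
          free(*buf);
          *buf = NULL;
          return false;
      }
      return true;
  }

  /* Behaves like __skb_put_padto(skb, len, false): leaves the buffer alone. */
  static bool pad_keep_on_error(void *buf, bool fail)
  {
      (void)buf;
      return !fail;
  }

  int main(void)
  {
      void *a = malloc(64), *b = malloc(64);

      pad_and_free_on_error(&a, true);   /* a is freed inside the helper */

      /* The r8169 caller frees the packet whenever padding reports failure,
       * so the helper it calls must not free it as well.
       */
      if (!pad_keep_on_error(b, true))
          free(b);                       /* freed exactly once */
      return 0;
  }
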
Fixes: b423e9ae49d7 ("r8169: fix offloaded tx checksum for small packets.") Reported-by: Jakub Kicinski Signed-off-by: Heiner Kallweit Link: https://lore.kernel.org/r/f7e68191-acff-9ded-4263-c016428a8762@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/realtek/r8169_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index d8881ba773de..fd5adb0c54d2 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -5846,7 +5846,8 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp, opts[1] |= transport_offset << TCPHO_SHIFT; } else { if (unlikely(rtl_test_hw_pad_bug(tp, skb))) - return !eth_skb_pad(skb); + /* eth_skb_pad would free the skb on error */ + return !__skb_put_padto(skb, ETH_ZLEN, false); } return true; From 5af9d48acbee63bc3ee6524fc99fc78c018064a7 Mon Sep 17 00:00:00 2001 From: Venkata Sandeep Dhanalakota Date: Thu, 5 Nov 2020 17:18:42 -0800 Subject: [PATCH 144/151] drm/i915: Correctly set SFC capability for video engines commit 5ce6861d36ed5207aff9e5eead4c7cc38a986586 upstream. SFC capability of video engines is not set correctly because i915 is testing for incorrect bits. Fixes: c5d3e39caa45 ("drm/i915: Engine discovery query") Cc: Matt Roper Cc: Tvrtko Ursulin Signed-off-by: Venkata Sandeep Dhanalakota Signed-off-by: Daniele Ceraolo Spurio Reviewed-by: Tvrtko Ursulin Cc: # v5.3+ Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20201106011842.36203-1-daniele.ceraolospurio@intel.com (cherry picked from commit ad18fa0f5f052046cad96fee762b5c64f42dd86a) Signed-off-by: Rodrigo Vivi Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 4ce8626b140e..8073758d1036 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -354,7 +354,8 @@ static void __setup_engine_capabilities(struct intel_engine_cs *engine) * instances. */ if ((INTEL_GEN(i915) >= 11 && - RUNTIME_INFO(i915)->vdbox_sfc_access & engine->mask) || + (RUNTIME_INFO(i915)->vdbox_sfc_access & + BIT(engine->instance))) || (INTEL_GEN(i915) >= 9 && engine->instance == 0)) engine->uabi_capabilities |= I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC; From 6958fbd52e79beb0f0b1077e54097215940bef9f Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Sat, 10 Oct 2020 15:14:30 +0000 Subject: [PATCH 145/151] powerpc/603: Always fault when _PAGE_ACCESSED is not set commit 11522448e641e8f1690c9db06e01985e8e19b401 upstream. The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. 
Fixes: 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/head_32.S | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 5e2f2fd78b94..126ba5438430 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -418,11 +418,7 @@ InstructionTLBMiss: cmplw 0,r1,r3 #endif mfspr r2, SPRN_SPRG_PGDIR -#ifdef CONFIG_SWAP li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC -#else - li r1,_PAGE_PRESENT | _PAGE_EXEC -#endif #if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) bge- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ @@ -484,11 +480,7 @@ DataLoadTLBMiss: lis r1,PAGE_OFFSET@h /* check if kernel address */ cmplw 0,r1,r3 mfspr r2, SPRN_SPRG_PGDIR -#ifdef CONFIG_SWAP li r1, _PAGE_PRESENT | _PAGE_ACCESSED -#else - li r1, _PAGE_PRESENT -#endif bge- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */ @@ -564,11 +556,7 @@ DataStoreTLBMiss: lis r1,PAGE_OFFSET@h /* check if kernel address */ cmplw 0,r1,r3 mfspr r2, SPRN_SPRG_PGDIR -#ifdef CONFIG_SWAP li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED -#else - li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT -#endif bge- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */ From b74fe3186471b68c41f07c787c60ada75d64c4b3 Mon Sep 17 00:00:00 2001 From: Anand K Mistry Date: Thu, 5 Nov 2020 16:33:04 +1100 Subject: [PATCH 146/151] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP commit 1978b3a53a74e3230cd46932b149c6e62e832e9a upstream. On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED At the same time, IBPB can be set to conditional. However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED condition leads to a return before the task flag is set. Similarly, ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to conditional. More generally, the following cases are possible: 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb 2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. At a high level, this change does one thing. If either STIBP or IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one. [ bp: Massage commit message and comment; space out statements for better readability. 
] Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Signed-off-by: Anand K Mistry Signed-off-by: Borislav Petkov Acked-by: Thomas Gleixner Acked-by: Tom Lendacky Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 52 ++++++++++++++++++++++++-------------- 1 file changed, 33 insertions(+), 19 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index acbf3dbb8bf2..bdc1ed7ff669 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1252,6 +1252,14 @@ static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) return 0; } +static bool is_spec_ib_user_controlled(void) +{ + return spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL || + spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP || + spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL || + spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP; +} + static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) { switch (ctrl) { @@ -1259,17 +1267,26 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return 0; - /* - * Indirect branch speculation is always disabled in strict - * mode. It can neither be enabled if it was force-disabled - * by a previous prctl call. + /* + * With strict mode for both IBPB and STIBP, the instruction + * code paths avoid checking this task flag and instead, + * unconditionally run the instruction. However, STIBP and IBPB + * are independent and either can be set to conditionally + * enabled regardless of the mode of the other. + * + * If either is set to conditional, allow the task flag to be + * updated, unless it was force-disabled by a previous prctl + * call. Currently, this is possible on an AMD CPU which has the + * feature X86_FEATURE_AMD_STIBP_ALWAYS_ON. In this case, if the + * kernel is booted with 'spectre_v2_user=seccomp', then + * spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP and + * spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED. 
*/ - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED || + if (!is_spec_ib_user_controlled() || task_spec_ib_force_disable(task)) return -EPERM; + task_clear_spec_ib_disable(task); task_update_spec_tif(task); break; @@ -1282,10 +1299,10 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return -EPERM; - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) + + if (!is_spec_ib_user_controlled()) return 0; + task_set_spec_ib_disable(task); if (ctrl == PR_SPEC_FORCE_DISABLE) task_set_spec_ib_force_disable(task); @@ -1350,20 +1367,17 @@ static int ib_prctl_get(struct task_struct *task) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return PR_SPEC_ENABLE; - else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) - return PR_SPEC_DISABLE; - else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL || - spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP || - spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL || - spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP) { + else if (is_spec_ib_user_controlled()) { if (task_spec_ib_force_disable(task)) return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE; if (task_spec_ib_disable(task)) return PR_SPEC_PRCTL | PR_SPEC_DISABLE; return PR_SPEC_PRCTL | PR_SPEC_ENABLE; - } else + } else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || + spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || + spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) + return PR_SPEC_DISABLE; + else return PR_SPEC_NOT_AFFECTED; } From c6b1616f5472e86dc1c629383f912921a0b33368 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Fri, 30 Oct 2020 08:24:38 -0300 Subject: [PATCH 147/151] perf scripting python: Avoid declaring function pointers with a visibility attribute commit d0e7b0c71fbb653de90a7163ef46912a96f0bdaf upstream. To avoid this: util/scripting-engines/trace-event-python.c: In function 'python_start_script': util/scripting-engines/trace-event-python.c:1595:2: error: 'visibility' attribute ignored [-Werror=attributes] 1595 | PyMODINIT_FUNC (*initfunc)(void); | ^~~~~~~~~~~~~~ That started breaking when building with PYTHON=python3 and these gcc versions (I haven't checked with the clang ones, maybe it breaks there as well): # export PERF_TARBALL=http://192.168.86.5/perf/perf-5.9.0.tar.xz # dm fedora:33 fedora:rawhide 1 107.80 fedora:33 : Ok gcc (GCC) 10.2.1 20201005 (Red Hat 10.2.1-5), clang version 11.0.0 (Fedora 11.0.0-1.fc33) 2 92.47 fedora:rawhide : Ok gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), clang version 11.0.0 (Fedora 11.0.0-1.fc34) # Avoid that by ditching that 'initfunc' function pointer with its: #define Py_EXPORTED_SYMBOL _attribute_ ((visibility ("default"))) #define PyMODINIT_FUNC Py_EXPORTED_SYMBOL PyObject* And just call PyImport_AppendInittab() at the end of the ifdef python3 block with the functions that were being attributed to that initfunc. 
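A minimal embedding host that follows the same pattern is sketched below; the module name "demo" and its contents are placeholders, and a Python 3 development environment is assumed for building:

  #include <Python.h>

  static PyObject *demo_hello(PyObject *self, PyObject *args)
  {
      return PyUnicode_FromString("hello from the built-in module");
  }

  static PyMethodDef demo_methods[] = {
      { "hello", demo_hello, METH_NOARGS, "say hello" },
      { NULL, NULL, 0, NULL }
  };

  static struct PyModuleDef demo_module = {
      PyModuleDef_HEAD_INIT, "demo", NULL, -1, demo_methods,
      NULL, NULL, NULL, NULL
  };

  static PyObject *PyInit_demo(void)
  {
      return PyModule_Create(&demo_module);
  }

  int main(void)
  {
      /* Register the built-in module before Py_Initialize(), passing the init
       * function directly instead of through a PyMODINIT_FUNC pointer.
       */
      PyImport_AppendInittab("demo", PyInit_demo);
      Py_Initialize();
      PyRun_SimpleString("import demo; print(demo.hello())");
      Py_Finalize();
      return 0;
  }

Dropping the intermediate pointer is what sidesteps the warning, since the visibility attribute is meant for the exported init functions themselves, not for a local function-pointer variable.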
Cc: Adrian Hunter Cc: Ian Rogers Cc: Jiri Olsa Cc: Namhyung Kim Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Tapas Kundu Signed-off-by: Greg Kroah-Hartman --- tools/perf/util/scripting-engines/trace-event-python.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index 93c03b39cd9c..3b02c3f1b289 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -1587,7 +1587,6 @@ static void _free_command_line(wchar_t **command_line, int num) static int python_start_script(const char *script, int argc, const char **argv) { struct tables *tables = &tables_global; - PyMODINIT_FUNC (*initfunc)(void); #if PY_MAJOR_VERSION < 3 const char **command_line; #else @@ -1602,20 +1601,18 @@ static int python_start_script(const char *script, int argc, const char **argv) FILE *fp; #if PY_MAJOR_VERSION < 3 - initfunc = initperf_trace_context; command_line = malloc((argc + 1) * sizeof(const char *)); command_line[0] = script; for (i = 1; i < argc + 1; i++) command_line[i] = argv[i - 1]; + PyImport_AppendInittab(name, initperf_trace_context); #else - initfunc = PyInit_perf_trace_context; command_line = malloc((argc + 1) * sizeof(wchar_t *)); command_line[0] = Py_DecodeLocale(script, NULL); for (i = 1; i < argc + 1; i++) command_line[i] = Py_DecodeLocale(argv[i - 1], NULL); + PyImport_AppendInittab(name, PyInit_perf_trace_context); #endif - - PyImport_AppendInittab(name, initfunc); Py_Initialize(); #if PY_MAJOR_VERSION < 3 From c5cf5c7b585c7f48195892e44b76237010c0747a Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 16 Sep 2020 13:53:11 +0200 Subject: [PATCH 148/151] perf/core: Fix race in the perf_mmap_close() function commit f91072ed1b7283b13ca57fcfbece5a3b92726143 upstream. There's a possible race in perf_mmap_close() when checking ring buffer's mmap_count refcount value. The problem is that the mmap_count check is not atomic because we call atomic_dec() and atomic_read() separately. perf_mmap_close: ... atomic_dec(&rb->mmap_count); ... if (atomic_read(&rb->mmap_count)) goto out_put; free_uid out_put: ring_buffer_put(rb); /* could be last */ The race can happen when we have two (or more) events sharing same ring buffer and they go through atomic_dec() and then they both see 0 as refcount value later in atomic_read(). Then both will go on and execute code which is meant to be run just once. The code that detaches ring buffer is probably fine to be executed more than once, but the problem is in calling free_uid(), which will later on demonstrate in related crashes and refcount warnings, like: refcount_t: addition on 0; use-after-free. ... RIP: 0010:refcount_warn_saturate+0x6d/0xf ... Call Trace: prepare_creds+0x190/0x1e0 copy_creds+0x35/0x172 copy_process+0x471/0x1a80 _do_fork+0x83/0x3a0 __do_sys_wait4+0x83/0x90 __do_sys_clone+0x85/0xa0 do_syscall_64+0x5b/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Using atomic decrease and check instead of separated calls. 
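The difference between the racy and the fixed pattern can be shown with plain C11 atomics; refcount_t is simplified to a bare counter here and the function names are made up:

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdio.h>

  /* Racy: decrement and test are separate operations, so two concurrent
   * droppers can both read 0 afterwards and both run the last-reference path.
   */
  static bool put_racy(atomic_int *refs)
  {
      atomic_fetch_sub(refs, 1);
      return atomic_load(refs) == 0;
  }

  /* Fixed, atomic_dec_and_test() style: only the caller that performs the
   * 1 -> 0 transition ever sees true.
   */
  static bool put_fixed(atomic_int *refs)
  {
      return atomic_fetch_sub(refs, 1) == 1;   /* fetch_sub returns the old value */
  }

  int main(void)
  {
      atomic_int a = 2, b = 2;
      bool racy_first  = put_racy(&a);
      bool racy_last   = put_racy(&a);
      bool fixed_first = put_fixed(&b);
      bool fixed_last  = put_fixed(&b);

      /* Single-threaded the two variants look identical; only concurrent
       * droppers expose the difference, which is exactly the mmap_count race.
       */
      printf("racy:  %d %d\nfixed: %d %d\n",
             racy_first, racy_last, fixed_first, fixed_last);
      return 0;
  }
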
Tested-by: Michael Petlan Signed-off-by: Jiri Olsa Signed-off-by: Ingo Molnar Acked-by: Peter Zijlstra Acked-by: Namhyung Kim Acked-by: Wade Mealing Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole"); Link: https://lore.kernel.org/r/20200916115311.GE2301783@krava [sudip: used ring_buffer] Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- kernel/events/core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 1b60f9c508c9..9f7c2da99299 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5596,11 +5596,11 @@ static void perf_pmu_output_stop(struct perf_event *event); static void perf_mmap_close(struct vm_area_struct *vma) { struct perf_event *event = vma->vm_file->private_data; - struct ring_buffer *rb = ring_buffer_get(event); struct user_struct *mmap_user = rb->mmap_user; int mmap_locked = rb->mmap_locked; unsigned long size = perf_data_size(rb); + bool detach_rest = false; if (event->pmu->event_unmapped) event->pmu->event_unmapped(event, vma->vm_mm); @@ -5631,7 +5631,8 @@ static void perf_mmap_close(struct vm_area_struct *vma) mutex_unlock(&event->mmap_mutex); } - atomic_dec(&rb->mmap_count); + if (atomic_dec_and_test(&rb->mmap_count)) + detach_rest = true; if (!atomic_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex)) goto out_put; @@ -5640,7 +5641,7 @@ static void perf_mmap_close(struct vm_area_struct *vma) mutex_unlock(&event->mmap_mutex); /* If there's still other mmap()s of this buffer, we're done. */ - if (atomic_read(&rb->mmap_count)) + if (!detach_rest) goto out_put; /* From ebc24aeb86942d5700fa05cbd288a16dc9bde0ff Mon Sep 17 00:00:00 2001 From: Yunsheng Lin Date: Tue, 3 Nov 2020 11:25:38 +0800 Subject: [PATCH 149/151] net: sch_generic: fix the missing new qdisc assignment bug When commit 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc") is backported to stable kernel, one assignment is missing, which causes two problems reported by Joakim and Vishwanath, see [1] and [2]. So add the assignment back to fix it. 1. https://www.spinics.net/lists/netdev/msg693916.html 2. https://www.spinics.net/lists/netdev/msg695131.html Fixes: 749cc0b0c7f3 ("net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc") Signed-off-by: Yunsheng Lin Acked-by: Jakub Kicinski Tested-by: Brian Norris Signed-off-by: Greg Kroah-Hartman --- net/sched/sch_generic.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 0e275e11f511..6e6147a81bc3 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -1127,10 +1127,13 @@ static void dev_deactivate_queue(struct net_device *dev, void *_qdisc_default) { struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc); + struct Qdisc *qdisc_default = _qdisc_default; if (qdisc) { if (!(qdisc->flags & TCQ_F_BUILTIN)) set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state); + + rcu_assign_pointer(dev_queue->qdisc, qdisc_default); } } From 9fda2e76249833f431b4c6cf842ce85d78272b79 Mon Sep 17 00:00:00 2001 From: Boris Protopopov Date: Thu, 24 Sep 2020 00:36:38 +0000 Subject: [PATCH 150/151] Convert trailing spaces and periods in path components commit 57c176074057531b249cf522d90c22313fa74b0b upstream. When converting trailing spaces and periods in paths, do so for every component of the path, not just the last component. 
If the conversion is not done for every path component, then subsequent operations in directories with trailing spaces or periods (e.g. create(), mkdir()) will fail with ENOENT. This is because on the server, the directory will have a special symbol in its name, and the client needs to provide the same. Signed-off-by: Boris Protopopov Acked-by: Ronnie Sahlberg Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/cifs_unicode.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/cifs/cifs_unicode.c b/fs/cifs/cifs_unicode.c index 498777d859eb..9bd03a231032 100644 --- a/fs/cifs/cifs_unicode.c +++ b/fs/cifs/cifs_unicode.c @@ -488,7 +488,13 @@ cifsConvertToUTF16(__le16 *target, const char *source, int srclen, else if (map_chars == SFM_MAP_UNI_RSVD) { bool end_of_string; - if (i == srclen - 1) + /** + * Remap spaces and periods found at the end of every + * component of the path. The special cases of '.' and + * '..' do not need to be dealt with explicitly because + * they are addressed in namei.c:link_path_walk(). + **/ + if ((i == srclen - 1) || (source[i+1] == '\\')) end_of_string = true; else end_of_string = false; From 315443293a2d0d7c183ca6dd4624d9e4f8a7054a Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 18 Nov 2020 19:20:34 +0100 Subject: [PATCH 151/151] Linux 5.4.78 Tested-by: Jon Hunter Tested-by: Shuah Khan Tested-by: Linux Kernel Functional Testing Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20201117122121.381905960@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 2e24b568b93f..5725b07aaddf 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 5 PATCHLEVEL = 4 -SUBLEVEL = 77 +SUBLEVEL = 78 EXTRAVERSION = NAME = Kleptomaniac Octopus