Commit graph

966056 commits

Author SHA1 Message Date
Chaitanya Kulkarni 150dfb6c83 nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
By default, we set the passthru request allocation flag such that it
returns the error in the following code path and we fail the I/O when
BLK_MQ_REQ_NOWAIT is used for request allocation :-

nvme_alloc_request()
 blk_mq_alloc_request()
  blk_mq_queue_enter()
   if (flag & BLK_MQ_REQ_NOWAIT)
        return -EBUSY; <-- return if busy.

On some controllers using BLK_MQ_REQ_NOWAIT ends up in I/O error where
the controller is perfectly healthy and not in a degraded state.

Block layer request allocation does allow us to wait instead of
immediately returning the error when we BLK_MQ_REQ_NOWAIT flag is not
used. This has shown to fix the I/O error problem reported under
heavy random write workload.

Remove the BLK_MQ_REQ_NOWAIT parameter for passthru request allocation
which resolves this issue.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:28:02 +02:00
Logan Gunthorpe 5e063101ff nvmet: cleanup nvmet_passthru_map_sg()
Clean up some confusing elements of nvmet_passthru_map_sg() by returning
early if the request is greater than the maximum bio size. This allows
us to drop the sg_cnt variable.

This should not result in any functional change but makes the code
clearer and more understandable. The original code allocated a truncated
bio then would return EINVAL when bio_add_pc_page() filled that bio. The
new code just returns EINVAL early if this would happen.

Fixes: c1fef73f79 ("nvmet: add passthru code to process commands")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Suggested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:28:02 +02:00
Logan Gunthorpe df06047d54 nvmet: limit passthru MTDS by BIO_MAX_PAGES
nvmet_passthru_map_sg() only supports mapping a single BIO, not a chain
so the effective maximum transfer should also be limitted by
BIO_MAX_PAGES (presently this works out to 1MB).

For PCI passthru devices the max_sectors would typically be more
limitting than BIO_MAX_PAGES, but this may not be true for all passthru
devices.

Fixes: c1fef73f79 ("nvmet: add passthru code to process commands")
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:27:57 +02:00
zhenwei pi 85bd23f3dc nvmet: fix uninitialized work for zero kato
When connecting a controller with a zero kato value using the following
command line

   nvme connect -t tcp -n NQN -a ADDR -s PORT --keep-alive-tmo=0

the warning below can be reproduced:

WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90
with trace:
  mod_delayed_work_on+0x59/0x90
  nvmet_update_cc+0xee/0x100 [nvmet]
  nvmet_execute_prop_set+0x72/0x80 [nvmet]
  nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp]
  nvmet_tcp_io_work+0x63f/0xb2d [nvmet_tcp]
  ...

This is caused by queuing up an uninitialized work.  Althrough the
keep-alive timer is disabled during allocating the controller (fixed in
0d3b6a8d21), ka_work still has a chance to run (called by
nvmet_start_ctrl).

Fixes: 0d3b6a8d21 ("nvmet: Disable keep-alive timer when kato is cleared to 0h")
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:27:14 +02:00
Kai-Heng Feng 02ca079c99 nvme-pci: disable Write Zeroes on Sandisk Skyhawk
Like commit 5611ec2b98 ("nvme-pci: prevent SK hynix PC400 from using
Write Zeroes command"), Sandisk Skyhawk has the same issue:
[ 6305.633887] blk_update_request: operation not supported error, dev nvme0n1, sector 340812032 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0

So also disable Write Zeroes command on Sandisk Skyhawk.

BugLink: https://bugs.launchpad.net/bugs/1899503
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:27:14 +02:00
Keith Busch 643c476d6f nvme: use queuedata for nvme_req_qid
The request's rq_disk isn't set for passthrough IO commands, so tracing
uses qid 0 for these which incorrectly decodes as an admin command. Use
the request_queue's queuedata instead since that value is always set for
the IO queues, and never set for the admin queue.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:27:14 +02:00
Chao Leng a87da50f39 nvme-rdma: fix crash due to incorrect cqe
A crash happened due to injecting error test.
When a CQE has incorrect command id due do an error injection, the host
may find a request which is already freed.  Dereferencing req->mr->rkey
causes a crash in nvme_rdma_process_nvme_rsp because the mr is already
freed.

Add a check for the mr to fix it.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:27:14 +02:00
Chao Leng 43efdb8e87 nvme-rdma: fix crash when connect rejected
A crash can happened when a connect is rejected.   The host establishes
the connection after received ConnectReply, and then continues to send
the fabrics Connect command.  If the controller does not receive the
ReadyToUse capsule, host may receive a ConnectReject reply.

Call nvme_rdma_destroy_queue_ib after the host received the
RDMA_CM_EVENT_REJECTED event.  Then when the fabrics Connect command
times out, nvme_rdma_timeout calls nvme_rdma_complete_rq to fail the
request.  A crash happenes due to use after free in
nvme_rdma_complete_rq.

nvme_rdma_destroy_queue_ib is redundant when handling the
RDMA_CM_EVENT_REJECTED event as nvme_rdma_destroy_queue_ib is already
called in connection failure handler.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:27:13 +02:00
Pablo Neira Ayuso c77761c8a5 netfilter: nf_fwd_netdev: clear timestamp in forwarding path
Similar to 7980d2eabd ("ipvs: clear skb->tstamp in forwarding path").
fq qdisc requires tstamp to be cleared in forwarding path.

Fixes: 8203e2d844 ("net: clear skb->tstamp in forwarding paths")
Fixes: fb420d5d91 ("tcp/fq: move back to CLOCK_MONOTONIC")
Fixes: 80b14dee2b ("net: Add a new socket option for a future transmit time.")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-22 14:49:36 +02:00
Hui Wang 033e4040d4 ALSA: hda - Fix the return value if cb func is already registered
If the cb function is already registered, should return the pointer
of the structure hda_jack_callback which contains this cb func, but
instead it returns the NULL.

Now fix it by replacing func_is_already_in_callback_list() with
find_callback_from_list().

Fixes: f4794c6064 ("ALSA: hda - Don't register a cb func if it is registered already")
Reported-and-suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Link: https://lore.kernel.org/r/20201022030221.22393-1-hui.wang@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-10-22 08:17:02 +02:00
Lijun Pan d9b0e599b2 ibmvnic: save changed mac address to adapter->mac_addr
After mac address change request completes successfully, the new mac
address need to be saved to adapter->mac_addr as well as
netdev->dev_addr. Otherwise, adapter->mac_addr still holds old
data.

Fixes: 62740e9788 ("net/ibmvnic: Update MAC address settings after adapter reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201020223919.46106-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-21 20:54:02 -07:00
Matthieu Baerts 287d354059 selftests: mptcp: depends on built-in IPv6
Recently, CONFIG_MPTCP_IPV6 no longer selects CONFIG_IPV6. As a
consequence, if CONFIG_MPTCP_IPV6=y is added to the kconfig, it will no
longer ensure CONFIG_IPV6=y. If it is not enabled, CONFIG_MPTCP_IPV6
will stay disabled and selftests will fail.

We also need CONFIG_IPV6 to be built-in. For more details, please see
commit 0ed37ac586 ("mptcp: depends on IPV6 but not as a module").

Note that 'make kselftest-merge' will take all 'config' files found in
'tools/testsing/selftests'. Because some of them already set
CONFIG_IPV6=y, MPTCP selftests were still passing. But they will fail if
MPTCP selftests are launched manually after having executed this command
to prepare the kernel config:

  ./scripts/kconfig/merge_config.sh -m .config \
      ./tools/testing/selftests/net/mptcp/config

Fixes: 010b430d5d ("mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20201021155549.933731-1-matthieu.baerts@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-21 20:39:58 -07:00
Ganesh Goudar 4ff753feab powerpc/pseries: Avoid using addr_to_pfn in real mode
When an UE or memory error exception is encountered the MCE handler
tries to find the pfn using addr_to_pfn() which takes effective
address as an argument, later pfn is used to poison the page where
memory error occurred, recent rework in this area made addr_to_pfn
to run in real mode, which can be fatal as it may try to access
memory outside RMO region.

Have two helper functions to separate things to be done in real mode
and virtual mode without changing any functionality. This also fixes
the following error as the use of addr_to_pfn is now moved to virtual
mode.

Without this change following kernel crash is seen on hitting UE.

[  485.128036] Oops: Kernel access of bad area, sig: 11 [#1]
[  485.128040] LE SMP NR_CPUS=2048 NUMA pSeries
[  485.128047] Modules linked in:
[  485.128067] CPU: 15 PID: 6536 Comm: insmod Kdump: loaded Tainted: G OE 5.7.0 #22
[  485.128074] NIP:  c00000000009b24c LR: c0000000000398d8 CTR: c000000000cd57c0
[  485.128078] REGS: c000000003f1f970 TRAP: 0300   Tainted: G OE (5.7.0)
[  485.128082] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 28008284  XER: 00000001
[  485.128088] CFAR: c00000000009b190 DAR: c0000001fab00000 DSISR: 40000000 IRQMASK: 1
[  485.128088] GPR00: 0000000000000001 c000000003f1fbf0 c000000001634300 0000b0fa01000000
[  485.128088] GPR04: d000000002220000 0000000000000000 00000000fab00000 0000000000000022
[  485.128088] GPR08: c0000001fab00000 0000000000000000 c0000001fab00000 c000000003f1fc14
[  485.128088] GPR12: 0000000000000008 c000000003ff5880 d000000002100008 0000000000000000
[  485.128088] GPR16: 000000000000ff20 000000000000fff1 000000000000fff2 d0000000021a1100
[  485.128088] GPR20: d000000002200000 c00000015c893c50 c000000000d49b28 c00000015c893c50
[  485.128088] GPR24: d0000000021a0d08 c0000000014e5da8 d0000000021a0818 000000000000000a
[  485.128088] GPR28: 0000000000000008 000000000000000a c0000000017e2970 000000000000000a
[  485.128125] NIP [c00000000009b24c] __find_linux_pte+0x11c/0x310
[  485.128130] LR [c0000000000398d8] addr_to_pfn+0x138/0x170
[  485.128133] Call Trace:
[  485.128135] Instruction dump:
[  485.128138] 3929ffff 7d4a3378 7c883c36 7d2907b4 794a1564 7d294038 794af082 3900ffff
[  485.128144] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa30000 419e011c 70690040
[  485.128152] ---[ end trace d34b27e29ae0e340 ]---

Fixes: 9ca766f989 ("powerpc/64s/pseries: machine check convert to use common event code")
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724063946.21378-1-ganeshgr@linux.ibm.com
2020-10-22 14:34:45 +11:00
Michael S. Tsirkin cf8691cbc2 Revert "virtio-net: ethtool configurable RXCSUM"
This reverts commit 3618ad2a7c.

When control vq is not negotiated, that commit causes a crash:

[   72.229171] kernel BUG at drivers/net/virtio_net.c:1667!
[   72.230266] invalid opcode: 0000 [#1] PREEMPT SMP
[   72.231172] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc8-02934-g3618ad2a7c0e7 #1
[   72.231172] EIP: virtnet_send_command+0x120/0x140
[   72.231172] Code: 00 0f 94 c0 8b 7d f0 65 33 3d 14 00 00 00 75 1c 8d 65 f4 5b 5e 5f 5d c3 66 90 be 01 00 00 00 e9 6e ff ff ff 8d b6 00
+00 00 00 <0f> 0b e8 d9 bb 82 00 eb 17 8d b4 26 00 00 00 00 8d b4 26 00 00 00
[   72.231172] EAX: 0000000d EBX: f72895c0 ECX: 00000017 EDX: 00000011
[   72.231172] ESI: f7197800 EDI: ed69bd00 EBP: ed69bcf4 ESP: ed69bc98
[   72.231172] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[   72.231172] CR0: 80050033 CR2: 00000000 CR3: 02c84000 CR4: 000406f0
[   72.231172] Call Trace:
[   72.231172]  ? __virt_addr_valid+0x45/0x60
[   72.231172]  ? ___cache_free+0x51f/0x760
[   72.231172]  ? kobject_uevent_env+0xf4/0x560
[   72.231172]  virtnet_set_guest_offloads+0x4d/0x80
[   72.231172]  virtnet_set_features+0x85/0x120
[   72.231172]  ? virtnet_set_guest_offloads+0x80/0x80
[   72.231172]  __netdev_update_features+0x27a/0x8e0
[   72.231172]  ? kobject_uevent+0xa/0x20
[   72.231172]  ? netdev_register_kobject+0x12c/0x160
[   72.231172]  register_netdevice+0x4fe/0x740
[   72.231172]  register_netdev+0x1c/0x40
[   72.231172]  virtnet_probe+0x728/0xb60
[   72.231172]  ? _raw_spin_unlock+0x1d/0x40
[   72.231172]  ? virtio_vdpa_get_status+0x1c/0x20
[   72.231172]  virtio_dev_probe+0x1c6/0x271
[   72.231172]  really_probe+0x195/0x2e0
[   72.231172]  driver_probe_device+0x26/0x60
[   72.231172]  device_driver_attach+0x49/0x60
[   72.231172]  __driver_attach+0x46/0xc0
[   72.231172]  ? device_driver_attach+0x60/0x60
[   72.231172]  bus_add_driver+0x197/0x1c0
[   72.231172]  driver_register+0x66/0xc0
[   72.231172]  register_virtio_driver+0x1b/0x40
[   72.231172]  virtio_net_driver_init+0x61/0x86
[   72.231172]  ? veth_init+0x14/0x14
[   72.231172]  do_one_initcall+0x76/0x2e4
[   72.231172]  ? rdinit_setup+0x2a/0x2a
[   72.231172]  do_initcalls+0xb2/0xd5
[   72.231172]  kernel_init_freeable+0x14f/0x179
[   72.231172]  ? rest_init+0x100/0x100
[   72.231172]  kernel_init+0xd/0xe0
[   72.231172]  ret_from_fork+0x1c/0x30
[   72.231172] Modules linked in:
[   72.269563] ---[ end trace a6ebc4afea0e6cb1 ]---

The reason is that virtnet_set_features now calls virtnet_set_guest_offloads
unconditionally, it used to only call it when there is something
to configure.

If device does not have a control vq, everything breaks.

Revert the original commit for now.

Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Fixes: 3618ad2a7c ("virtio-net: ethtool configurable RXCSUM")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20201021142944.13615-1-mst@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-21 20:33:19 -07:00
Christophe Leroy 592bbe9c50 powerpc/uaccess: Don't use "m<>" constraint with GCC 4.9
GCC 4.9 sometimes fails to build with "m<>" constraint in
inline assembly.

  CC      lib/iov_iter.o
In file included from ./arch/powerpc/include/asm/cmpxchg.h:6:0,
                 from ./arch/powerpc/include/asm/atomic.h:11,
                 from ./include/linux/atomic.h:7,
                 from ./include/linux/crypto.h:15,
                 from ./include/crypto/hash.h:11,
                 from lib/iov_iter.c:2:
lib/iov_iter.c: In function 'iovec_from_user.part.30':
./arch/powerpc/include/asm/uaccess.h:287:2: error: 'asm' operand has impossible constraints
  __asm__ __volatile__(    \
  ^
./include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
 # define unlikely(x) __builtin_expect(!!(x), 0)
                                          ^
./arch/powerpc/include/asm/uaccess.h:583:34: note: in expansion of macro 'unsafe_op_wrap'
 #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
                                  ^
./arch/powerpc/include/asm/uaccess.h:329:10: note: in expansion of macro '__get_user_asm'
  case 4: __get_user_asm(x, (u32 __user *)ptr, retval, "lwz"); break; \
          ^
./arch/powerpc/include/asm/uaccess.h:363:3: note: in expansion of macro '__get_user_size_allowed'
   __get_user_size_allowed(__gu_val, __gu_addr, __gu_size, __gu_err); \
   ^
./arch/powerpc/include/asm/uaccess.h💯2: note: in expansion of macro '__get_user_nocheck'
  __get_user_nocheck((x), (ptr), sizeof(*(ptr)), false)
  ^
./arch/powerpc/include/asm/uaccess.h:583:49: note: in expansion of macro '__get_user_allowed'
 #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
                                                 ^
lib/iov_iter.c:1663:3: note: in expansion of macro 'unsafe_get_user'
   unsafe_get_user(len, &uiov[i].iov_len, uaccess_end);
   ^
make[1]: *** [scripts/Makefile.build:283: lib/iov_iter.o] Error 1

Define a UPD_CONSTR macro that is "<>" by default and
only "" with GCC prior to GCC 5.

Fixes: fcf1f26895 ("powerpc/uaccess: Add pre-update addressing to __put_user_asm_goto()")
Fixes: 2f279eeb68 ("powerpc/uaccess: Add pre-update addressing to __get_user_asm() and __put_user_asm()")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/212d3bc4a52ca71523759517bb9c61f7e477c46a.1603179582.git.christophe.leroy@csgroup.eu
2020-10-22 14:26:09 +11:00
Luo Meng 1322181170 ext4: fix invalid inode checksum
During the stability test, there are some errors:
  ext4_lookup:1590: inode #6967: comm fsstress: iget: checksum invalid.

If the inode->i_iblocks too big and doesn't set huge file flag, checksum
will not be recalculated when update the inode information to it's buffer.
If other inode marks the buffer dirty, then the inconsistent inode will
be flushed to disk.

Fix this problem by checking i_blocks in advance.

Cc: stable@kernel.org
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20201020013631.3796673-1-luomeng12@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:38 -04:00
Harshad Shirwadkar ce8c59d197 ext4: add fast commit stats in procfs
This commit adds a file in procfs that tracks fast commit related
statistics.

root@kvm-xfstests:/mnt# cat /proc/fs/ext4/vdc/fc_info
fc stats:
7772 commits
15 ineligible
4083 numblks
2242us avg_commit_time
Ineligible reasons:
"Extended attributes changed":  0
"Cross rename": 0
"Journal flag changed": 0
"Insufficient memory":  0
"Swap boot":    0
"Resize":       0
"Dir renamed":  0
"Falloc range op":      0
"FC Commit Failed":     15

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-10-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:38 -04:00
Harshad Shirwadkar 0f0672ffb6 ext4: add a mount opt to forcefully turn fast commits on
This is a debug only mount option that forcefully turns fast commits
on at mount time.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-9-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:38 -04:00
Harshad Shirwadkar 8016e29f43 ext4: fast commit recovery path
This patch adds fast commit recovery path support for Ext4 file
system. We add several helper functions that are similar in spirit to
e2fsprogs journal recovery path handlers. Example of such functions
include - a simple block allocator, idempotent block bitmap update
function etc. Using these routines and the fast commit log in the fast
commit area, the recovery path (ext4_fc_replay()) performs fast commit
log recovery.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-8-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:38 -04:00
Harshad Shirwadkar 5b849b5f96 jbd2: fast commit recovery path
This patch adds fast commit recovery support in JBD2.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-7-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:37 -04:00
Harshad Shirwadkar aa75f4d3da ext4: main fast-commit commit path
This patch adds main fast commit commit path handlers. The overall
patch can be divided into two inter-related parts:

(A) Metadata updates tracking

    This part consists of helper functions to track changes that need
    to be committed during a commit operation. These updates are
    maintained by Ext4 in different in-memory queues. Following are
    the APIs and their short description that are implemented in this
    patch:

    - ext4_fc_track_link/unlink/creat() - Track unlink. link and creat
      operations
    - ext4_fc_track_range() - Track changed logical block offsets
      inodes
    - ext4_fc_track_inode() - Track inodes
    - ext4_fc_mark_ineligible() - Mark file system fast commit
      ineligible()
    - ext4_fc_start_update() / ext4_fc_stop_update() /
      ext4_fc_start_ineligible() / ext4_fc_stop_ineligible() These
      functions are useful for co-ordinating inode updates with
      commits.

(B) Main commit Path

    This part consists of functions to convert updates tracked in
    in-memory data structures into on-disk commits. Function
    ext4_fc_commit() is the main entry point to commit path.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-6-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:37 -04:00
Harshad Shirwadkar ff780b91ef jbd2: add fast commit machinery
This functions adds necessary APIs needed in JBD2 layer for fast
commits.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-5-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:37 -04:00
Harshad Shirwadkar 6866d7b3f2 ext4 / jbd2: add fast commit initialization
This patch adds fast commit area trackers in the journal_t
structure. These are initialized via the jbd2_fc_init() routine that
this patch adds. This patch also adds ext4/fast_commit.c and
ext4/fast_commit.h files for fast commit code that will be added in
subsequent patches in this series.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-4-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:26 -04:00
Harshad Shirwadkar 995a3ed67f ext4: add fast_commit feature and handling for extended mount options
We are running out of mount option bits. Add handling for using
s_mount_opt2. Add ext4 and jbd2 fast commit feature flag and also add
ability to turn off the fast commit feature in Ext4.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-3-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:26 -04:00
Harshad Shirwadkar f5b8b297b0 doc: update ext4 and journalling docs to include fast commit feature
This patch adds necessary documentation for fast commits.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-2-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:19:43 -04:00
Likun Gao 687e79c0fe drm/amdgpu: correct the cu and rb info for sienna cichlid
Skip disabled sa to correct the cu_info and active_rbs for sienna cichlid.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.9.x
2020-10-21 23:06:23 -04:00
Kenneth Feng 0435d77cd9 drm/amd/pm: remove the average clock value in sysfs
if it's fine-grained clock dpm, remove the average clock value and
reflects the real clock.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-21 23:06:23 -04:00
Kenneth Feng 392d256fa2 drm/amd/pm: fix pp_dpm_fclk
fclk value is missing in pp_dpm_fclk. add this to correctly show the current value.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.9.x
2020-10-21 23:06:23 -04:00
John Clements e4eeceb73c Revert drm/amdgpu: disable sienna chichlid UMC RAS
This reverts commit 265c280a48.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-21 23:06:23 -04:00
Likun Gao 9a2f408f54 drm/amd/pm: fix pcie information for sienna cichlid
Fix the function used for sienna cichlid to get correct PCIE information
by pp_dpm_pcie.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.9.x
2020-10-21 23:06:23 -04:00
Jay Cornwall d56b1980d7 drm/amdkfd: Use same SQ prefetch setting as amdgpu
0 causes instruction fetch stall at cache line boundary under some
conditions on Navi10. A non-zero prefetch is the preferred default
in any case.

Fixes soft hang in Luxmark.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-10-21 23:06:07 -04:00
Di Zhu ebfe3c5183 rtnetlink: fix data overflow in rtnl_calcit()
"ip addr show" command execute error when we have a physical
network card with a large number of VFs

The return value of if_nlmsg_size() in rtnl_calcit() will exceed
range of u16 data type when any network cards has a larger number of
VFs. rtnl_vfinfo_size() will significant increase needed dump size when
the value of num_vfs is larger.

Eventually we get a wrong value of min_ifinfo_dump_size because of overflow
which decides the memory size needed by netlink dump and netlink_dump()
will return -EMSGSIZE because of not enough memory was allocated.

So fix it by promoting  min_dump_alloc data type to u32 to
avoid whole netlink message size overflow and it's also align
with the data type of struct netlink_callback{}.min_dump_alloc
which is assigned by return value of rtnl_calcit()

Signed-off-by: Di Zhu <zhudi21@huawei.com>
Link: https://lore.kernel.org/r/20201021020053.1401-1-zhudi21@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-21 18:24:08 -07:00
Bartosz Golaszewski 5403caf216 net: ethernet: mtk-star-emac: select REGMAP_MMIO
The driver depends on mmio regmap API but doesn't select the appropriate
Kconfig option. This fixes it.

Fixes: 8c7bd5a454 ("net: ethernet: mtk-star-emac: new driver")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20201020073515.22769-1-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-21 18:11:17 -07:00
Xie He 5fce1e43e2 net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
This driver calls ether_setup to set up the network device.
The ether_setup function would add the IFF_TX_SKB_SHARING flag to the
device. This flag indicates that it is safe to transmit shared skbs to
the device.

However, this is not true. This driver may pad the frame (in eth_tx)
before transmission, so the skb may be modified.

Fixes: 550fd08c2c ("net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared")
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201020063420.187497-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-21 17:58:26 -07:00
Xie He 01c4ceae0a net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
The hdlc_rcv function is used as hdlc_packet_type.func to process any
skb received in the kernel with skb->protocol == htons(ETH_P_HDLC).
The purpose of this function is to provide second-stage processing for
skbs not assigned a "real" L3 skb->protocol value in the first stage.

This function assumes the device from which the skb is received is an
HDLC device (a device created by this module). It assumes that
netdev_priv(dev) returns a pointer to "struct hdlc_device".

However, it is possible that some driver in the kernel (not necessarily
in our control) submits a received skb with skb->protocol ==
htons(ETH_P_HDLC), from a non-HDLC device. In this case, the skb would
still be received by hdlc_rcv. This will cause problems.

hdlc_rcv should be able to recognize and drop invalid skbs. It should
first make sure "dev" is actually an HDLC device, before starting its
processing. This patch adds this check to hdlc_rcv.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201020013152.89259-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-21 17:20:19 -07:00
Daniel Borkmann 3652c9a1b1 bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
Yaniv reported a compilation error after pulling latest libbpf:

  [...]
  ../libbpf/src/root/usr/include/bpf/bpf_helpers.h:99:10: error:
  unknown register name 'r0' in asm
                     : "r0", "r1", "r2", "r3", "r4", "r5");
  [...]

The issue got triggered given Yaniv was compiling tracing programs with native
target (e.g. x86) instead of BPF target, hence no BTF generated vmlinux.h nor
CO-RE used, and later llc with -march=bpf was invoked to compile from LLVM IR
to BPF object file. Given that clang was expecting x86 inline asm and not BPF
one the error complained that these regs don't exist on the former.

Guard bpf_tail_call_static() with defined(__bpf__) where BPF inline asm is valid
to use. BPF tracing programs on more modern kernels use BPF target anyway and
thus the bpf_tail_call_static() function will be available for them. BPF inline
asm is supported since clang 7 (clang <= 6 otherwise throws same above error),
and __bpf_unreachable() since clang 8, therefore include the latter condition
in order to prevent compilation errors for older clang versions. Given even an
old Ubuntu 18.04 LTS has official LLVM packages all the way up to llvm-10, I did
not bother to special case the __bpf_unreachable() inside bpf_tail_call_static()
further.

Also, undo the sockex3_kern's use of bpf_tail_call_static() sample given they
still have the old hacky way to even compile networking progs with native instead
of BPF target so bpf_tail_call_static() won't be defined there anymore.

Fixes: 0e9f6841f6 ("bpf, libbpf: Add bpf_tail_call_static helper for bpf programs")
Reported-by: Yaniv Agman <yanivagman@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Tested-by: Yaniv Agman <yanivagman@gmail.com>
Link: https://lore.kernel.org/bpf/CAMy7=ZUk08w5Gc2Z-EKi4JFtuUCaZYmE4yzhJjrExXpYKR4L8w@mail.gmail.com
Link: https://lore.kernel.org/bpf/20201021203257.26223-1-daniel@iogearbox.net
2020-10-22 01:46:52 +02:00
Oliver O'Halloran 99f6e9795a powerpc/eeh: Fix eeh_dev_check_failure() for PE#0
In commit 269e583357 ("powerpc/eeh: Delete eeh_pe->config_addr") the
following simplification was made:

-       if (!pe->addr && !pe->config_addr) {
+       if (!pe->addr) {
                eeh_stats.no_cfg_addr++;
                return 0;
        }

This introduced a bug which causes EEH checking to be skipped for
devices in PE#0.

Before the change above the check would always pass since at least one
of the two PE addresses would be non-zero in all circumstances. On
PowerNV pe->config_addr would be the BDFN of the first device added to
the PE. The zero BDFN is reserved for the PHB's root port, but this is
fine since for obscure platform reasons the root port is never
assigned to PE#0.

Similarly, on pseries pe->addr has always been non-zero for the
reasons outlined in commit 42de19d5ef ("powerpc/pseries/eeh: Allow
zero to be a valid PE configuration address").

We can fix the problem by deleting the block entirely The original
purpose of this test was to avoid performing EEH checks on devices
that were not on an EEH capable bus. In modern Linux the edev->pe
pointer will be NULL for devices that are not on an EEH capable bus.
The code block immediately above this one already checks for the
edev->pe == NULL case so this test (new and old) is entirely
redundant.

Ideally we'd delete eeh_stats.no_cfg_addr too since nothing increments
it any more. Unfortunately, that information is exposed via
/proc/powerpc/eeh which means it's technically ABI. We could make it
hard-coded, but that's a change for another patch.

Fixes: 269e583357 ("powerpc/eeh: Delete eeh_pe->config_addr")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021232554.1434687-1-oohall@gmail.com
2020-10-22 10:38:53 +11:00
Toke Høiland-Jørgensen adfd272c4c bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
This updates the test_tc_neigh prog in selftests to use the new syntax of
bpf_redirect_neigh(). To exercise the helper both with and without the
optional parameter, add an additional test_tc_neigh_fib test program, which
does a bpf_fib_lookup() followed by a call to bpf_redirect_neigh() instead
of looking up the ifindex in a map.

Update the test_tc_redirect.sh script to run both versions of the test,
and while we're add it, fix it to work on systems that have a consolidated
dual-stack 'ping' binary instead of separate ping/ping6 versions.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/160322915724.32199.17530068594636950447.stgit@toke.dk
2020-10-22 01:30:00 +02:00
Tetsuhiro Kohada eae503f7eb exfat: remove useless check in exfat_move_file()
In exfat_move_file(), the identity of source and target directory has been
checked by the caller.
Also, it gets stream.start_clu from file dir-entry, which is an invalid
determination.

Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22 08:29:12 +09:00
Tetsuhiro Kohada 04cee52fb8 exfat: remove 'rwoffset' in exfat_inode_info
Remove 'rwoffset' in exfat_inode_info and replace it with the parameter of
exfat_readdir().
Since rwoffset is referenced only by exfat_readdir(), it is not necessary
a exfat_inode_info's member.
Also, change cpos to point to the next of entry-set, and return the index
of dir-entry via dir_entry->entry.

Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22 08:29:11 +09:00
Tetsuhiro Kohada a7a241686c exfat: replace memcpy with structure assignment
Use structure assignment instead of memcpy.

Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22 08:29:11 +09:00
Tetsuhiro Kohada 6c958a0955 exfat: remove useless directory scan in exfat_add_entry()
There is nothing in directory just created, so there is no need to scan.

Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22 08:29:11 +09:00
Tetsuhiro Kohada 188df41f21 exfat: eliminate dead code in exfat_find()
The exfat_find_dir_entry() called by exfat_find() doesn't return -EEXIST.
Therefore, the root-dir information setting is never executed.

Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22 08:29:11 +09:00
Xianting Tian 45882a6a0d exfat: use i_blocksize() to get blocksize
We alreday has the interface i_blocksize() to get blocksize,
so use it.

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22 08:29:10 +09:00
Namjae Jeon 9e456aeaac exfat: fix misspellings using codespell tool
Sedat reported typos using codespell tool.

 $ codespell fs/exfat/*.c | grep -v iput
 fs/exfat/namei.c:293: upto ==> up to
 fs/exfat/nls.c:14: tabel ==> table

 $ codespell fs/exfat/*.h
 fs/exfat/exfat_fs.h:133: usally ==> usually

Fix typos found by codespell.

Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22 08:29:10 +09:00
Toke Høiland-Jørgensen ba452c9e99 bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
Based on the discussion in [0], update the bpf_redirect_neigh() helper to
accept an optional parameter specifying the nexthop information. This makes
it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
incurring a duplicate FIB lookup - since the FIB lookup helper will return
the nexthop information even if no neighbour is present, this can simply
be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
BPF_FIB_LKUP_RET_NO_NEIGH. Thus fix & extend it before helper API is frozen.

  [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/160322915615.32199.1187570224032024535.stgit@toke.dk
2020-10-22 01:28:54 +02:00
Darrick J. Wong 2e76f188fd xfs: cancel intents immediately if process_intents fails
If processing recovered log intent items fails, we need to cancel all
the unprocessed recovered items immediately so that a subsequent AIL
push in the bail out path won't get wedged on the pinned intent items
that didn't get processed.

This can happen if the log contains (1) an intent that gets and releases
an inode, (2) an intent that cannot be recovered successfully, and (3)
some third intent item.  When recovery of (2) fails, we leave (3) pinned
in memory.  Inode reclamation is called in the error-out path of
xfs_mountfs before xfs_log_cancel_mount.  Reclamation calls
xfs_ail_push_all_sync, which gets stuck waiting for (3).

Therefore, call xlog_recover_cancel_intents if _process_intents fails.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-10-21 16:28:46 -07:00
Steve French 3c6e65e679 smb3: do not try to cache root directory if dir leases not supported
To servers which do not support directory leases (e.g. Samba)
it is wasteful to try to open_shroot (ie attempt to cache the
root directory handle).  Skip attempt to open_shroot when
server does not indicate support for directory leases.

Cuts the number of requests on mount from 17 to 15, and
cuts the number of requests on stat of the root directory
from 4 to 3.

Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org> # v5.1+
2020-10-21 17:57:07 -05:00
Steve French 3c3317daef smb3: fix stat when special device file and mounted with modefromsid
When mounting with modefromsid mount option, it was possible to
get the error on stat of a fifo or char or block device:
        "cannot stat <filename>: Operation not supported"

Special devices can be stored as reparse points by some servers
(e.g. Windows NFS server and when using the SMB3.1.1 POSIX
Extensions) but when the modefromsid mount option is used
the client attempts to get the ACL for the file which requires
opening with OPEN_REPARSE_POINT create option.

Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
2020-10-21 17:57:02 -05:00
Samuel Cabrero def6e1dc17 cifs: Print the address and port we are connecting to in generic_ip_connect()
Can be helpful in debugging mount and reconnect issues

Signed-off-by: Samuel Cabrero <scabrero@suse.de>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-21 17:57:02 -05:00