1
0
Fork 0
Commit Graph

872737 Commits (221ec5c0a46c1a1740f34fb36fc661a5284d01b0)

Author SHA1 Message Date
Eran Ben Elisha e19868efea net/mlx4_core: Dynamically set guaranteed amount of counters per VF
Prior to this patch, the amount of counters guaranteed per VF in the
resource tracker was MLX4_VF_COUNTERS_PER_PORT * MLX4_MAX_PORTS. It was
set regardless if the VF was single or dual port.
This caused several VFs to have no guaranteed counters although the
system could satisfy their request.

The fix is to dynamically guarantee counters, based on each VF
specification.

Fixes: 9de92c60be ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 16:29:43 -07:00
Aya Levin 926b37f76f net/mlx5e: Initialize on stack link modes bitmap
Initialize link modes bitmap on stack before using it, otherwise the
outcome of ethtool set link ksettings might have unexpected values.

Fixes: 4b95840a6c ("net/mlx5e: Fix matching of speed to PRM link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:20 -07:00
Aya Levin 534e7366f4 net/mlx5e: Fix ethtool self test: link speed
Ethtool self test contains a test for link speed. This test reads the
PTYS register and determines whether the current speed is valid or not.
Change current implementation to use the function mlx5e_port_linkspeed()
that does the same check and fails when speed is invalid. This code
redundancy lead to a bug when mlx5e_port_linkspeed() was updated with
expended speeds and the self test was not.

Fixes: 2c81bfd5ae ("net/mlx5e: Move port speed code from en_ethtool.c to en/port.c")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:20 -07:00
Maxim Mikityanskiy 9df86bdb67 net/mlx5e: Fix handling of compressed CQEs in case of low NAPI budget
When CQE compression is enabled, compressed CQEs use the following
structure: a title is followed by one or many blocks, each containing 8
mini CQEs (except the last, which may contain fewer mini CQEs).

Due to NAPI budget restriction, a complete structure is not always
parsed in one NAPI run, and some blocks with mini CQEs may be deferred
to the next NAPI poll call - we have the mlx5e_decompress_cqes_cont call
in the beginning of mlx5e_poll_rx_cq. However, if the budget is
extremely low, some blocks may be left even after that, but the code
that follows the mlx5e_decompress_cqes_cont call doesn't check it and
assumes that a new CQE begins, which may not be the case. In such cases,
random memory corruptions occur.

An extremely low NAPI budget of 8 is used when busy_poll or busy_read is
active.

This commit adds a check to make sure that the previous compressed CQE
has been completely parsed after mlx5e_decompress_cqes_cont, otherwise
it prevents a new CQE from being fetched in the middle of a compressed
CQE.

This commit fixes random crashes in __build_skb, __page_pool_put_page
and other not-related-directly places, that used to happen when both CQE
compression and busy_poll/busy_read were enabled.

Fixes: 7219ab34f1 ("net/mlx5e: CQE compression")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:19 -07:00
Vlad Buslov 2a4b652623 net/mlx5e: Don't store direct pointer to action's tunnel info
Geneve implementation changed mlx5 tc to user direct pointer to tunnel_key
action's internal struct ip_tunnel_info instance. However, this leads to
use-after-free error when initial filter that caused creation of new encap
entry is deleted or when tunnel_key action is manually overwritten through
action API. Moreover, with recent TC offloads API unlocking change struct
flow_action_entry->tunnel point to temporal copy of tunnel info that is
deallocated after filter is offloaded to hardware which causes bug to
reproduce every time new filter is attached to existing encap entry with
following KASAN bug:

[  314.885555] ==================================================================
[  314.886641] BUG: KASAN: use-after-free in memcmp+0x2c/0x60
[  314.886864] Read of size 1 at addr ffff88886c746280 by task tc/2682

[  314.887179] CPU: 22 PID: 2682 Comm: tc Not tainted 5.3.0-rc7+ #703
[  314.887188] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  314.887195] Call Trace:
[  314.887215]  dump_stack+0x9a/0xf0
[  314.887236]  print_address_description+0x67/0x323
[  314.887248]  ? memcmp+0x2c/0x60
[  314.887257]  ? memcmp+0x2c/0x60
[  314.887272]  __kasan_report.cold+0x1a/0x3d
[  314.887474]  ? __mlx5e_tc_del_fdb_peer_flow+0x100/0x1b0 [mlx5_core]
[  314.887484]  ? memcmp+0x2c/0x60
[  314.887509]  kasan_report+0xe/0x12
[  314.887521]  memcmp+0x2c/0x60
[  314.887662]  mlx5e_tc_add_fdb_flow+0x51b/0xbe0 [mlx5_core]
[  314.887838]  ? mlx5e_encap_take+0x110/0x110 [mlx5_core]
[  314.887902]  ? lockdep_init_map+0x87/0x2c0
[  314.887924]  ? __init_waitqueue_head+0x4f/0x60
[  314.888062]  ? mlx5e_alloc_flow.isra.0+0x18c/0x1c0 [mlx5_core]
[  314.888207]  __mlx5e_add_fdb_flow+0x2d7/0x440 [mlx5_core]
[  314.888359]  ? mlx5e_tc_update_neigh_used_value+0x6f0/0x6f0 [mlx5_core]
[  314.888374]  ? match_held_lock+0x2e/0x240
[  314.888537]  mlx5e_configure_flower+0x830/0x16a0 [mlx5_core]
[  314.888702]  ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
[  314.888713]  ? down_read+0x118/0x2c0
[  314.888728]  ? down_read_killable+0x300/0x300
[  314.888882]  ? mlx5e_rep_get_ethtool_stats+0x180/0x180 [mlx5_core]
[  314.888899]  tc_setup_cb_add+0x127/0x270
[  314.888937]  fl_hw_replace_filter+0x2ac/0x380 [cls_flower]
[  314.888976]  ? fl_hw_destroy_filter+0x1b0/0x1b0 [cls_flower]
[  314.888990]  ? fl_change+0xbcf/0x27ef [cls_flower]
[  314.889030]  ? fl_change+0xa57/0x27ef [cls_flower]
[  314.889069]  fl_change+0x16bd/0x27ef [cls_flower]
[  314.889135]  ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[  314.889167]  ? __radix_tree_lookup+0xa4/0x130
[  314.889200]  ? fl_get+0x169/0x240 [cls_flower]
[  314.889218]  ? fl_walk+0x230/0x230 [cls_flower]
[  314.889249]  tc_new_tfilter+0x5e1/0xd40
[  314.889281]  ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[  314.889309]  ? tc_del_tfilter+0xa30/0xa30
[  314.889335]  ? __lock_acquire+0x5b5/0x2460
[  314.889378]  ? find_held_lock+0x85/0xa0
[  314.889442]  ? tc_del_tfilter+0xa30/0xa30
[  314.889465]  rtnetlink_rcv_msg+0x4ab/0x5f0
[  314.889488]  ? rtnl_dellink+0x490/0x490
[  314.889518]  ? lockdep_hardirqs_on+0x260/0x260
[  314.889538]  ? netlink_deliver_tap+0xab/0x5a0
[  314.889550]  ? match_held_lock+0x1b/0x240
[  314.889575]  netlink_rcv_skb+0xd0/0x200
[  314.889588]  ? rtnl_dellink+0x490/0x490
[  314.889605]  ? netlink_ack+0x440/0x440
[  314.889635]  ? netlink_deliver_tap+0x161/0x5a0
[  314.889648]  ? lock_downgrade+0x360/0x360
[  314.889657]  ? lock_acquire+0xe5/0x210
[  314.889686]  netlink_unicast+0x296/0x350
[  314.889707]  ? netlink_attachskb+0x390/0x390
[  314.889726]  ? _copy_from_iter_full+0xe0/0x3a0
[  314.889738]  ? __virt_addr_valid+0xbb/0x130
[  314.889771]  netlink_sendmsg+0x394/0x600
[  314.889800]  ? netlink_unicast+0x350/0x350
[  314.889817]  ? move_addr_to_kernel.part.0+0x90/0x90
[  314.889852]  ? netlink_unicast+0x350/0x350
[  314.889872]  sock_sendmsg+0x96/0xa0
[  314.889891]  ___sys_sendmsg+0x482/0x520
[  314.889919]  ? copy_msghdr_from_user+0x250/0x250
[  314.889930]  ? __fput+0x1fa/0x390
[  314.889941]  ? task_work_run+0xb7/0xf0
[  314.889957]  ? exit_to_usermode_loop+0x117/0x120
[  314.889972]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  314.889982]  ? do_syscall_64+0x74/0xe0
[  314.889992]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  314.890012]  ? mark_lock+0xac/0x9a0
[  314.890028]  ? __lock_acquire+0x5b5/0x2460
[  314.890053]  ? mark_lock+0xac/0x9a0
[  314.890083]  ? __lock_acquire+0x5b5/0x2460
[  314.890112]  ? match_held_lock+0x1b/0x240
[  314.890144]  ? __fget_light+0xa1/0xf0
[  314.890166]  ? sockfd_lookup_light+0x91/0xb0
[  314.890187]  __sys_sendmsg+0xba/0x130
[  314.890201]  ? __sys_sendmsg_sock+0xb0/0xb0
[  314.890225]  ? __blkcg_punt_bio_submit+0xd0/0xd0
[  314.890264]  ? lockdep_hardirqs_off+0xbe/0x100
[  314.890274]  ? mark_held_locks+0x24/0x90
[  314.890286]  ? do_syscall_64+0x1e/0xe0
[  314.890308]  do_syscall_64+0x74/0xe0
[  314.890325]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  314.890336] RIP: 0033:0x7f00ca33d7b8
[  314.890348] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 5
4
[  314.890356] RSP: 002b:00007ffea2983928 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  314.890369] RAX: ffffffffffffffda RBX: 000000005d777d5b RCX: 00007f00ca33d7b8
[  314.890377] RDX: 0000000000000000 RSI: 00007ffea2983990 RDI: 0000000000000003
[  314.890384] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[  314.890392] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
[  314.890400] R13: 000000000047f640 R14: 00007ffea2987b58 R15: 0000000000000021

[  314.890529] Allocated by task 2687:
[  314.890684]  save_stack+0x1b/0x80
[  314.890694]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[  314.890705]  __kmalloc_track_caller+0x102/0x340
[  314.890721]  kmemdup+0x1d/0x40
[  314.890730]  tc_setup_flow_action+0x731/0x2c27
[  314.890743]  fl_hw_replace_filter+0x23b/0x380 [cls_flower]
[  314.890756]  fl_change+0x16bd/0x27ef [cls_flower]
[  314.890765]  tc_new_tfilter+0x5e1/0xd40
[  314.890776]  rtnetlink_rcv_msg+0x4ab/0x5f0
[  314.890786]  netlink_rcv_skb+0xd0/0x200
[  314.890796]  netlink_unicast+0x296/0x350
[  314.890805]  netlink_sendmsg+0x394/0x600
[  314.890815]  sock_sendmsg+0x96/0xa0
[  314.890825]  ___sys_sendmsg+0x482/0x520
[  314.890834]  __sys_sendmsg+0xba/0x130
[  314.890844]  do_syscall_64+0x74/0xe0
[  314.890854]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  314.890937] Freed by task 2687:
[  314.891076]  save_stack+0x1b/0x80
[  314.891086]  __kasan_slab_free+0x12c/0x170
[  314.891095]  kfree+0xeb/0x2f0
[  314.891106]  tc_cleanup_flow_action+0x69/0xa0
[  314.891119]  fl_hw_replace_filter+0x2c5/0x380 [cls_flower]
[  314.891132]  fl_change+0x16bd/0x27ef [cls_flower]
[  314.891140]  tc_new_tfilter+0x5e1/0xd40
[  314.891151]  rtnetlink_rcv_msg+0x4ab/0x5f0
[  314.891161]  netlink_rcv_skb+0xd0/0x200
[  314.891170]  netlink_unicast+0x296/0x350
[  314.891180]  netlink_sendmsg+0x394/0x600
[  314.891190]  sock_sendmsg+0x96/0xa0
[  314.891200]  ___sys_sendmsg+0x482/0x520
[  314.891208]  __sys_sendmsg+0xba/0x130
[  314.891218]  do_syscall_64+0x74/0xe0
[  314.891228]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  314.891315] The buggy address belongs to the object at ffff88886c746280
                which belongs to the cache kmalloc-96 of size 96
[  314.891762] The buggy address is located 0 bytes inside of
                96-byte region [ffff88886c746280, ffff88886c7462e0)
[  314.892196] The buggy address belongs to the page:
[  314.892387] page:ffffea0021b1d180 refcount:1 mapcount:0 mapping:ffff88835d00ef80 index:0x0
[  314.892398] flags: 0x57ffffc0000200(slab)
[  314.892413] raw: 0057ffffc0000200 ffffea00219e0340 0000000800000008 ffff88835d00ef80
[  314.892423] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[  314.892430] page dumped because: kasan: bad access detected

[  314.892515] Memory state around the buggy address:
[  314.892707]  ffff88886c746180: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  314.892976]  ffff88886c746200: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  314.893251] >ffff88886c746280: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  314.893522]                    ^
[  314.893657]  ffff88886c746300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  314.893924]  ffff88886c746380: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
[  314.894189] ==================================================================

Fix the issue by duplicating tunnel info into per-encap copy that is
deallocated with encap structure. Also, duplicate tunnel info in flow parse
attribute to support cases when flow might be attached asynchronously.

Fixes: 1f6da30697 ("net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:19 -07:00
Eli Britstein 0fd79b1e17 net/mlx5: Fix NULL pointer dereference in extended destination
The cited commit refactored the encap id into a struct pointed from the
destination.
Bug fix for the case there is no encap for one of the destinations.

Fixes: 2b688ea5ef ("net/mlx5: Add flow steering actions to fs_cmd shim layer")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:19 -07:00
Parav Pandit 2347cee83b net/mlx5: Fix rtable reference leak
If the rt entry gateway family is not AF_INET for multipath device,
rtable reference is leaked.
Hence, fix it by releasing the reference.

Fixes: 5fb091e813 ("net/mlx5e: Use hint to resolve route when in HW multipath mode")
Fixes: e32ee6c78e ("net/mlx5e: Support tunnel encap over tagged Ethernet")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:19 -07:00
Vlad Buslov 64d7b68577 net/mlx5e: Only skip encap flows update when encap init failed
When encap entry initialization completes successfully e->compl_result is
set to positive value and not zero, like mlx5e_rep_update_flows() assumes
at the moment. Fix the conditional to only skip encap flows update when
e->compl_result < 0.

Fixes: 2a1f1768fa ("net/mlx5e: Refactor neigh update for concurrent execution")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:18 -07:00
Maor Gottlieb 5dfb6335cb net/mlx5e: Replace kfree with kvfree when free vhca stats
Memory allocated by kvzalloc should be freed by kvfree.

Fixes: cef35af34d ("net/mlx5e: Add mlx5e HV VHCA stats agent")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:18 -07:00
Dmytro Linkin 752d3dc06d net/mlx5e: Remove incorrect match criteria assignment line
Driver have function, which enable match criteria for misc parameters
in dependence of eswitch capabilities.

Fixes: 4f5d1beadc ("Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:18 -07:00
Dmytro Linkin d5dbcc4e87 net/mlx5e: Determine source port properly for vlan push action
Termination tables are used for vlan push actions on uplink ports.
To support RoCE dual port the source port value was placed in a register.
Fix the code to use an API method returning the source port according to
the FW capabilities.

Fixes: 10caabdaad ("net/mlx5e: Use termination table for VLAN push actions")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:17 -07:00
Roi Dayan 6dfef396ea net/mlx5: Fix flow counter list auto bits struct
The union should contain the extended dest and counter list.
Remove the resevered 0x40 bits which is redundant.
This change doesn't break any functionally.
Everything works today because the code in fs_cmd.c is using
the correct structs if extended dest or the basic dest.

Fixes: 1b11549859 ("net/mlx5: Introduce extended destination fields")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29 16:27:17 -07:00
David S. Miller c1b5ddc112 Merge branch 'VLAN-fixes-for-Ocelot-switch'
Vladimir Oltean says:

====================
VLAN fixes for Ocelot switch

This series addresses 2 issues with vlan_filtering=1:
- Untagged traffic gets dropped unless commands are run in a very
  specific order.
- Untagged traffic starts being transmitted as tagged after adding
  another untagged VID on the port.

Tested on NXP LS1028A-RDB board.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 16:22:07 -07:00
Vladimir Oltean b9cd75e668 net: mscc: ocelot: refuse to overwrite the port's native vlan
The switch driver keeps a "vid" variable per port, which signifies _the_
VLAN ID that is stripped on that port's egress (aka the native VLAN on a
trunk port).

That is the way the hardware is designed (mostly). The port->vid is
programmed into REW:PORT:PORT_VLAN_CFG:PORT_VID and the rewriter is told
to send all traffic as tagged except the one having port->vid.

There exists a possibility of finer-grained egress untagging decisions:
using the VCAP IS1 engine, one rule can be added to match every
VLAN-tagged frame whose VLAN should be untagged, and set POP_CNT=1 as
action. However, the IS1 can hold at most 512 entries, and the VLANs are
in the order of 6 * 4096.

So the code is fine for now. But this sequence of commands:

$ bridge vlan add dev swp0 vid 1 pvid untagged
$ bridge vlan add dev swp0 vid 2 untagged

makes untagged and pvid-tagged traffic be sent out of swp0 as tagged
with VID 1, despite user's request.

Prevent that from happening. The user should temporarily remove the
existing untagged VLAN (1 in this case), add it back as tagged, and then
add the new untagged VLAN (2 in this case).

Cc: Antoine Tenart <antoine.tenart@bootlin.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Fixes: 7142529f16 ("net: mscc: ocelot: add VLAN filtering")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 16:22:07 -07:00
Vladimir Oltean 1c44ce560b net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up
Background information: the driver operates the hardware in a mode where
a single VLAN can be transmitted as untagged on a particular egress
port. That is the "native VLAN on trunk port" use case. Its value is
held in port->vid.

Consider the following command sequence (no network manager, all
interfaces are down, debugging prints added by me):

$ ip link add dev br0 type bridge vlan_filtering 1
$ ip link set dev swp0 master br0

Kernel code path during last command:

br_add_slave -> ocelot_netdevice_port_event (NETDEV_CHANGEUPPER):
[   21.401901] ocelot_vlan_port_apply: port 0 vlan aware 0 pvid 0 vid 0

br_add_slave -> nbp_vlan_init -> switchdev_port_attr_set -> ocelot_port_attr_set (SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING):
[   21.413335] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 0 vid 0

br_add_slave -> nbp_vlan_init -> nbp_vlan_add -> br_switchdev_port_vlan_add -> switchdev_port_obj_add -> ocelot_port_obj_add -> ocelot_vlan_vid_add
[   21.667421] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 1 vid 1

So far so good. The bridge has replaced the driver's default pvid used
in standalone mode (0) with its own default_pvid (1). The port's vid
(native VLAN) has also changed from 0 to 1.

$ ip link set dev swp0 up

[   31.722956] 8021q: adding VLAN 0 to HW filter on device swp0
do_setlink -> dev_change_flags -> vlan_vid_add -> ocelot_vlan_rx_add_vid -> ocelot_vlan_vid_add:
[   31.728700] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 1 vid 0

The 8021q module uses the .ndo_vlan_rx_add_vid API on .ndo_open to make
ports be able to transmit and receive 802.1p-tagged traffic by default.
This API is supposed to offload a VLAN sub-interface, which for a switch
port means to add a VLAN that is not a pvid, and tagged on egress.

But the driver implementation of .ndo_vlan_rx_add_vid is wrong: it adds
back vid 0 as "egress untagged". Now back to the initial paragraph:
there is a single untagged VID that the driver keeps track of, and that
has just changed from 1 (the pvid) to 0. So this breaks the bridge
core's expectation, because it has changed vid 1 from untagged to
tagged, when what the user sees is.

$ bridge vlan
port    vlan ids
swp0     1 PVID Egress Untagged

br0      1 PVID Egress Untagged

But curiously, instead of manifesting itself as "untagged and
pvid-tagged traffic gets sent as tagged on egress", the bug:

- is hidden when vlan_filtering=0
- manifests as dropped traffic when vlan_filtering=1, due to this setting:

	if (port->vlan_aware && !port->vid)
		/* If port is vlan-aware and tagged, drop untagged and priority
		 * tagged frames.
		 */
		val |= ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA |
		       ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA |
		       ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA;

which would have made sense if it weren't for this bug. The setting's
intention was "this is a trunk port with no native VLAN, so don't accept
untagged traffic". So the driver was never expecting to set VLAN 0 as
the value of the native VLAN, 0 was just encoding for "invalid".

So the fix is to not send 802.1p traffic as untagged, because that would
change the port's native vlan to 0, unbeknownst to the bridge, and
trigger unexpected code paths in the driver.

Cc: Antoine Tenart <antoine.tenart@bootlin.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Fixes: 7142529f16 ("net: mscc: ocelot: add VLAN filtering")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 16:22:07 -07:00
Navid Emamdoost 6f3ef5c25c wimax: i2400: Fix memory leak in i2400m_op_rfkill_sw_toggle
In the implementation of i2400m_op_rfkill_sw_toggle() the allocated
buffer for cmd should be released before returning. The
documentation for i2400m_msg_to_dev() says when it returns the buffer
can be reused. Meaning cmd should be released in either case. Move
kfree(cmd) before return to be reached by all execution paths.

Fixes: 2507e6ab7a ("wimax: i2400: fix memory leak")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 16:20:25 -07:00
Anna Karas dd7ebe6787 drm/i915/tgl: Fix doc not corresponding to code
Replace PLLs names used in documentation to that used in the code.

Cc: Vandita Kulkarni <vandita.kulkarni@intel.com>
Fixes: 68ff39c3f8 ("drm/i915/tgl: Add new pll ids")
Signed-off-by: Anna Karas <anna.karas@intel.com>
Reviewed-by: Vandita Kulkarni <vandita.kulkarni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190926123559.15717-1-anna.karas@intel.com
(cherry picked from commit d328bd4f90)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2019-10-29 15:33:12 -07:00
Robin Murphy f70744c687 drm/panfrost: Don't dereference bogus MMU pointers
It seems that killing an application while faults are occurring
(particularly with a GPU in FPGA at a whopping 40MHz) can lead to
handling a lingering page fault after all the address space contexts
have already been freed. In this situation, the LRU list is empty so
addr_to_drm_mm_node() ends up dereferencing the list head as if it were
a struct panfrost_mmu entry; this leaves "mmu->as" actually pointing at
the pfdev->alloc_mask bitmap, which is also empty, and given that the
fault has a high likelihood of being in AS0, hilarity ensues.

Sadly, the cleanest solution seems to involve another goto. Oh well, at
least it's robust...

Fixes: 65e51e30d8 ("drm/panfrost: Prevent race when handling page fault")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/9a0b09e6b5851f0d4428b72dd6b8b4c0d0ef4206.1572293305.git.robin.murphy@arm.com
2019-10-29 13:18:17 -05:00
Yi Wang 6f39188c9d drm/panfrost: fix -Wmissing-prototypes warnings
We get these warnings when build kernel W=1:
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:35:6: warning: no previous prototype for ‘panfrost_perfcnt_clean_cache_done’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:40:6: warning: no previous prototype for ‘panfrost_perfcnt_sample_done’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:190:5: warning: no previous prototype for ‘panfrost_ioctl_perfcnt_enable’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:218:5: warning: no previous prototype for ‘panfrost_ioctl_perfcnt_dump’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:250:6: warning: no previous prototype for ‘panfrost_perfcnt_close’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:264:5: warning: no previous prototype for ‘panfrost_perfcnt_init’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_perfcnt.c:320:6: warning: no previous prototype for ‘panfrost_perfcnt_fini’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_mmu.c:227:6: warning: no previous prototype for ‘panfrost_mmu_flush_range’ [-Wmissing-prototypes]
drivers/gpu/drm/panfrost/panfrost_mmu.c:435:5: warning: no previous prototype for ‘panfrost_mmu_map_fault_addr’ [-Wmissing-prototypes]

For file panfrost_mmu.c, make functions static to fix this.
For file panfrost_perfcnt.c, include header file can fix this.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: stable@vger.kernel.org
[robh: fixup function parameter alignment]
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1571967015-42854-1-git-send-email-wang.yi59@zte.com.cn
2019-10-29 13:08:20 -05:00
Jiangfeng Xiao 63a4174682 net: hisilicon: Fix "Trying to free already-free IRQ"
When rmmod hip04_eth.ko, we can get the following warning:

Task track: rmmod(1623)>bash(1591)>login(1581)>init(1)
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1623 at kernel/irq/manage.c:1557 __free_irq+0xa4/0x2ac()
Trying to free already-free IRQ 200
Modules linked in: ping(O) pramdisk(O) cpuinfo(O) rtos_snapshot(O) interrupt_ctrl(O) mtdblock mtd_blkdevrtfs nfs_acl nfs lockd grace sunrpc xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables nf_reject_ipv
CPU: 0 PID: 1623 Comm: rmmod Tainted: G           O    4.4.193 #1
Hardware name: Hisilicon A15
[<c020b408>] (rtos_unwind_backtrace) from [<c0206624>] (show_stack+0x10/0x14)
[<c0206624>] (show_stack) from [<c03f2be4>] (dump_stack+0xa0/0xd8)
[<c03f2be4>] (dump_stack) from [<c021a780>] (warn_slowpath_common+0x84/0xb0)
[<c021a780>] (warn_slowpath_common) from [<c021a7e8>] (warn_slowpath_fmt+0x3c/0x68)
[<c021a7e8>] (warn_slowpath_fmt) from [<c026876c>] (__free_irq+0xa4/0x2ac)
[<c026876c>] (__free_irq) from [<c0268a14>] (free_irq+0x60/0x7c)
[<c0268a14>] (free_irq) from [<c0469e80>] (release_nodes+0x1c4/0x1ec)
[<c0469e80>] (release_nodes) from [<c0466924>] (__device_release_driver+0xa8/0x104)
[<c0466924>] (__device_release_driver) from [<c0466a80>] (driver_detach+0xd0/0xf8)
[<c0466a80>] (driver_detach) from [<c0465e18>] (bus_remove_driver+0x64/0x8c)
[<c0465e18>] (bus_remove_driver) from [<c02935b0>] (SyS_delete_module+0x198/0x1e0)
[<c02935b0>] (SyS_delete_module) from [<c0202ed0>] (__sys_trace_return+0x0/0x10)
---[ end trace bb25d6123d849b44 ]---

Currently "rmmod hip04_eth.ko" call free_irq more than once
as devres_release_all and hip04_remove both call free_irq.
This results in a 'Trying to free already-free IRQ' warning.
To solve the problem free_irq has been moved out of hip04_remove.

Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 10:57:01 -07:00
Will Deacon 85ac30fa2e fjes: Handle workqueue allocation failure
In the highly unlikely event that we fail to allocate either of the
"/txrx" or "/control" workqueues, we should bail cleanly rather than
blindly march on with NULL queue pointer(s) installed in the
'fjes_adapter' instance.

Cc: "David S. Miller" <davem@davemloft.net>
Reported-by: Nicolas Waisman <nico@semmle.com>
Link: https://lore.kernel.org/lkml/CADJ_3a8WFrs5NouXNqS5WYe7rebFP+_A5CheeqAyD_p7DFJJcg@mail.gmail.com/
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 10:33:10 -07:00
Bjorn Andersson d4af3c4b81 arm64: cpufeature: Enable Qualcomm Falkor/Kryo errata 1003
With the introduction of 'cce360b54ce6 ("arm64: capabilities: Filter the
entries based on a given mask")' the Qualcomm Falkor/Kryo errata 1003 is
no long applied.

The result of not applying errata 1003 is that MSM8996 runs into various
RCU stalls and fails to boot most of the times.

Give 1003 a "type" to ensure they are not filtered out in
update_cpu_capabilities().

Fixes: cce360b54c ("arm64: capabilities: Filter the entries based on a given mask")
Cc: stable@vger.kernel.org
Reported-by: Mark Brown <broonie@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2019-10-29 17:18:50 +00:00
Christian Gmeiner a2f10d4a30 drm/etnaviv: fix dumping of iommuv2
etnaviv_iommuv2_dump_size(..) returns the number of PTE * SZ_4K but
etnaviv_iommuv2_dump(..) increments buf pointer even if there is no PTE.
This results in a bad buf pointer which gets used for memcpy(..), when
copying the MMU state in the coredump buffer.

Fixes: afb7b3b1de ("drm/etnaviv: implement IOMMUv2 translation")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2019-10-29 18:12:24 +01:00
Lucas Stach 18fa692d80 drm/etnaviv: reinstate MMUv1 command buffer window check
The switch to per-process address spaces erroneously dropped the check
which validated that the command buffer is mapped through the linear
apperture as required by the hardware. This turned a system
misconfiguration with a helpful error message into a very hard to
debug issue. Reinstate the check at the appropriate location.

Fixes: 17e4660ae3 (drm/etnaviv: implement per-process address spaces on MMUv2)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-10-29 18:11:50 +01:00
Lucas Stach ca8cb69580 drm/etnaviv: fix deadlock in GPU coredump
The GPU coredump function violates the locking order by holding the MMU
context lock while trying to acquire the etnaviv_gem_object lock. This
results in a possible ABBA deadlock with other codepaths which follow
the established locking order.
Fortunately this is easy to fix by dropping the MMU context lock
earlier, as the BO dumping doesn't need the MMU context to be stable.
The only thing the BO dumping cares about are the BO mappings, which
are stable across the lifetime of the job.

Fixes: 27b67278e0 (drm/etnaviv: rework MMU handling)
[ Not really the first bad commit, but the one where this fix applies
  cleanly. Stable kernels need a manual backport. ]
Reported-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-10-29 18:11:06 +01:00
Linus Torvalds 23fdb198ae fuse fixes for 5.4-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXbgx9wAKCRDh3BK/laaZ
 PIjMAQCTrWPf6pLHSoq+Pll7b8swu1m2mW97fj5k8coED1/DSQEA4222PkIdMmhu
 qyHnoX0fTtYXg6NnHbDVWsNL4uG8YAM=
 =RE9K
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "Mostly virtiofs fixes, but also fixes a regression and couple of
  longstanding data/metadata writeback ordering issues"

* tag 'fuse-fixes-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: redundant get_fuse_inode() calls in fuse_writepages_fill()
  fuse: Add changelog entries for protocols 7.1 - 7.8
  fuse: truncate pending writes on O_TRUNC
  fuse: flush dirty data/metadata before non-truncate setattr
  virtiofs: Remove set but not used variable 'fc'
  virtiofs: Retry request submission from worker context
  virtiofs: Count pending forgets as in_flight forgets
  virtiofs: Set FR_SENT flag only after request has been sent
  virtiofs: No need to check fpq->connected state
  virtiofs: Do not end request in submission context
  fuse: don't advise readdirplus for negative lookup
  fuse: don't dereference req->args on finished request
  virtio-fs: don't show mount options
  virtio-fs: Change module name to virtiofs.ko
2019-10-29 17:43:33 +01:00
Catalin Marinas aa57157be6 arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default
Shared and writable mappings (__S.1.) should be clean (!dirty) initially
and made dirty on a subsequent write either through the hardware DBM
(dirty bit management) mechanism or through a write page fault. A clean
pte for the arm64 kernel is one that has PTE_RDONLY set and PTE_DIRTY
clear.

The PAGE_SHARED{,_EXEC} attributes have PTE_WRITE set (PTE_DBM) and
PTE_DIRTY clear. Prior to commit 73e86cb03c ("arm64: Move PTE_RDONLY
bit handling out of set_pte_at()"), it was the responsibility of
set_pte_at() to set the PTE_RDONLY bit and mark the pte clean if the
software PTE_DIRTY bit was not set. However, the above commit removed
the pte_sw_dirty() check and the subsequent setting of PTE_RDONLY in
set_pte_at() while leaving the PAGE_SHARED{,_EXEC} definitions
unchanged. The result is that shared+writable mappings are now dirty by
default

Fix the above by explicitly setting PTE_RDONLY in PAGE_SHARED{,_EXEC}.
In addition, remove the superfluous PTE_DIRTY bit from the kernel PROT_*
attributes.

Fixes: 73e86cb03c ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()")
Cc: <stable@vger.kernel.org> # 4.14.x-
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-10-29 16:22:33 +00:00
Anton Ivanov d848074b2f um-ubd: Entrust re-queue to the upper layers
Fixes crashes due to ubd requeue logic conflicting with the block-mq
logic. Crash is reproducible in 5.0 - 5.3.

Fixes: 53766defb8 ("um: Clean-up command processing in UML UBD driver")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:07:41 -06:00
Anton Eidelman 86cccfbf77 nvme-multipath: remove unused groups_only mode in ana log
groups_only mode in nvme_read_ana_log() is no longer used: remove it.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 08:55:00 -06:00
Anton Eidelman af8fd04247 nvme-multipath: fix possible io hang after ctrl reconnect
The following scenario results in an IO hang:
1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
   NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
   This fails because ctrl disconnects.
   Therefore nvme_update_ns_ana_state() is not called
   and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
   nvme_read_ana_log(ctrl, groups_only=true).
   However, nvme_update_ana_state() does not update namespaces
   because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.

Result:
The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
Consequently ctrl will never be considered a viable path by __nvme_find_path().
IO will hang if ctrl is the only or the last path to the namespace.

More generally, while ctrl is reconnecting, its ANA state may change.
And because nvme_mpath_init() requests ANA log in groups_only mode,
these changes are not propagated to the existing ctrl namespaces.
This may result in a mal-function or an IO hang.

Solution:
nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
This will not harm the new ctrl case (no namespaces present),
and will make sure the ANA state of namespaces gets updated after reconnect.

Note: Another option would be for nvme_mpath_init() to invoke
nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 08:55:00 -06:00
Nicholas Piggin 7d6475051f powerpc/powernv: Fix CPU idle to be called with IRQs disabled
Commit e78a7614f3 ("idle: Prevent late-arriving interrupts from
disrupting offline") changes arch_cpu_idle_dead to be called with
interrupts disabled, which triggers the WARN in pnv_smp_cpu_kill_self.

Fix this by fixing up irq_happened after hard disabling, rather than
requiring there are no pending interrupts, similarly to what was done
done until commit 2525db04d1 ("powerpc/powernv: Simplify lazy IRQ
handling in CPU offline").

Fixes: e78a7614f3 ("idle: Prevent late-arriving interrupts from disrupting offline")
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add unexpected_mask rather than checking for known bad values,
      change the WARN_ON() to a WARN_ON_ONCE()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191022115814.22456-1-npiggin@gmail.com
2019-10-29 21:47:01 +11:00
Valentin Schneider e284df705c sched/topology: Allow sched_asym_cpucapacity to be disabled
While the static key is correctly initialized as being disabled, it will
remain forever enabled once turned on. This means that if we start with an
asymmetric system and hotplug out enough CPUs to end up with an SMP system,
the static key will remain set - which is obviously wrong. We should detect
this and turn off things like misfit migration and capacity aware wakeups.

As Quentin pointed out, having separate root domains makes this slightly
trickier. We could have exclusive cpusets that create an SMP island - IOW,
the domains within this root domain will not see any asymmetry. This means
we can't just disable the key on domain destruction, we need to count how
many asymmetric root domains we have.

Consider the following example using Juno r0 which is 2+4 big.LITTLE, where
two identical cpusets are created: they both span both big and LITTLE CPUs:

    asym0    asym1
  [       ][       ]
   L  L  B  L  L  B

  $ cgcreate -g cpuset:asym0
  $ cgset -r cpuset.cpus=0,1,3 asym0
  $ cgset -r cpuset.mems=0 asym0
  $ cgset -r cpuset.cpu_exclusive=1 asym0

  $ cgcreate -g cpuset:asym1
  $ cgset -r cpuset.cpus=2,4,5 asym1
  $ cgset -r cpuset.mems=0 asym1
  $ cgset -r cpuset.cpu_exclusive=1 asym1

  $ cgset -r cpuset.sched_load_balance=0 .

(the CPU numbering may look odd because on the Juno LITTLEs are CPUs 0,3-5
and bigs are CPUs 1-2)

If we make one of those SMP (IOW remove asymmetry) by e.g. hotplugging its
big core, we would end up with an SMP cpuset and an asymmetric cpuset - the
static key must remain set, because we still have one asymmetric root domain.

With the above example, this could be done with:

  $ echo 0 > /sys/devices/system/cpu/cpu2/online

Which would result in:

    asym0   asym1
  [       ][    ]
   L  L  B  L  L

When both SMP and asymmetric cpusets are present, all CPUs will observe
sched_asym_cpucapacity being set (it is system-wide), but not all CPUs
observe asymmetry in their sched domain hierarchy:

  per_cpu(sd_asym_cpucapacity, <any CPU in asym0>) == <some SD at DIE level>
  per_cpu(sd_asym_cpucapacity, <any CPU in asym1>) == NULL

Change the simple key enablement to an increment, and decrement the key
counter when destroying domains that cover asymmetric CPUs.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hannes@cmpxchg.org
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: qperret@google.com
Cc: tj@kernel.org
Cc: vincent.guittot@linaro.org
Fixes: df054e8445 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Link: https://lkml.kernel.org/r/20191023153745.19515-3-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29 09:58:46 +01:00
Valentin Schneider cd1cb33505 sched/topology: Don't try to build empty sched domains
Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.

This leads to the following splat:

    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
    Hardware name: ARM Juno development board (r0) (DT)
    Workqueue: events cpuset_hotplug_workfn
    pstate: 60000005 (nZCv daif -PAN -UAO)
    pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    lr : build_sched_domains (kernel/sched/topology.c:1966)
    Call trace:
    build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    partition_sched_domains_locked (kernel/sched/topology.c:2250)
    rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
    rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
    cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
    process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
    worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
    kthread (kernel/kthread.c:255)
    ret_from_fork (arch/arm64/kernel/entry.S:1167)
    Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)

The faulty line in question is:

  cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));

and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.

Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.

The above splat was obtained on my Juno r0 with the following reproducer:

  $ cgcreate -g cpuset:asym
  $ cgset -r cpuset.cpus=0-3 asym
  $ cgset -r cpuset.mems=0 asym
  $ cgset -r cpuset.cpu_exclusive=1 asym

  $ cgcreate -g cpuset:smp
  $ cgset -r cpuset.cpus=4-5 smp
  $ cgset -r cpuset.mems=0 smp
  $ cgset -r cpuset.cpu_exclusive=1 smp

  $ cgset -r cpuset.sched_load_balance=0 .

  $ echo 0 > /sys/devices/system/cpu/cpu4/online
  $ echo 0 > /sys/devices/system/cpu/cpu5/online

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hannes@cmpxchg.org
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: qperret@google.com
Cc: tj@kernel.org
Cc: vincent.guittot@linaro.org
Fixes: 05484e0984 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Link: https://lkml.kernel.org/r/20191023153745.19515-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29 09:58:45 +01:00
Alan Stern 54f83b8c8e USB: gadget: Reject endpoints with 0 maxpacket value
Endpoints with a maxpacket length of 0 are probably useless.  They
can't transfer any data, and it's not at all unlikely that a UDC will
crash or hang when trying to handle a non-zero-length usb_request for
such an endpoint.  Indeed, dummy-hcd gets a divide error when trying
to calculate the remainder of a transfer length by the maxpacket
value, as discovered by the syzbot fuzzer.

Currently the gadget core does not check for endpoints having a
maxpacket value of 0.  This patch adds a check to usb_ep_enable(),
preventing such endpoints from being used.

As far as I know, none of the gadget drivers in the kernel tries to
create an endpoint with maxpacket = 0, but until now there has been
nothing to prevent userspace programs under gadgetfs or configfs from
doing it.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+8ab8bf161038a8768553@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Acked-by: Felipe Balbi <balbi@kernel.org>

Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1910281052370.1485-100000@iolanthe.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29 09:56:18 +01:00
Thiago Jung Bauermann 05d9a95283 powerpc/prom_init: Undo relocation before entering secure mode
The ultravisor will do an integrity check of the kernel image but we
relocated it so the check will fail. Restore the original image by
relocating it back to the kernel virtual base address.

This works because during build vmlinux is linked with an expected
virtual runtime address of KERNELBASE.

Fixes: 6a9c930bd7 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Tested-by: Michael Anderson <andmike@linux.ibm.com>
[mpe: Add IS_ENABLED() to fix the CONFIG_RELOCATABLE=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911163433.12822-1-bauerman@linux.ibm.com
2019-10-29 15:12:17 +11:00
Nicholas Piggin d3566abb1a scsi: qla2xxx: stop timer in shutdown path
In shutdown/reboot paths, the timer is not stopped:

  qla2x00_shutdown
  pci_device_shutdown
  device_shutdown
  kernel_restart_prepare
  kernel_restart
  sys_reboot

This causes lockups (on powerpc) when firmware config space access calls
are interrupted by smp_send_stop later in reboot.

Fixes: e30d175648 ("[SCSI] qla2xxx: Addition of shutdown callback handler.")
Link: https://lore.kernel.org/r/20191024063804.14538-1-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 21:58:01 -04:00
Nicolin Chen 2ccb4f16d0 hwmon: (ina3221) Fix read timeout issue
After introducing "samples" to the calculation of wait time, the
driver might timeout at the regmap_field_read_poll_timeout call,
because the wait time could be longer than the 100000 usec limit
due to a large "samples" number.

So this patch sets the timeout limit to 2 times of the wait time
in order to fix this issue.

Fixes: 5c090abf94 ("hwmon: (ina3221) Add averaging mode support")
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Link: https://lore.kernel.org/r/20191022005922.30239-1-nicoleotsuka@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-10-28 18:46:55 -07:00
David S. Miller 55793d2a43 Here are two batman-adv bugfixes:
* Fix free/alloc race for OGM and OGMv2, by Sven Eckelmann (2 patches)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl2yvoAWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoVosEAC7Ei09QJKa2Wh6by4c8OQKvloK
 ecei39e0FjMUx4/XETSR/kdHrRFwzMPvIhqe+DUCski7p5x9sHWrkJwIWWirGrDf
 mAsF8mhA1E0pfsQmQRqu4h/UbDyLLhsbfEf95B75Uf6Nq2cIG0Ko7rb2ao3DR7c2
 YwvSjzT+zwlS8Q6Hj1YsJKSS3BEk5NQ9r9soD9WhwWmYynn9TxTWy8o9z3lBCXtV
 j+mOm0qNebxsDKhr7jbfujjIaepmTOQJHwqPzm1Vo/v6CMEdJzd6j8vbsxCUyXvI
 /Cg3V+8INEGmRmCDT96paO5kYkHptr8i467Raj3DAdwlh3BuoMo382i3D3GOXgEK
 VtieWhuOplAJrbc+GFifvFSAA75+B8rZ/IV8CDFIbWHhEjZ/fdtyl1panqQkKSqw
 n5m/nAyKmfTZ9CH0kZ/zKU8EKPaMNh0BUTbjhU/8Zn0lDI+loBs024K5a00TWtD8
 GnjCGKi87HhtCn/3hxasbaVEyJIrMvPlzyYRylzFfSml55GCBuA8i8l/7ql0QKCe
 xq78J08x/n3RjAj+aguLQ1v4Vz/EgOvzj4RdfHS2CGSI9m37z7wz5oFTKcqi4tX+
 4Phpl7onRnMDx6AiZHMGxwrti/AdFO3YZmOiIh/IEBjUBhmABuj6Eq40thl/8Hkv
 DfoOTKf98HhmaW6d/g==
 =usWf
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-for-davem-20191025' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are two batman-adv bugfixes:

 * Fix free/alloc race for OGM and OGMv2, by Sven Eckelmann (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 16:39:07 -07:00
Daniel Wagner 0a29ac5bd3 net: usb: lan78xx: Disable interrupts before calling generic_handle_irq()
lan78xx_status() will run with interrupts enabled due to the change in
ed194d1367 ("usb: core: remove local_irq_save() around ->complete()
handler"). generic_handle_irq() expects to be run with IRQs disabled.

[    4.886203] 000: irq 79 handler irq_default_primary_handler+0x0/0x8 enabled interrupts
[    4.886243] 000: WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x154/0x168
[    4.896294] 000: Modules linked in:
[    4.896301] 000: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.6 #39
[    4.896310] 000: Hardware name: Raspberry Pi 3 Model B+ (DT)
[    4.896315] 000: pstate: 60000005 (nZCv daif -PAN -UAO)
[    4.896321] 000: pc : __handle_irq_event_percpu+0x154/0x168
[    4.896331] 000: lr : __handle_irq_event_percpu+0x154/0x168
[    4.896339] 000: sp : ffff000010003cc0
[    4.896346] 000: x29: ffff000010003cc0 x28: 0000000000000060
[    4.896355] 000: x27: ffff000011021980 x26: ffff00001189c72b
[    4.896364] 000: x25: ffff000011702bc0 x24: ffff800036d6e400
[    4.896373] 000: x23: 000000000000004f x22: ffff000010003d64
[    4.896381] 000: x21: 0000000000000000 x20: 0000000000000002
[    4.896390] 000: x19: ffff8000371c8480 x18: 0000000000000060
[    4.896398] 000: x17: 0000000000000000 x16: 00000000000000eb
[    4.896406] 000: x15: ffff000011712d18 x14: 7265746e69206465
[    4.896414] 000: x13: ffff000010003ba0 x12: ffff000011712df0
[    4.896422] 000: x11: 0000000000000001 x10: ffff000011712e08
[    4.896430] 000: x9 : 0000000000000001 x8 : 000000000003c920
[    4.896437] 000: x7 : ffff0000118cc410 x6 : ffff0000118c7f00
[    4.896445] 000: x5 : 000000000003c920 x4 : 0000000000004510
[    4.896453] 000: x3 : ffff000011712dc8 x2 : 0000000000000000
[    4.896461] 000: x1 : 73a3f67df94c1500 x0 : 0000000000000000
[    4.896466] 000: Call trace:
[    4.896471] 000:  __handle_irq_event_percpu+0x154/0x168
[    4.896481] 000:  handle_irq_event_percpu+0x50/0xb0
[    4.896489] 000:  handle_irq_event+0x40/0x98
[    4.896497] 000:  handle_simple_irq+0xa4/0xf0
[    4.896505] 000:  generic_handle_irq+0x24/0x38
[    4.896513] 000:  intr_complete+0xb0/0xe0
[    4.896525] 000:  __usb_hcd_giveback_urb+0x58/0xd8
[    4.896533] 000:  usb_giveback_urb_bh+0xd0/0x170
[    4.896539] 000:  tasklet_action_common.isra.0+0x9c/0x128
[    4.896549] 000:  tasklet_hi_action+0x24/0x30
[    4.896556] 000:  __do_softirq+0x120/0x23c
[    4.896564] 000:  irq_exit+0xb8/0xd8
[    4.896571] 000:  __handle_domain_irq+0x64/0xb8
[    4.896579] 000:  bcm2836_arm_irqchip_handle_irq+0x60/0xc0
[    4.896586] 000:  el1_irq+0xb8/0x140
[    4.896592] 000:  arch_cpu_idle+0x10/0x18
[    4.896601] 000:  do_idle+0x200/0x280
[    4.896608] 000:  cpu_startup_entry+0x20/0x28
[    4.896615] 000:  rest_init+0xb4/0xc0
[    4.896623] 000:  arch_call_rest_init+0xc/0x14
[    4.896632] 000:  start_kernel+0x454/0x480

Fixes: ed194d1367 ("usb: core: remove local_irq_save() around ->complete() handler")
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Stefan Wahren <wahrenst@gmx.net>
Cc: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 16:35:04 -07:00
Arnd Bergmann 5d294fc483 net: dsa: sja1105: improve NET_DSA_SJA1105_TAS dependency
An earlier bugfix introduced a dependency on CONFIG_NET_SCH_TAPRIO,
but this missed the case of NET_SCH_TAPRIO=m and NET_DSA_SJA1105=y,
which still causes a link error:

drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_setup_tc_taprio':
sja1105_tas.c:(.text+0x5c): undefined reference to `taprio_offload_free'
sja1105_tas.c:(.text+0x3b4): undefined reference to `taprio_offload_get'
drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_tas_teardown':
sja1105_tas.c:(.text+0x6ec): undefined reference to `taprio_offload_free'

Change the dependency to only allow selecting the TAS code when it
can link against the taprio code.

Fixes: a8d570de0c ("net: dsa: sja1105: Add dependency for NET_DSA_SJA1105_TAS")
Fixes: 317ab5b86c ("net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 16:33:42 -07:00
Benjamin Herrenschmidt 88824e3bf2 net: ethernet: ftgmac100: Fix DMA coherency issue with SW checksum
We are calling the checksum helper after the dma_map_single()
call to map the packet. This is incorrect as the checksumming
code will touch the packet from the CPU. This means the cache
won't be properly flushes (or the bounce buffering will leave
us with the unmodified packet to DMA).

This moves the calculation of the checksum & vlan tags to
before the DMA mapping.

This also has the side effect of fixing another bug: If the
checksum helper fails, we goto "drop" to drop the packet, which
will not unmap the DMA mapping.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fixes: 05690d633f ("ftgmac100: Upgrade to NETIF_F_HW_CSUM")
Reviewed-by: Vijay Khemka <vijaykhemka@fb.com>
Tested-by: Vijay Khemka <vijaykhemka@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 16:22:50 -07:00
Tejun Heo 20eb4f29b6 net: fix sk_page_frag() recursion from memory reclaim
sk_page_frag() optimizes skb_frag allocations by using per-task
skb_frag cache when it knows it's the only user.  The condition is
determined by seeing whether the socket allocation mask allows
blocking - if the allocation may block, it obviously owns the task's
context and ergo exclusively owns current->task_frag.

Unfortunately, this misses recursion through memory reclaim path.
Please take a look at the following backtrace.

 [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10
     ...
     tcp_sendmsg+0x27/0x40
     sock_sendmsg+0x30/0x40
     sock_xmit.isra.24+0xa1/0x170 [nbd]
     nbd_send_cmd+0x1d2/0x690 [nbd]
     nbd_queue_rq+0x1b5/0x3b0 [nbd]
     __blk_mq_try_issue_directly+0x108/0x1b0
     blk_mq_request_issue_directly+0xbd/0xe0
     blk_mq_try_issue_list_directly+0x41/0xb0
     blk_mq_sched_insert_requests+0xa2/0xe0
     blk_mq_flush_plug_list+0x205/0x2a0
     blk_flush_plug_list+0xc3/0xf0
 [1] blk_finish_plug+0x21/0x2e
     _xfs_buf_ioapply+0x313/0x460
     __xfs_buf_submit+0x67/0x220
     xfs_buf_read_map+0x113/0x1a0
     xfs_trans_read_buf_map+0xbf/0x330
     xfs_btree_read_buf_block.constprop.42+0x95/0xd0
     xfs_btree_lookup_get_block+0x95/0x170
     xfs_btree_lookup+0xcc/0x470
     xfs_bmap_del_extent_real+0x254/0x9a0
     __xfs_bunmapi+0x45c/0xab0
     xfs_bunmapi+0x15/0x30
     xfs_itruncate_extents_flags+0xca/0x250
     xfs_free_eofblocks+0x181/0x1e0
     xfs_fs_destroy_inode+0xa8/0x1b0
     destroy_inode+0x38/0x70
     dispose_list+0x35/0x50
     prune_icache_sb+0x52/0x70
     super_cache_scan+0x120/0x1a0
     do_shrink_slab+0x120/0x290
     shrink_slab+0x216/0x2b0
     shrink_node+0x1b6/0x4a0
     do_try_to_free_pages+0xc6/0x370
     try_to_free_mem_cgroup_pages+0xe3/0x1e0
     try_charge+0x29e/0x790
     mem_cgroup_charge_skmem+0x6a/0x100
     __sk_mem_raise_allocated+0x18e/0x390
     __sk_mem_schedule+0x2a/0x40
 [0] tcp_sendmsg_locked+0x8eb/0xe10
     tcp_sendmsg+0x27/0x40
     sock_sendmsg+0x30/0x40
     ___sys_sendmsg+0x26d/0x2b0
     __sys_sendmsg+0x57/0xa0
     do_syscall_64+0x42/0x100
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

In [0], tcp_send_msg_locked() was using current->page_frag when it
called sk_wmem_schedule().  It already calculated how many bytes can
be fit into current->page_frag.  Due to memory pressure,
sk_wmem_schedule() called into memory reclaim path which called into
xfs and then IO issue path.  Because the filesystem in question is
backed by nbd, the control goes back into the tcp layer - back into
tcp_sendmsg_locked().

nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
sense - it's in the process of freeing memory and wants to be able to,
e.g., drop clean pages to make forward progress.  However, this
confused sk_page_frag() called from [2].  Because it only tests
whether the allocation allows blocking which it does, it now thinks
current->page_frag can be used again although it already was being
used in [0].

After [2] used current->page_frag, the offset would be increased by
the used amount.  When the control returns to [0],
current->page_frag's offset is increased and the previously calculated
number of bytes now may overrun the end of allocated memory leading to
silent memory corruptions.

Fix it by adding gfpflags_normal_context() which tests sleepable &&
!reclaim and use it to determine whether to use current->task_frag.

v2: Eric didn't like gfp flags being tested twice.  Introduce a new
    helper gfpflags_normal_context() and combine the two tests.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 16:17:31 -07:00
Eric Dumazet a793183caa udp: fix data-race in udp_set_dev_scratch()
KCSAN reported a data-race in udp_set_dev_scratch() [1]

The issue here is that we must not write over skb fields
if skb is shared. A similar issue has been fixed in commit
89c22d8c3b ("net: Fix skb csum races when peeking")

While we are at it, use a helper only dealing with
udp_skb_scratch(skb)->csum_unnecessary, as this allows
udp_set_dev_scratch() to be called once and thus inlined.

[1]
BUG: KCSAN: data-race in udp_set_dev_scratch / udpv6_recvmsg

write to 0xffff888120278317 of 1 bytes by task 10411 on cpu 1:
 udp_set_dev_scratch+0xea/0x200 net/ipv4/udp.c:1308
 __first_packet_length+0x147/0x420 net/ipv4/udp.c:1556
 first_packet_length+0x68/0x2a0 net/ipv4/udp.c:1579
 udp_poll+0xea/0x110 net/ipv4/udp.c:2720
 sock_poll+0xed/0x250 net/socket.c:1256
 vfs_poll include/linux/poll.h:90 [inline]
 do_select+0x7d0/0x1020 fs/select.c:534
 core_sys_select+0x381/0x550 fs/select.c:677
 do_pselect.constprop.0+0x11d/0x160 fs/select.c:759
 __do_sys_pselect6 fs/select.c:784 [inline]
 __se_sys_pselect6 fs/select.c:769 [inline]
 __x64_sys_pselect6+0x12e/0x170 fs/select.c:769
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

read to 0xffff888120278317 of 1 bytes by task 10413 on cpu 0:
 udp_skb_csum_unnecessary include/net/udp.h:358 [inline]
 udpv6_recvmsg+0x43e/0xe90 net/ipv6/udp.c:310
 inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
 sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
 ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
 do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
 __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
 __do_sys_recvmmsg net/socket.c:2703 [inline]
 __se_sys_recvmmsg net/socket.c:2696 [inline]
 __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 10413 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 2276f58ac5 ("udp: use a separate rx queue for packet reception")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 13:53:40 -07:00
Nishad Kamdar 7de4344f2a net: dpaa2: Use the correct style for SPDX License Identifier
This patch corrects the SPDX License Identifier style in
header files related to DPAA2 Ethernet driver supporting
Freescale SoCs with DPAA2. For C header files
Documentation/process/license-rules.rst mandates C-like comments
(opposed to C source files where C++ style should be used)

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 13:40:17 -07:00
David S. Miller 2024305863 Merge branch 'net-avoid-KCSAN-splats'
Eric Dumazet says:

====================
net: avoid KCSAN splats

Often times we use skb_queue_empty() without holding a lock,
meaning that other cpus (or interrupt) can change the queue
under us. This is fine, but we need to properly annotate
the lockless intent to make sure the compiler wont over
optimize things.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 13:33:41 -07:00
Eric Dumazet 7c422d0ce9 net: add READ_ONCE() annotation in __skb_wait_for_more_packets()
__skb_wait_for_more_packets() can be called while other cpus
can feed packets to the socket receive queue.

KCSAN reported :

BUG: KCSAN: data-race in __skb_wait_for_more_packets / __udp_enqueue_schedule_skb

write to 0xffff888102e40b58 of 8 bytes by interrupt on cpu 0:
 __skb_insert include/linux/skbuff.h:1852 [inline]
 __skb_queue_before include/linux/skbuff.h:1958 [inline]
 __skb_queue_tail include/linux/skbuff.h:1991 [inline]
 __udp_enqueue_schedule_skb+0x2d7/0x410 net/ipv4/udp.c:1470
 __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
 udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
 udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
 udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
 __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
 udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955

read to 0xffff888102e40b58 of 8 bytes by task 13035 on cpu 1:
 __skb_wait_for_more_packets+0xfa/0x320 net/core/datagram.c:100
 __skb_recv_udp+0x374/0x500 net/ipv4/udp.c:1683
 udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
 ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
 do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
 __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
 __do_sys_recvmmsg net/socket.c:2703 [inline]
 __se_sys_recvmmsg net/socket.c:2696 [inline]
 __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 13035 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 13:33:41 -07:00
Eric Dumazet 3f926af3f4 net: use skb_queue_empty_lockless() in busy poll contexts
Busy polling usually runs without locks.
Let's use skb_queue_empty_lockless() instead of skb_queue_empty()

Also uses READ_ONCE() in __skb_try_recv_datagram() to address
a similar potential problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 13:33:41 -07:00
Eric Dumazet 3ef7cf57c7 net: use skb_queue_empty_lockless() in poll() handlers
Many poll() handlers are lockless. Using skb_queue_empty_lockless()
instead of skb_queue_empty() is more appropriate.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 13:33:41 -07:00
Eric Dumazet 137a0dbe34 udp: use skb_queue_empty_lockless()
syzbot reported a data-race [1].

We should use skb_queue_empty_lockless() to document that we are
not ensuring a mutual exclusion and silence KCSAN.

[1]
BUG: KCSAN: data-race in __skb_recv_udp / __udp_enqueue_schedule_skb

write to 0xffff888122474b50 of 8 bytes by interrupt on cpu 0:
 __skb_insert include/linux/skbuff.h:1852 [inline]
 __skb_queue_before include/linux/skbuff.h:1958 [inline]
 __skb_queue_tail include/linux/skbuff.h:1991 [inline]
 __udp_enqueue_schedule_skb+0x2c1/0x410 net/ipv4/udp.c:1470
 __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
 udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
 udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
 udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
 __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
 udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955

read to 0xffff888122474b50 of 8 bytes by task 8921 on cpu 1:
 skb_queue_empty include/linux/skbuff.h:1494 [inline]
 __skb_recv_udp+0x18d/0x500 net/ipv4/udp.c:1653
 udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
 ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
 do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
 __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
 __do_sys_recvmmsg net/socket.c:2703 [inline]
 __se_sys_recvmmsg net/socket.c:2696 [inline]
 __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 8921 Comm: syz-executor.4 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 13:33:41 -07:00
Eric Dumazet d7d16a8935 net: add skb_queue_empty_lockless()
Some paths call skb_queue_empty() without holding
the queue lock. We must use a barrier in order
to not let the compiler do strange things, and avoid
KCSAN splats.

Adding a barrier in skb_queue_empty() might be overkill,
I prefer adding a new helper to clearly identify
points where the callers might be lockless. This might
help us finding real bugs.

The corresponding WRITE_ONCE() should add zero cost
for current compilers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 13:33:41 -07:00