Commit graph

902094 commits

Author SHA1 Message Date
David S. Miller 549da33801 mlx5-updates-2020-02-27
mlx5 misc updates and minor cleanups:
 
 1) Use per vport tables for mirroring
 2) Improve log messages for SW steering (DR)
 3) Add devlink fdb_large_groups parameter
 4) E-Switch, Allow goto earlier chain
 5) Don't allow forwarding between uplink representors
 6) Add support for devlink-port in non-representors mode
 7) Minor misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5YYYwACgkQSD+KveBX
 +j7vIQgAsFQSjJGdyUzVDfeHu94Ze79OMRstuEh9fbcNXeyawMTShPLPE94E91qo
 Nj6AfZp8sH9M6fLtFcOMrPou87ITslP3mQ/C2cYWWJn/iq9RN13kO+howZNeRhlk
 iettT8qHGYR7aU3tBU38pc9/bWVHRPt+FZGBeCAcvwgQ1zfjBEtPcMJ8svjsfETA
 0bzEDEq0eKEaynHr3phJpzbcnvzm3154HkfZtDZFAApgr4tpEGSlFa4n+8ctcmqy
 zf82HH8GnB+GhTUPU3W33NwyH2otHOcE3dPAepQNmS3f8EemMJebAtjkexo6SfjY
 rwUs5qTU05myOy5RPmzwhQsnzRyPBw==
 =z2AA
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-02-27

mlx5 misc updates and minor cleanups:

1) Use per vport tables for mirroring
2) Improve log messages for SW steering (DR)
3) Add devlink fdb_large_groups parameter
4) E-Switch, Allow goto earlier chain
5) Don't allow forwarding between uplink representors
6) Add support for devlink-port in non-representors mode
7) Minor misc cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-28 11:59:53 -08:00
Alexei Starovoitov 812285fa5a Merge branch 'bpf_sk_storage_via_inet_diag'
Martin KaFai Lau says:

====================
The bpf_prog can store specific info to a sk by using bpf_sk_storage.
In other words, a sk can be extended by a bpf_prog.

This series is to support providing bpf_sk_storage data during inet_diag's
dump.  The primary target is the usage like iproute2's "ss".

The first two patches are refactoring works in inet_diag to make
adding bpf_sk_storage support easier.  The next two patches do
the actual work.

Please see individual patch for details.

v2:
- Add commit message for u16 to u32 change in min_dump_alloc in Patch 4 (Song)
- Add comment to explain the !skb->len check in __inet_diag_dump in Patch 4.
- Do the map->map_type check earlier in Patch 3 for readability.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-02-27 18:53:37 -08:00
Martin KaFai Lau 085c20cacf bpf: inet_diag: Dump bpf_sk_storages in inet_diag_dump()
This patch will dump out the bpf_sk_storages of a sk
if the request has the INET_DIAG_REQ_SK_BPF_STORAGES nlattr.

An array of SK_DIAG_BPF_STORAGE_REQ_MAP_FD can be specified in
INET_DIAG_REQ_SK_BPF_STORAGES to select which bpf_sk_storage to dump.
If no map_fd is specified, all bpf_sk_storages of a sk will be dumped.

bpf_sk_storages can be added to the system at runtime.  It is difficult
to find a proper static value for cb->min_dump_alloc.

This patch learns the nlattr size required to dump the bpf_sk_storages
of a sk.  If it happens to be the very first nlmsg of a dump and it
cannot fit the needed bpf_sk_storages,  it will try to expand the
skb by "pskb_expand_head()".

Instead of expanding it in inet_sk_diag_fill(), it is expanded at a
sleepable context in __inet_diag_dump() so __GFP_DIRECT_RECLAIM can
be used.  In __inet_diag_dump(), it will retry as long as the
skb is empty and the cb->min_dump_alloc becomes larger than before.
cb->min_dump_alloc is bounded by KMALLOC_MAX_SIZE.  The min_dump_alloc
is also changed from 'u16' to 'u32' to accommodate a sk that may have
a few large bpf_sk_storages.

The updated cb->min_dump_alloc will also be used to allocate the skb in
the next dump.  This logic already exists in netlink_dump().

Here is the sample output of a locally modified 'ss' and it could be made
more readable by using BTF later:
[root@arch-fb-vm1 ~]# ss --bpf-map-id 14 --bpf-map-id 13 -t6an 'dst [::1]:8989'
State Recv-Q Send-Q Local Address:Port  Peer Address:PortProcess
ESTAB 0      0              [::1]:51072        [::1]:8989
	 bpf_map_id:14 value:[ 3feb ]
	 bpf_map_id:13 value:[ 3f ]
ESTAB 0      0              [::1]:51070        [::1]:8989
	 bpf_map_id:14 value:[ 3feb ]
	 bpf_map_id:13 value:[ 3f ]

[root@arch-fb-vm1 ~]# ~/devshare/github/iproute2/misc/ss --bpf-maps -t6an 'dst [::1]:8989'
State         Recv-Q         Send-Q                   Local Address:Port                    Peer Address:Port         Process
ESTAB         0              0                                [::1]:51072                          [::1]:8989
	 bpf_map_id:14 value:[ 3feb ]
	 bpf_map_id:13 value:[ 3f ]
	 bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]
ESTAB         0              0                                [::1]:51070                          [::1]:8989
	 bpf_map_id:14 value:[ 3feb ]
	 bpf_map_id:13 value:[ 3f ]
	 bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com
2020-02-27 18:50:19 -08:00
Martin KaFai Lau 1ed4d92458 bpf: INET_DIAG support in bpf_sk_storage
This patch adds INET_DIAG support to bpf_sk_storage.

1. Although this series adds bpf_sk_storage diag capability to inet sk,
   bpf_sk_storage is in general applicable to all fullsock.  Hence, the
   bpf_sk_storage logic will operate on SK_DIAG_* nlattr.  The caller
   will pass in its specific nesting nlattr (e.g. INET_DIAG_*) as
   the argument.

2. The request will be like:
	INET_DIAG_REQ_SK_BPF_STORAGES (nla_nest) (defined in latter patch)
		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
		......

   Considering there could have multiple bpf_sk_storages in a sk,
   instead of reusing INET_DIAG_INFO ("ss -i"),  the user can select
   some specific bpf_sk_storage to dump by specifying an array of
   SK_DIAG_BPF_STORAGE_REQ_MAP_FD.

   If no SK_DIAG_BPF_STORAGE_REQ_MAP_FD is specified (i.e. an empty
   INET_DIAG_REQ_SK_BPF_STORAGES), it will dump all bpf_sk_storages
   of a sk.

3. The reply will be like:
	INET_DIAG_BPF_SK_STORAGES (nla_nest) (defined in latter patch)
		SK_DIAG_BPF_STORAGE (nla_nest)
			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
		SK_DIAG_BPF_STORAGE (nla_nest)
			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
		......

4. Unlike other INET_DIAG info of a sk which is pretty static, the size
   required to dump the bpf_sk_storage(s) of a sk is dynamic as the
   system adding more bpf_sk_storage_map.  It is hard to set a static
   min_dump_alloc size.

   Hence, this series learns it at the runtime and adjust the
   cb->min_dump_alloc as it iterates all sk(s) of a system.  The
   "unsigned int *res_diag_size" in bpf_sk_storage_diag_put()
   is for this purpose.

   The next patch will update the cb->min_dump_alloc as it
   iterates the sk(s).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230421.1975729-1-kafai@fb.com
2020-02-27 18:50:19 -08:00
Martin KaFai Lau 0df6d32842 inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data
The INET_DIAG_REQ_BYTECODE nlattr is currently re-found every time when
the "dump()" is re-started.

In a latter patch, it will also need to parse the new
INET_DIAG_REQ_SK_BPF_STORAGES nlattr to learn the map_fds. Thus, this
patch takes this chance to store the parsed nlattr in cb->data
during the "start" time of a dump.

By doing this, the "bc" argument also becomes unnecessary
and is removed.  Also, the two copies of the INET_DIAG_REQ_BYTECODE
parsing-audit logic between compat/current version can be
consolidated to one.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230415.1975555-1-kafai@fb.com
2020-02-27 18:50:19 -08:00
Martin KaFai Lau 5682d393b4 inet_diag: Refactor inet_sk_diag_fill(), dump(), and dump_one()
In a latter patch, there is a need to update "cb->min_dump_alloc"
in inet_sk_diag_fill() as it learns the diffierent bpf_sk_storages
stored in a sk while dumping all sk(s) (e.g. tcp_hashinfo).

The inet_sk_diag_fill() currently does not take the "cb" as an argument.
One of the reason is inet_sk_diag_fill() is used by both dump_one()
and dump() (which belong to the "struct inet_diag_handler".  The dump_one()
interface does not pass the "cb" along.

This patch is to make dump_one() pass a "cb".  The "cb" is created in
inet_diag_cmd_exact().  The "nlh" and "in_skb" are stored in "cb" as
the dump() interface does.  The total number of args in
inet_sk_diag_fill() is also cut from 10 to 7 and
that helps many callers to pass fewer args.

In particular,
"struct user_namespace *user_ns", "u32 pid", and "u32 seq"
can be replaced by accessing "cb->nlh" and "cb->skb".

A similar argument reduction is also made to
inet_twsk_diag_fill() and inet_req_diag_fill().

inet_csk_diag_dump() and inet_csk_diag_fill() are also removed.
They are mostly equivalent to inet_sk_diag_fill().  Their repeated
usages are very limited.  Thus, inet_sk_diag_fill() is directly used
in those occasions.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225230409.1975173-1-kafai@fb.com
2020-02-27 18:50:19 -08:00
David S. Miller 9f6e055907 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The mptcp conflict was overlapping additions.

The SMC conflict was an additional and removal happening at the same
time.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 18:31:39 -08:00
Roi Dayan bc1d75fa79 net/mlx5e: Remove redundant comment about goto slow path
The code is self explanatory and makes the comment redundant.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:42 -08:00
Eli Cohen 178f69b477 net/mlx5e: Reduce number of arguments in slow path handling
mlx5e_tc_offload_to_slow_path() and mlx5e_tc_unoffload_from_slow_path()
take an extra argument allocated on the stack of the caller but not used
by the caller. Avoid the extra argument and use local variable in the
function itself.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:39 -08:00
Eli Cohen dec481c86e net/mlx5e: Remove unused argument from parse_tc_pedit_action()
parse_attr is not used by parse_tc_pedit_action() so revmove it.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:36 -08:00
Roi Dayan 61644c3de8 net/mlx5e: Use NL_SET_ERR_MSG_MOD() extack for errors
This to be consistent and adds the module name to the error message.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:34 -08:00
Roi Dayan 4ccd83f40c net/mlx5e: Use netdev_warn() instead of pr_err() for errors
This is for added netdev prefix that helps identify
the source of the message.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:31 -08:00
Roi Dayan 237ac8ded4 net/mlx5e: Use netdev_warn() for errors for added prefix
This helps identify the source of the message.
If netdev still doesn't exists use mlx5_core_warn().

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:28 -08:00
Erez Shitrit b7d0db5520 net/mlx5: DR, Improve log messages
Few print messages are in debug level where they should be in error, and
few messages are missing.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:25 -08:00
Hamdan Igbaria f64092997f net/mlx5: DR, Change matcher priority parameter type
Change matcher priority parameter type from u16 to u32,
this change is needed since sometimes upper levels
create a matcher with priority bigger than 2^16.

Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:23 -08:00
Jianbo Liu 87dac697a0 net/mlx5e: Add devlink fdb_large_groups parameter
Add a devlink parameter to control the number of large groups in a
autogrouped flow table. The default value is 15, and the range is between 1
and 1024.

The size of each large group can be calculated according to the following
formula: size = 4M / (fdb_large_groups + 1).

Examples:
- Set the number of large groups to 20.
    $ devlink dev param set pci/0000:82:00.0 name fdb_large_groups \
      cmode driverinit value 20

  Then run devlink reload command to apply the new value.
    $ devlink dev reload pci/0000:82:00.0

- Read the number of large groups in flow table.
    $ devlink dev param show pci/0000:82:00.0 name fdb_large_groups
    pci/0000:82:00.0:
      name fdb_large_groups type driver-specific
        values:
          cmode driverinit value 20

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:19 -08:00
Jianbo Liu 8aa9f3be73 net/mlx5: Change the name of steering mode param id
The prefix should be "MLX5_DEVLINK_PARAM_ID_" for all in
mlx5_devlink_param_id enum.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:17 -08:00
Vladyslav Tarasiuk c6acd629ee net/mlx5e: Add support for devlink-port in non-representors mode
Added devlink_port field to mlx5e_priv structure and a callback to
netdev ops to enable devlink to get info about the port. The port
registration happens at driver initialization.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:14 -08:00
Vladyslav Tarasiuk ab8f963a11 net/mlx5e: Rename representor get devlink port function
Rename representor's mlx5e_get_devlink_port() to
mlx5e_rep_get_devlink_port().
The downstream patch will add a non-representor mlx5e function called
mlx5e_get_devlink_phy_port().

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:11 -08:00
Roi Dayan 297eaf5b95 net/mlx5: E-Switch, Allow goto earlier chain if FW supports it
Mellanox FW can support this if ignore_flow_level capability exists.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:08 -08:00
Eli Cohen 96e326878f net/mlx5e: Eswitch, Use per vport tables for mirroring
When using port mirroring, we forward the traffic to another table and
use that table to forward to the mirrored vport. Since the hardware
loses the values of reg c, and in particular reg c0, we fail the match
on the input vport which previously existed in reg c0. To overcome this
situation, we use a set of per vport tables, positioned at the lowest
priority, and forward traffic to those tables. Since these tables are
per vport, we can avoid matching on reg c0.

Fixes: c01cfd0f11 ("net/mlx5: E-Switch, Add match on vport metadata for rule in fast path")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:05 -08:00
Eli Cohen 1708dd5468 net/mlx5: Eswitch, avoid redundant mask
misc_params.source_port is a 16 bit field already so no need for
redundant masking against 0xffff. Also change local variables type to
u16.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:40:02 -08:00
Tonghao Zhang ffec97020f net/mlx5e: Don't allow forwarding between uplink
We can install forwarding packets rule between uplink
in switchdev mode, as show below. But the hardware does
not do that as expected (mlnx_perf -i $PF1, we can't get
the counter of the PF1). By the way, if we add the uplink
PF0, PF1 to Open vSwitch and enable hw-offload, the rules
can be offloaded but not work fine too. This patch add a
check and if so return -EOPNOTSUPP.

$ tc filter add dev $PF0 protocol all parent ffff: prio 1 handle 1 \
    flower skip_sw action mirred egress redirect dev $PF1

$ tc -d -s filter show dev $PF0 ingress
    skip_sw
    in_hw in_hw_count 1
    action order 1: mirred (Egress Redirect to device enp130s0f1) stolen
    ...
    Sent hardware 408954 bytes 4173 pkt
    ...

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27 16:39:59 -08:00
Linus Torvalds 7058b83789 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix leak in nl80211 AP start where we leak the ACL memory, from
    Johannes Berg.

 2) Fix double mutex unlock in mac80211, from Andrei Otcheretianski.

 3) Fix RCU stall in ipset, from Jozsef Kadlecsik.

 4) Fix devlink locking in devlink_dpipe_table_register, from Madhuparna
    Bhowmik.

 5) Fix race causing TX hang in ll_temac, from Esben Haabendal.

 6) Stale eth hdr pointer in br_dev_xmit(), from Nikolay Aleksandrov.

 7) Fix TX hash calculation bounds checking wrt. tc rules, from Amritha
    Nambiar.

 8) Size netlink responses properly in schedule action code to take into
    consideration TCA_ACT_FLAGS. From Jiri Pirko.

 9) Fix firmware paths for mscc PHY driver, from Antoine Tenart.

10) Don't register stmmac notifier multiple times, from Aaro Koskinen.

11) Various rmnet bug fixes, from Taehee Yoo.

12) Fix vsock deadlock in vsock transport release, from Stefano
    Garzarella.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
  net: dsa: mv88e6xxx: Fix masking of egress port
  mlxsw: pci: Wait longer before accessing the device after reset
  sfc: fix timestamp reconstruction at 16-bit rollover points
  vsock: fix potential deadlock in transport->release()
  unix: It's CONFIG_PROC_FS not CONFIG_PROCFS
  net: rmnet: fix packet forwarding in rmnet bridge mode
  net: rmnet: fix bridge mode bugs
  net: rmnet: use upper/lower device infrastructure
  net: rmnet: do not allow to change mux id if mux id is duplicated
  net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
  net: rmnet: fix suspicious RCU usage
  net: rmnet: fix NULL pointer dereference in rmnet_changelink()
  net: rmnet: fix NULL pointer dereference in rmnet_newlink()
  net: phy: marvell: don't interpret PHY status unless resolved
  mlx5: register lag notifier for init network namespace only
  unix: define and set show_fdinfo only if procfs is enabled
  hinic: fix a bug of rss configuration
  hinic: fix a bug of setting hw_ioctxt
  hinic: fix a irq affinity bug
  net/smc: check for valid ib_client_data
  ...
2020-02-27 16:34:41 -08:00
Gustavo A. R. Silva d7f10df862 bpf: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200227001744.GA3317@embeddedor
2020-02-28 01:21:02 +01:00
Andrew Lunn 3ee339eb28 net: dsa: mv88e6xxx: Fix masking of egress port
Add missing ~ to the usage of the mask.

Reported-by: Kevin Benson <Kevin.Benson@zii.aero>
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Fixes: 5c74c54ce6 ("net: dsa: mv88e6xxx: Split monitor port configuration")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:29:09 -08:00
Amit Cohen ac004e8416 mlxsw: pci: Wait longer before accessing the device after reset
During initialization the driver issues a reset to the device and waits
for 100ms before checking if the firmware is ready. The waiting is
necessary because before that the device is irresponsive and the first
read can result in a completion timeout.

While 100ms is sufficient for Spectrum-1 and Spectrum-2, it is
insufficient for Spectrum-3.

Fix this by increasing the timeout to 200ms.

Fixes: da382875c6 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:09:22 -08:00
Jiri Pirko ec4a514a68 mlxsw: reg: Update module_type values in PMTM register and map them to width
There are couple new values that PMTM register can return
in module_type field. Add them and map them to module width in
mlxsw_core_module_max_width(). Fix the existing names on the way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:08:46 -08:00
Gustavo A. R. Silva 37e1244a79 WAN: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:06:55 -08:00
Gustavo A. R. Silva da60fbe7ef NFC: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:06:20 -08:00
Alex Maftei (amaftei) 23797b9890 sfc: fix timestamp reconstruction at 16-bit rollover points
We can't just use the top bits of the last sync event as they could be
off-by-one every 65,536 seconds, giving an error in reconstruction of
65,536 seconds.

This patch uses the difference in the bottom 16 bits (mod 2^16) to
calculate an offset that needs to be applied to the last sync event to
get to the current time.

Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:04:49 -08:00
Stefano Garzarella 3f74957fcb vsock: fix potential deadlock in transport->release()
Some transports (hyperv, virtio) acquire the sock lock during the
.release() callback.

In the vsock_stream_connect() we call vsock_assign_transport(); if
the socket was previously assigned to another transport, the
vsk->transport->release() is called, but the sock lock is already
held in the vsock_stream_connect(), causing a deadlock reported by
syzbot:

    INFO: task syz-executor280:9768 blocked for more than 143 seconds.
      Not tainted 5.6.0-rc1-syzkaller #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor280 D27912  9768   9766 0x00000000
    Call Trace:
     context_switch kernel/sched/core.c:3386 [inline]
     __schedule+0x934/0x1f90 kernel/sched/core.c:4082
     schedule+0xdc/0x2b0 kernel/sched/core.c:4156
     __lock_sock+0x165/0x290 net/core/sock.c:2413
     lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
     virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
     vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
     vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
     __sys_connect_file+0x161/0x1c0 net/socket.c:1857
     __sys_connect+0x174/0x1b0 net/socket.c:1874
     __do_sys_connect net/socket.c:1885 [inline]
     __se_sys_connect net/socket.c:1882 [inline]
     __x64_sys_connect+0x73/0xb0 net/socket.c:1882
     do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

To avoid this issue, this patch remove the lock acquiring in the
.release() callback of hyperv and virtio transports, and it holds
the lock when we call vsk->transport->release() in the vsock core.

Reported-by: syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com
Fixes: 408624af4c ("vsock: use local transport when it is loaded")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:03:56 -08:00
David S. Miller 6dd7f1a1fe Merge branch 'rework-phylink-interface-for-split-MAC-PCS-support'
Russell King says:

====================
rework phylink interface for split MAC/PCS support

The following series changes the phylink interface to allow us to
better support split MAC / MAC PCS setups.  The fundamental change
required for this turns out to be quite simple.

Today, mac_config() is used for everything to do with setting the
parameters for the MAC, and mac_link_up() is used to inform the
MAC driver that the link is now up (and so to allow packet flow.)
mac_config() also has had a few implementation issues, with folk
who believe that members such as "speed" and "duplex" are always
valid, where "link" gets used inappropriately, etc.

With the proposed patches, all this changes subtly - but in a
backwards compatible way at this stage.

We pass the the full resolved link state (speed, duplex, pause) to
mac_link_up(), and it is now guaranteed that these parameters to
this function will always be valid (no more SPEED_UNKNOWN or
DUPLEX_UNKNOWN here - unless phylink is fed with such things.)

Drivers should convert over to using the state in mac_link_up()
rather than configuring the speed, duplex and pause in the
mac_config() method. The patch series includes a number of MAC
drivers which I've thought have been easy targets - I've left the
remainder as I think they need maintainer input. However, *all*
drivers will need conversion for future phylink development.

v2: add ocelot/felix and qca/ar9331 DSA drivers to patch 2, add
  received tested-by so far.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Russell King 24cb72df1a net: mvpp2: use resolved link config in mac_link_up()
Convert the Marvell mvpp2 ethernet driver to use the finalised link
parameters in mac_link_up() rather than the parameters in mac_config().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Russell King ff03f0b152 net: mvneta: use resolved link config in mac_link_up()
Convert the Marvell mvneta ethernet driver to use the finalised link
parameters in mac_link_up() rather than the parameters in mac_config().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Russell King 633e98a711 net: macb: use resolved link config in mac_link_up()
Convert the macb ethernet driver to use the finalised link
parameters in mac_link_up() rather than the parameters in mac_config().

Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Russell King 37556a4ac4 net: dpaa2-mac: use resolved link config in mac_link_up()
Convert the DPAA2 ethernet driver to use the finalised link parameters
in mac_link_up() rather than the parameters in mac_config(), which are
more suited to the needs of the DPAA2 MC firmware than those available
via mac_config().

Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Russell King 9534784209 net: axienet: use resolved link config in mac_link_up()
Convert the Xilinx AXI ethernet driver to use the finalised link
parameters in mac_link_up() rather than the parameters in mac_config().

Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Russell King 30c4a5b0aa net: mv88e6xxx: use resolved link config in mac_link_up()
Use the resolved link configuration to set the MAC configuration when
mac_link_up() for non-internal-PHY ports.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Russell King 5b502a7b29 net: dsa: propagate resolved link config via mac_link_up()
Propagate the resolved link configuration down via DSA's
phylink_mac_link_up() operation to allow split PCS/MAC to work.

Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Russell King 91a208f218 net: phylink: propagate resolved link config via mac_link_up()
Propagate the resolved link parameters via the mac_link_up() call for
MACs that do not automatically track their PCS state. We propagate the
link parameters via function arguments so that inappropriate members
of struct phylink_link_state can't be accessed, and creating a new
structure just for this adds needless complexity to the API.

Tested-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
David S. Miller 5c05a164d4 unix: It's CONFIG_PROC_FS not CONFIG_PROCFS
Fixes: 3a12500ed5 ("unix: define and set show_fdinfo only if procfs is enabled")
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 11:52:35 -08:00
David S. Miller 795c03a5d0 Merge branch 'net-rmnet-fix-several-bugs'
Taehee Yoo says:

====================
net: rmnet: fix several bugs

This patchset is to fix several bugs in RMNET module.

1. The first patch fixes NULL-ptr-deref in rmnet_newlink().
When rmnet interface is being created, it uses IFLA_LINK
without checking NULL.
So, if userspace doesn't set IFLA_LINK, panic will occur.
In this patch, checking NULL pointer code is added.

2. The second patch fixes NULL-ptr-deref in rmnet_changelink().
To get real device in rmnet_changelink(), it uses IFLA_LINK.
But, IFLA_LINK should not be used in rmnet_changelink().

3. The third patch fixes suspicious RCU usage in rmnet_get_port().
rmnet_get_port() uses rcu_dereference_rtnl().
But, rmnet_get_port() is used by datapath.
So, rcu_dereference_bh() should be used instead of rcu_dereference_rtnl().

4. The fourth patch fixes suspicious RCU usage in
rmnet_force_unassociate_device().
RCU critical section should not be scheduled.
But, unregister_netdevice_queue() in the rmnet_force_unassociate_device()
would be scheduled.
So, the RCU warning occurs.
In this patch, the rcu_read_lock() in the rmnet_force_unassociate_device()
is removed because it's unnecessary.

5. The fifth patch fixes duplicate MUX ID case.
RMNET MUX ID is unique.
So, rmnet interface isn't allowed to be created, which have
a duplicate MUX ID.
But, only rmnet_newlink() checks this condition, rmnet_changelink()
doesn't check this.
So, duplicate MUX ID case would happen.

6. The sixth patch fixes upper/lower interface relationship problems.
When IFLA_LINK is used, the upper/lower infrastructure should be used.
Because it checks the maximum depth of upper/lower interfaces and it also
checks circular interface relationship, etc.
In this patch, netdev_upper_dev_link() is used.

7. The seventh patch fixes bridge related problems.
a) ->ndo_del_slave() doesn't work.
b) It couldn't detect circular upper/lower interface relationship.
c) It couldn't prevent stack overflow because of too deep depth
of upper/lower interface
d) It doesn't check the number of lower interfaces.
e) Panics because of several reasons.
These problems are actually the same problem.
So, this patch fixes these problems.

8. The eighth patch fixes packet forwarding issue in bridge mode
Packet forwarding is not working in rmnet bridge mode.
Because when a packet is forwarded, skb_push() for an ethernet header
is needed. But it doesn't call skb_push().
So, the ethernet header will be lost.

Change log:
 - update commit logs.
 - drop two patches in this patchset because of wrong target branch.
   - ("net: rmnet: add missing module alias")
   - ("net: rmnet: print error message when command fails")
 - remove unneessary rcu_read_lock() in the third patch.
 - use rcu_dereference_bh() instead of rcu_dereference in third patch.
 - do not allow to add a bridge device if rmnet interface is already
   bridge mode in the seventh patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 11:45:07 -08:00
Taehee Yoo ad3cc31b59 net: rmnet: fix packet forwarding in rmnet bridge mode
Packet forwarding is not working in rmnet bridge mode.
Because when a packet is forwarded, skb_push() for an ethernet header
is needed. But it doesn't call skb_push().
So, the ethernet header will be lost.

Test commands:
    modprobe rmnet
    ip netns add nst
    ip netns add nst2
    ip link add veth0 type veth peer name veth1
    ip link add veth2 type veth peer name veth3
    ip link set veth1 netns nst
    ip link set veth3 netns nst2

    ip link add rmnet0 link veth0 type rmnet mux_id 1
    ip link set veth2 master rmnet0
    ip link set veth0 up
    ip link set veth2 up
    ip link set rmnet0 up
    ip a a 192.168.100.1/24 dev rmnet0

    ip netns exec nst ip link set veth1 up
    ip netns exec nst ip a a 192.168.100.2/24 dev veth1
    ip netns exec nst2 ip link set veth3 up
    ip netns exec nst2 ip a a 192.168.100.3/24 dev veth3
    ip netns exec nst2 ping 192.168.100.2

Fixes: 60d58f971c ("net: qualcomm: rmnet: Implement bridge mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 11:45:07 -08:00
Taehee Yoo d939b6d30b net: rmnet: fix bridge mode bugs
In order to attach a bridge interface to the rmnet interface,
"master" operation is used.
(e.g. ip link set dummy1 master rmnet0)
But, in the rmnet_add_bridge(), which is a callback of ->ndo_add_slave()
doesn't register lower interface.
So, ->ndo_del_slave() doesn't work.
There are other problems too.
1. It couldn't detect circular upper/lower interface relationship.
2. It couldn't prevent stack overflow because of too deep depth
of upper/lower interface
3. It doesn't check the number of lower interfaces.
4. Panics because of several reasons.

The root problem of these issues is actually the same.
So, in this patch, these all problems will be fixed.

Test commands:
    modprobe rmnet
    ip link add dummy0 type dummy
    ip link add rmnet0 link dummy0 type rmnet mux_id 1
    ip link add dummy1 master rmnet0 type dummy
    ip link add dummy2 master rmnet0 type dummy
    ip link del rmnet0
    ip link del dummy2
    ip link del dummy1

Splat looks like:
[   41.867595][ T1164] general protection fault, probably for non-canonical address 0xdffffc0000000101I
[   41.869993][ T1164] KASAN: null-ptr-deref in range [0x0000000000000808-0x000000000000080f]
[   41.872950][ T1164] CPU: 0 PID: 1164 Comm: ip Not tainted 5.6.0-rc1+ #447
[   41.873915][ T1164] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   41.875161][ T1164] RIP: 0010:rmnet_unregister_bridge.isra.6+0x71/0xf0 [rmnet]
[   41.876178][ T1164] Code: 48 89 ef 48 89 c6 5b 5d e9 fc fe ff ff e8 f7 f3 ff ff 48 8d b8 08 08 00 00 48 ba 00 7
[   41.878925][ T1164] RSP: 0018:ffff8880c4d0f188 EFLAGS: 00010202
[   41.879774][ T1164] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000101
[   41.887689][ T1164] RDX: dffffc0000000000 RSI: ffffffffb8cf64f0 RDI: 0000000000000808
[   41.888727][ T1164] RBP: ffff8880c40e4000 R08: ffffed101b3c0e3c R09: 0000000000000001
[   41.889749][ T1164] R10: 0000000000000001 R11: ffffed101b3c0e3b R12: 1ffff110189a1e3c
[   41.890783][ T1164] R13: ffff8880c4d0f200 R14: ffffffffb8d56160 R15: ffff8880ccc2c000
[   41.891794][ T1164] FS:  00007f4300edc0c0(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
[   41.892953][ T1164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.893800][ T1164] CR2: 00007f43003bc8c0 CR3: 00000000ca53e001 CR4: 00000000000606f0
[   41.894824][ T1164] Call Trace:
[   41.895274][ T1164]  ? rcu_is_watching+0x2c/0x80
[   41.895895][ T1164]  rmnet_config_notify_cb+0x1f7/0x590 [rmnet]
[   41.896687][ T1164]  ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[   41.897611][ T1164]  ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[   41.898508][ T1164]  ? __module_text_address+0x13/0x140
[   41.899162][ T1164]  notifier_call_chain+0x90/0x160
[   41.899814][ T1164]  rollback_registered_many+0x660/0xcf0
[   41.900544][ T1164]  ? netif_set_real_num_tx_queues+0x780/0x780
[   41.901316][ T1164]  ? __lock_acquire+0xdfe/0x3de0
[   41.901958][ T1164]  ? memset+0x1f/0x40
[   41.902468][ T1164]  ? __nla_validate_parse+0x98/0x1ab0
[   41.903166][ T1164]  unregister_netdevice_many.part.133+0x13/0x1b0
[   41.903988][ T1164]  rtnl_delete_link+0xbc/0x100
[ ... ]

Fixes: 60d58f971c ("net: qualcomm: rmnet: Implement bridge mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 11:45:07 -08:00
Taehee Yoo 037f9cdf72 net: rmnet: use upper/lower device infrastructure
netdev_upper_dev_link() is useful to manage lower/upper interfaces.
And this function internally validates looping, maximum depth.
All or most virtual interfaces that could have a real interface
(e.g. macsec, macvlan, ipvlan etc.) use lower/upper infrastructure.

Test commands:
    modprobe rmnet
    ip link add dummy0 type dummy
    ip link add rmnet1 link dummy0 type rmnet mux_id 1
    for i in {2..100}
    do
        let A=$i-1
        ip link add rmnet$i link rmnet$A type rmnet mux_id $i
    done
    ip link del dummy0

The purpose of the test commands is to make stack overflow.

Splat looks like:
[   52.411438][ T1395] BUG: KASAN: slab-out-of-bounds in find_busiest_group+0x27e/0x2c00
[   52.413218][ T1395] Write of size 64 at addr ffff8880c774bde0 by task ip/1395
[   52.414841][ T1395]
[   52.430720][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447
[   52.496511][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   52.513597][ T1395] Call Trace:
[   52.546516][ T1395]
[   52.558773][ T1395] Allocated by task 3171537984:
[   52.588290][ T1395] BUG: unable to handle page fault for address: ffffffffb999e260
[   52.589311][ T1395] #PF: supervisor read access in kernel mode
[   52.590529][ T1395] #PF: error_code(0x0000) - not-present page
[   52.591374][ T1395] PGD d6818067 P4D d6818067 PUD d6819063 PMD 0
[   52.592288][ T1395] Thread overran stack, or stack corrupted
[   52.604980][ T1395] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[   52.605856][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447
[   52.611764][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   52.621520][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30
[   52.622296][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0
[   52.627887][ T1395] RSP: 0018:ffff8880c774bb60 EFLAGS: 00010006
[   52.628735][ T1395] RAX: 00000000001f8880 RBX: ffff8880c774d140 RCX: 0000000000000000
[   52.631773][ T1395] RDX: 000000000000001d RSI: ffff8880c774bb68 RDI: 0000000000003ff0
[   52.649584][ T1395] RBP: ffffea00031dd200 R08: ffffed101b43e403 R09: ffffed101b43e403
[   52.674857][ T1395] R10: 0000000000000001 R11: ffffed101b43e402 R12: ffff8880d900e5c0
[   52.678257][ T1395] R13: ffff8880c774c000 R14: 0000000000000000 R15: dffffc0000000000
[   52.694541][ T1395] FS:  00007fe867f6e0c0(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
[   52.764039][ T1395] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   52.815008][ T1395] CR2: ffffffffb999e260 CR3: 00000000c26aa005 CR4: 00000000000606e0
[   52.862312][ T1395] Call Trace:
[   52.887133][ T1395] Modules linked in: dummy rmnet veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_dex
[   52.936749][ T1395] CR2: ffffffffb999e260
[   52.965695][ T1395] ---[ end trace 7e32ca99482dbb31 ]---
[   52.966556][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30
[   52.971083][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0
[   53.003650][ T1395] RSP: 0018:ffff8880c774bb60 EFLAGS: 00010006
[   53.043183][ T1395] RAX: 00000000001f8880 RBX: ffff8880c774d140 RCX: 0000000000000000
[   53.076480][ T1395] RDX: 000000000000001d RSI: ffff8880c774bb68 RDI: 0000000000003ff0
[   53.093858][ T1395] RBP: ffffea00031dd200 R08: ffffed101b43e403 R09: ffffed101b43e403
[   53.112795][ T1395] R10: 0000000000000001 R11: ffffed101b43e402 R12: ffff8880d900e5c0
[   53.139837][ T1395] R13: ffff8880c774c000 R14: 0000000000000000 R15: dffffc0000000000
[   53.141500][ T1395] FS:  00007fe867f6e0c0(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
[   53.143343][ T1395] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.152007][ T1395] CR2: ffffffffb999e260 CR3: 00000000c26aa005 CR4: 00000000000606e0
[   53.156459][ T1395] Kernel panic - not syncing: Fatal exception
[   54.213570][ T1395] Shutting down cpus with NMI
[   54.354112][ T1395] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0x)
[   54.355687][ T1395] Rebooting in 5 seconds..

Fixes: b37f78f234 ("net: qualcomm: rmnet: Fix crash on real dev unregistration")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 11:45:07 -08:00
Taehee Yoo 1dc49e9d16 net: rmnet: do not allow to change mux id if mux id is duplicated
Basically, duplicate mux id isn't be allowed.
So, the creation of rmnet will be failed if there is duplicate mux id
is existing.
But, changelink routine doesn't check duplicate mux id.

Test commands:
    modprobe rmnet
    ip link add dummy0 type dummy
    ip link add rmnet0 link dummy0 type rmnet mux_id 1
    ip link add rmnet1 link dummy0 type rmnet mux_id 2
    ip link set rmnet1 type rmnet mux_id 1

Fixes: 23790ef120 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 11:45:07 -08:00
Taehee Yoo c026d97010 net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
The notifier_call() of the slave interface removes rmnet interface with
unregister_netdevice_queue().
But, before calling unregister_netdevice_queue(), it acquires
rcu readlock.
In the RCU critical section, sleeping isn't be allowed.
But, unregister_netdevice_queue() internally calls synchronize_net(),
which would sleep.
So, suspicious RCU usage warning occurs.

Test commands:
    modprobe rmnet
    ip link add dummy0 type dummy
    ip link add dummy1 type dummy
    ip link add rmnet0 link dummy0 type rmnet mux_id 1
    ip link set dummy1 master rmnet0
    ip link del dummy0

Splat looks like:
[   79.639245][ T1195] =============================
[   79.640134][ T1195] WARNING: suspicious RCU usage
[   79.640852][ T1195] 5.6.0-rc1+ #447 Not tainted
[   79.641657][ T1195] -----------------------------
[   79.642472][ T1195] ./include/linux/rcupdate.h:273 Illegal context switch in RCU read-side critical section!
[   79.644043][ T1195]
[   79.644043][ T1195] other info that might help us debug this:
[   79.644043][ T1195]
[   79.645682][ T1195]
[   79.645682][ T1195] rcu_scheduler_active = 2, debug_locks = 1
[   79.646980][ T1195] 2 locks held by ip/1195:
[   79.647629][ T1195]  #0: ffffffffa3cf64f0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890
[   79.649312][ T1195]  #1: ffffffffa39256c0 (rcu_read_lock){....}, at: rmnet_config_notify_cb+0xf0/0x590 [rmnet]
[   79.651717][ T1195]
[   79.651717][ T1195] stack backtrace:
[   79.652650][ T1195] CPU: 3 PID: 1195 Comm: ip Not tainted 5.6.0-rc1+ #447
[   79.653702][ T1195] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   79.655037][ T1195] Call Trace:
[   79.655560][ T1195]  dump_stack+0x96/0xdb
[   79.656252][ T1195]  ___might_sleep+0x345/0x440
[   79.656994][ T1195]  synchronize_net+0x18/0x30
[   79.661132][ T1195]  netdev_rx_handler_unregister+0x40/0xb0
[   79.666266][ T1195]  rmnet_unregister_real_device+0x42/0xb0 [rmnet]
[   79.667211][ T1195]  rmnet_config_notify_cb+0x1f7/0x590 [rmnet]
[   79.668121][ T1195]  ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[   79.669166][ T1195]  ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[   79.670286][ T1195]  ? __module_text_address+0x13/0x140
[   79.671139][ T1195]  notifier_call_chain+0x90/0x160
[   79.671973][ T1195]  rollback_registered_many+0x660/0xcf0
[   79.672893][ T1195]  ? netif_set_real_num_tx_queues+0x780/0x780
[   79.675091][ T1195]  ? __lock_acquire+0xdfe/0x3de0
[   79.675825][ T1195]  ? memset+0x1f/0x40
[   79.676367][ T1195]  ? __nla_validate_parse+0x98/0x1ab0
[   79.677290][ T1195]  unregister_netdevice_many.part.133+0x13/0x1b0
[   79.678163][ T1195]  rtnl_delete_link+0xbc/0x100
[ ... ]

Fixes: ceed73a2cf ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 11:45:07 -08:00
Taehee Yoo 102210f766 net: rmnet: fix suspicious RCU usage
rmnet_get_port() internally calls rcu_dereference_rtnl(),
which checks RTNL.
But rmnet_get_port() could be called by packet path.
The packet path is not protected by RTNL.
So, the suspicious RCU usage problem occurs.

Test commands:
    modprobe rmnet
    ip netns add nst
    ip link add veth0 type veth peer name veth1
    ip link set veth1 netns nst
    ip link add rmnet0 link veth0 type rmnet mux_id 1
    ip netns exec nst ip link add rmnet1 link veth1 type rmnet mux_id 1
    ip netns exec nst ip link set veth1 up
    ip netns exec nst ip link set rmnet1 up
    ip netns exec nst ip a a 192.168.100.2/24 dev rmnet1
    ip link set veth0 up
    ip link set rmnet0 up
    ip a a 192.168.100.1/24 dev rmnet0
    ping 192.168.100.2

Splat looks like:
[  146.630958][ T1174] WARNING: suspicious RCU usage
[  146.631735][ T1174] 5.6.0-rc1+ #447 Not tainted
[  146.632387][ T1174] -----------------------------
[  146.633151][ T1174] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:386 suspicious rcu_dereference_check() !
[  146.634742][ T1174]
[  146.634742][ T1174] other info that might help us debug this:
[  146.634742][ T1174]
[  146.645992][ T1174]
[  146.645992][ T1174] rcu_scheduler_active = 2, debug_locks = 1
[  146.646937][ T1174] 5 locks held by ping/1174:
[  146.647609][ T1174]  #0: ffff8880c31dea70 (sk_lock-AF_INET){+.+.}, at: raw_sendmsg+0xab8/0x2980
[  146.662463][ T1174]  #1: ffffffff93925660 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x243/0x2150
[  146.671696][ T1174]  #2: ffffffff93925660 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x213/0x2940
[  146.673064][ T1174]  #3: ffff8880c19ecd58 (&dev->qdisc_running_key#7){+...}, at: ip_finish_output2+0x714/0x2150
[  146.690358][ T1174]  #4: ffff8880c5796898 (&dev->qdisc_xmit_lock_key#3){+.-.}, at: sch_direct_xmit+0x1e2/0x1020
[  146.699875][ T1174]
[  146.699875][ T1174] stack backtrace:
[  146.701091][ T1174] CPU: 0 PID: 1174 Comm: ping Not tainted 5.6.0-rc1+ #447
[  146.705215][ T1174] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  146.706565][ T1174] Call Trace:
[  146.707102][ T1174]  dump_stack+0x96/0xdb
[  146.708007][ T1174]  rmnet_get_port.part.9+0x76/0x80 [rmnet]
[  146.709233][ T1174]  rmnet_egress_handler+0x107/0x420 [rmnet]
[  146.710492][ T1174]  ? sch_direct_xmit+0x1e2/0x1020
[  146.716193][ T1174]  rmnet_vnd_start_xmit+0x3d/0xa0 [rmnet]
[  146.717012][ T1174]  dev_hard_start_xmit+0x160/0x740
[  146.717854][ T1174]  sch_direct_xmit+0x265/0x1020
[  146.718577][ T1174]  ? register_lock_class+0x14d0/0x14d0
[  146.719429][ T1174]  ? dev_watchdog+0xac0/0xac0
[  146.723738][ T1174]  ? __dev_queue_xmit+0x15fd/0x2940
[  146.724469][ T1174]  ? lock_acquire+0x164/0x3b0
[  146.725172][ T1174]  __dev_queue_xmit+0x20c7/0x2940
[ ... ]

Fixes: ceed73a2cf ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 11:45:07 -08:00
Taehee Yoo 1eb1f43a6e net: rmnet: fix NULL pointer dereference in rmnet_changelink()
In the rmnet_changelink(), it uses IFLA_LINK without checking
NULL pointer.
tb[IFLA_LINK] could be NULL pointer.
So, NULL-ptr-deref could occur.

rmnet already has a lower interface (real_dev).
So, after this patch, rmnet_changelink() does not use IFLA_LINK anymore.

Test commands:
    modprobe rmnet
    ip link add dummy0 type dummy
    ip link add rmnet0 link dummy0 type rmnet mux_id 1
    ip link set rmnet0 type rmnet mux_id 2

Splat looks like:
[   90.578726][ T1131] general protection fault, probably for non-canonical address 0xdffffc0000000000I
[   90.581121][ T1131] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[   90.582380][ T1131] CPU: 2 PID: 1131 Comm: ip Not tainted 5.6.0-rc1+ #447
[   90.584285][ T1131] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   90.587506][ T1131] RIP: 0010:rmnet_changelink+0x5a/0x8a0 [rmnet]
[   90.588546][ T1131] Code: 83 ec 20 48 c1 ea 03 80 3c 02 00 0f 85 6f 07 00 00 48 8b 5e 28 48 b8 00 00 00 00 00 0
[   90.591447][ T1131] RSP: 0018:ffff8880ce78f1b8 EFLAGS: 00010247
[   90.592329][ T1131] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff8880ce78f8b0
[   90.593253][ T1131] RDX: 0000000000000000 RSI: ffff8880ce78f4a0 RDI: 0000000000000004
[   90.594058][ T1131] RBP: ffff8880cf543e00 R08: 0000000000000002 R09: 0000000000000002
[   90.594859][ T1131] R10: ffffffffc0586a40 R11: 0000000000000000 R12: ffff8880ca47c000
[   90.595690][ T1131] R13: ffff8880ca47c000 R14: ffff8880cf545000 R15: 0000000000000000
[   90.596553][ T1131] FS:  00007f21f6c7e0c0(0000) GS:ffff8880da400000(0000) knlGS:0000000000000000
[   90.597504][ T1131] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   90.599418][ T1131] CR2: 0000556e413db458 CR3: 00000000c917a002 CR4: 00000000000606e0
[   90.600289][ T1131] Call Trace:
[   90.600631][ T1131]  __rtnl_newlink+0x922/0x1270
[   90.601194][ T1131]  ? lock_downgrade+0x6e0/0x6e0
[   90.601724][ T1131]  ? rtnl_link_unregister+0x220/0x220
[   90.602309][ T1131]  ? lock_acquire+0x164/0x3b0
[   90.602784][ T1131]  ? is_bpf_image_address+0xff/0x1d0
[   90.603331][ T1131]  ? rtnl_newlink+0x4c/0x90
[   90.603810][ T1131]  ? kernel_text_address+0x111/0x140
[   90.604419][ T1131]  ? __kernel_text_address+0xe/0x30
[   90.604981][ T1131]  ? unwind_get_return_address+0x5f/0xa0
[   90.605616][ T1131]  ? create_prof_cpu_mask+0x20/0x20
[   90.606304][ T1131]  ? arch_stack_walk+0x83/0xb0
[   90.606985][ T1131]  ? stack_trace_save+0x82/0xb0
[   90.607656][ T1131]  ? stack_trace_consume_entry+0x160/0x160
[   90.608503][ T1131]  ? deactivate_slab.isra.78+0x2c5/0x800
[   90.609336][ T1131]  ? kasan_unpoison_shadow+0x30/0x40
[   90.610096][ T1131]  ? kmem_cache_alloc_trace+0x135/0x350
[   90.610889][ T1131]  ? rtnl_newlink+0x4c/0x90
[   90.611512][ T1131]  rtnl_newlink+0x65/0x90
[ ... ]

Fixes: 23790ef120 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 11:45:07 -08:00