1
0
Fork 0
Commit Graph

513 Commits (redonkable)

Author SHA1 Message Date
Dmitry Safonov 09901e570c netlink: Don't shift on 64 for ngroups
commit 91874ecf32 upstream.

It's legal to have 64 groups for netlink_sock.

As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
only to first 32 groups.

The check for correctness of .bind() userspace supplied parameter
is done by applying mask made from ngroups shift. Which broke Android
as they have 64 groups and the shift for mask resulted in an overflow.

Fixes: 61f4b23769 ("netlink: Don't shift with UB on nlk->ngroups")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-09 12:16:38 +02:00
Jeremy Cline 1828cb3d10 netlink: Fix spectre v1 gadget in netlink_create()
[ Upstream commit bc5b6c0b62 ]

'protocol' is a user-controlled value, so sanitize it after the bounds
check to avoid using it for speculative out-of-bounds access to arrays
indexed by it.

This addresses the following accesses detected with the help of smatch:

* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
  spectre issue 'nlk_cb_mutex_keys' [w]

* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
  spectre issue 'nlk_cb_mutex_key_strings' [w]

* net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre
  issue 'nl_table' [w] (local cap)

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-06 16:20:48 +02:00
Dmitry Safonov fffd3058ea netlink: Don't shift with UB on nlk->ngroups
[ Upstream commit 61f4b23769 ]

On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in
hang during boot.
Check for 0 ngroups and use (unsigned long long) as a type to shift.

Fixes: 7acf9d4237 ("netlink: Do not subscribe to non-existent groups").
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03 07:50:45 +02:00
Dmitry Safonov e208cda5f1 netlink: Do not subscribe to non-existent groups
[ Upstream commit 7acf9d4237 ]

Make ABI more strict about subscribing to group > ngroups.
Code doesn't check for that and it looks bogus.
(one can subscribe to non-existing group)
Still, it's possible to bind() to all possible groups with (-1)

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03 07:50:45 +02:00
Eric Dumazet bf2ecb9346 netlink: fix uninit-value in netlink_sendmsg
commit 6091f09c2f upstream.

syzbot reported :

BUG: KMSAN: uninit-value in ffs arch/x86/include/asm/bitops.h:432 [inline]
BUG: KMSAN: uninit-value in netlink_sendmsg+0xb26/0x1310 net/netlink/af_netlink.c:1851

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16 10:10:23 +02:00
Alexander Potapenko 787b940625 netlink: make sure nladdr has correct size in netlink_connect()
[ Upstream commit 7880287981 ]

KMSAN reports use of uninitialized memory in the case when |alen| is
smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't
fully copied from the userspace.

Signed-off-by: Alexander Potapenko <glider@google.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-12 12:32:23 +02:00
Nicolas Dichtel 06d3f43d52 netlink: avoid a double skb free in genlmsg_mcast()
[ Upstream commit 02a2385f37 ]

nlmsg_multicast() consumes always the skb, thus the original skb must be
freed only when this function is called with a clone.

Fixes: cb9f7a9a5c ("netlink: ensure to loop over all netns in genlmsg_multicast_allns()")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31 18:10:41 +02:00
Jason A. Donenfeld eec434c573 netlink: put module reference if dump start fails
[ Upstream commit b87b6194be ]

Before, if cb->start() failed, the module reference would never be put,
because cb->cb_running is intentionally false at this point. Users are
generally annoyed by this because they can no longer unload modules that
leak references. Also, it may be possible to tediously wrap a reference
counter back to zero, especially since module.c still uses atomic_inc
instead of refcount_inc.

This patch expands the error path to simply call module_put if
cb->start() fails.

Fixes: 41c87425a1 ("netlink: do not set cb_running if dump's start() errs")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:17 -08:00
Nicolas Dichtel 60b28d5ef3 netlink: ensure to loop over all netns in genlmsg_multicast_allns()
[ Upstream commit cb9f7a9a5c ]

Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the
case when commit 134e63756d was pushed.
However, there was no reason to stop the loop if a netns does not have
listeners.
Returns -ESRCH only if there was no listeners in all netns.

To avoid having the same problem in the future, I didn't take the
assumption that nlmsg_multicast() returns only 0 or -ESRCH.

Fixes: 134e63756d ("genetlink: make netns aware")
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:09 -08:00
Xin Long 48606bb1ee netlink: reset extack earlier in netlink_rcv_skb
[ Upstream commit cd443f1e91 ]

Move up the extack reset/initialization in netlink_rcv_skb, so that
those 'goto ack' will not skip it. Otherwise, later on netlink_ack
may use the uninitialized extack and cause kernel crash.

Fixes: cbbdf8433a ("netlink: extack needs to be reset each time through loop")
Reported-by: syzbot+03bee3680a37466775e7@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 14:03:48 +01:00
David Ahern 3eae0ba8c9 netlink: extack needs to be reset each time through loop
[ Upstream commit cbbdf8433a ]

syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
The problem is that netlink_rcv_skb loops over the skb repeatedly invoking
the callback and without resetting the extack leaving potentially stale
data. Initializing each time through avoids the WARN_ON.

Fixes: 2d4bc93368 ("netlink: extended ACK reporting")
Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 14:03:48 +01:00
Kevin Cernekee e3fb538e57 netlink: Add netns check on taps
[ Upstream commit 93c647643b ]

Currently, a nlmon link inside a child namespace can observe systemwide
netlink activity.  Filter the traffic so that nlmon can only sniff
netlink messages from its own netns.

Test case:

    vpnns -- bash -c "ip link add nlmon0 type nlmon; \
                      ip link set nlmon0 up; \
                      tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" &
    sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \
        spi 0x1 mode transport \
        auth sha1 0x6162633132330000000000000000000000000000 \
        enc aes 0x00000000000000000000000000000000
    grep --binary abc123 /tmp/nlmon.pcap

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:31:06 +01:00
Jason A. Donenfeld 5856c858c6 af_netlink: ensure that NLMSG_DONE never fails in dumps
[ Upstream commit 0642840b8b ]

The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:37:03 +01:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Johannes Berg 48044eb490 netlink: fix netlink_ack() extack race
It seems that it's possible to toggle NETLINK_F_EXT_ACK
through setsockopt() while another thread/CPU is building
a message inside netlink_ack(), which could then trigger
the WARN_ON()s I added since if it goes from being turned
off to being turned on between allocating and filling the
message, the skb could end up being too small.

Avoid this whole situation by storing the value of this
flag in a separate variable and using that throughout the
function instead.

Fixes: 2d4bc93368 ("netlink: extended ACK reporting")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18 12:22:28 +01:00
Jason A. Donenfeld 41c87425a1 netlink: do not set cb_running if dump's start() errs
It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a faulty returned start().

This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.

It also moves the call to start() to be when the mutex is held. This has
the nice side effect of serializing invocations to start(), which is
likely desirable anyway. It also prevents any possible other races that
might come out of this logic.

In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:27:49 -07:00
Jason A. Donenfeld fef0035c0f netlink: do not proceed if dump's start() errs
Drivers that use the start method for netlink dumping rely on dumpit not
being called if start fails. For example, ila_xlat.c allocates memory
and assigns it to cb->args[0] in its start() function. It might fail to
do that and return -ENOMEM instead. However, even when returning an
error, dumpit will be called, which, in the example above, quickly
dereferences the memory in cb->args[0], which will OOPS the kernel. This
is but one example of how this goes wrong.

Since start() has always been a function with an int return type, it
therefore makes sense to use it properly, rather than ignoring it. This
patch thus returns early and does not call dumpit() when start() fails.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30 16:13:31 +01:00
Xin Long f773608026 netlink: access nlk groups safely in netlink bind and getname
Now there is no lock protecting nlk ngroups/groups' accessing in
netlink bind and getname. It's safe from nlk groups' setting in
netlink_release, but not from netlink_realloc_groups called by
netlink_setsockopt.

netlink_lock_table is needed in both netlink bind and getname when
accessing nlk groups.

Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-06 21:22:54 -07:00
Xin Long be82485fbc netlink: fix an use-after-free issue for nlk groups
ChunYu found a netlink use-after-free issue by syzkaller:

[28448.842981] BUG: KASAN: use-after-free in __nla_put+0x37/0x40 at addr ffff8807185e2378
[28448.969918] Call Trace:
[...]
[28449.117207]  __nla_put+0x37/0x40
[28449.132027]  nla_put+0xf5/0x130
[28449.146261]  sk_diag_fill.isra.4.constprop.5+0x5a0/0x750 [netlink_diag]
[28449.176608]  __netlink_diag_dump+0x25a/0x700 [netlink_diag]
[28449.202215]  netlink_diag_dump+0x176/0x240 [netlink_diag]
[28449.226834]  netlink_dump+0x488/0xbb0
[28449.298014]  __netlink_dump_start+0x4e8/0x760
[28449.317924]  netlink_diag_handler_dump+0x261/0x340 [netlink_diag]
[28449.413414]  sock_diag_rcv_msg+0x207/0x390
[28449.432409]  netlink_rcv_skb+0x149/0x380
[28449.467647]  sock_diag_rcv+0x2d/0x40
[28449.484362]  netlink_unicast+0x562/0x7b0
[28449.564790]  netlink_sendmsg+0xaa8/0xe60
[28449.661510]  sock_sendmsg+0xcf/0x110
[28449.865631]  __sys_sendmsg+0xf3/0x240
[28450.000964]  SyS_sendmsg+0x32/0x50
[28450.016969]  do_syscall_64+0x25c/0x6c0
[28450.154439]  entry_SYSCALL64_slow_path+0x25/0x25

It was caused by no protection between nlk groups' free in netlink_release
and nlk groups' accessing in sk_diag_dump_groups. The similar issue also
exists in netlink_seq_show().

This patch is to defer nlk groups' free in deferred_put_nlk_sk.

Reported-by: ChunYu Wang <chunwang@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-06 21:22:53 -07:00
Reshetova, Elena 41c6d650f6 net: convert sock.sk_refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 07:39:08 -07:00
Reshetova, Elena 14afee4b60 net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 07:39:08 -07:00
Reshetova, Elena 633547973f net: convert sk_buff.users from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 07:39:07 -07:00
Johannes Berg 4df864c1d9 networking: make skb_put & friends return void pointers
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:39 -04:00
Johannes Berg 59ae1d127a networking: introduce and use skb_put_data()
A common pattern with skb_put() is to just want to memcpy()
some data into the new space, introduce skb_put_data() for
this.

An spatch similar to the one for skb_put_zero() converts many
of the places using it:

    @@
    identifier p, p2;
    expression len, skb, data;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, len);
    |
    -memcpy(p, data, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb, data;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, sizeof(*p));
    |
    -memcpy(p, data, sizeof(*p));
    )

    @@
    expression skb, len, data;
    @@
    -memcpy(skb_put(skb, len), data, len);
    +skb_put_data(skb, data, len);

(again, manually post-processed to retain some comments)

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:37 -04:00
Nicolas Dichtel 7212462fa6 netlink: don't send unknown nsid
The NETLINK_F_LISTEN_ALL_NSID otion enables to listen all netns that have a
nsid assigned into the netns where the netlink socket is opened.
The nsid is sent as metadata to userland, but the existence of this nsid is
checked only for netns that are different from the socket netns. Thus, if
no nsid is assigned to the socket netns, NETNSA_NSID_NOT_ASSIGNED is
reported to the userland. This value is confusing and useless.
After this patch, only valid nsid are sent to userland.

Reported-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01 11:49:39 -04:00
Johannes Berg fe52145f91 netlink: pass extended ACK struct where available
This is an add-on to the previous patch that passes the extended ACK
structure where it's already available by existing genl_info or extack
function arguments.

This was done with this spatch (with some manual adjustment of
indentation):

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, info->extack)
...
}

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
<...
-nla_parse_nested(A, B, C, D, NULL)
+nla_parse_nested(A, B, C, D, info->extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_parse(A, B, C, D, E, NULL)
+nla_parse(A, B, C, D, E, extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_parse_nested(A, B, C, D, NULL)
+nla_parse_nested(A, B, C, D, extack)
...>
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nlmsg_validate(A, B, C, D, NULL)
+nlmsg_validate(A, B, C, D, extack)
...>
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_validate(A, B, C, D, NULL)
+nla_validate(A, B, C, D, extack)
...>
}

@@
expression A, B, C;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_validate_nested(A, B, C, NULL)
+nla_validate_nested(A, B, C, extack)
...>
}

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:22 -04:00
Johannes Berg fceb6435e8 netlink: pass extended ACK struct to parsing functions
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:22 -04:00
Johannes Berg ba0dc5f6e0 netlink: allow sending extended ACK with cookie on success
Now that we have extended error reporting and a new message format for
netlink ACK messages, also extend this to be able to return arbitrary
cookie data on success.

This will allow, for example, nl80211 to not send an extra message for
cookies identifying newly created objects, but return those directly
in the ACK message.

The cookie data size is currently limited to 20 bytes (since Jamal
talked about using SHA1 for identifiers.)

Thanks to Jamal Hadi Salim for bringing up this idea during the
discussions.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:21 -04:00
Johannes Berg 7ab606d160 genetlink: pass extended ACK report down
Pass the extended ACK reporting struct down from generic netlink to
the families, using the existing struct genl_info for simplicity.

Also add support to set the extended ACK information from generic
netlink users.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:21 -04:00
Johannes Berg 2d4bc93368 netlink: extended ACK reporting
Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.

Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:20 -04:00
Andrey Vagin 457c79e544 netlink/diag: report flags for netlink sockets
cb_running is reported in /proc/self/net/netlink and it is reported by
the ss tool, when it gets information from the proc files.

sock_diag is a new interface which is used instead of proc files, so it
looks reasonable that this interface has to report no less information
about sockets than proc files.

We use these flags to dump and restore netlink sockets.

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05 07:13:56 -07:00
Stanislaw Gruszka 1d2a6a5e4b genetlink: fix counting regression on ctrl_dumpfamily()
Commit 2ae0f17df1 ("genetlink: use idr to track families") replaced

	if (++n < fams_to_skip)
		continue;
into:

	if (n++ < fams_to_skip)
		continue;

This subtle change cause that on retry ctrl_dumpfamily() call we omit
one family that failed to do ctrl_fill_info() on previous call, because
cb->args[0] = n number counts also family that failed to do
ctrl_fill_info().

Patch fixes the problem and avoid confusion in the future just decrease
n counter when ctrl_fill_info() fail.

User visible problem caused by this bug is failure to get access to
some genetlink family i.e. nl80211. However problem is reproducible
only if number of registered genetlink families is big enough to
cause second call of ctrl_dumpfamily().

Cc: Xose Vazquez Perez <xose.vazquez@gmail.com>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Fixes: 2ae0f17df1 ("genetlink: use idr to track families")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 15:38:43 -07:00
Herbert Xu 8a0f5ccfb3 crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
On Tue, Mar 14, 2017 at 10:44:10AM +0100, Dmitry Vyukov wrote:
>
> Yes, please.
> Disregarding some reports is not a good way long term.

Please try this patch.

---8<---
Subject: netlink: Annotate nlk cb_mutex by protocol

Currently all occurences of nlk->cb_mutex are annotated by lockdep
as a single class.  This causes a false lcokdep cycle involving
genl and crypto_user.

This patch fixes it by dividing cb_mutex into individual classes
based on the netlink protocol.  As genl and crypto_user do not
use the same netlink protocol this breaks the false dependency
loop.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 14:38:15 -07:00
Eric Dumazet 158f323b98 net: adjust skb->truesize in pskb_expand_head()
Slava Shwartsman reported a warning in skb_try_coalesce(), when we
detect skb->truesize is completely wrong.

In his case, issue came from IPv6 reassembly coping with malicious
datagrams, that forced various pskb_may_pull() to reallocate a bigger
skb->head than the one allocated by NIC driver before entering GRO
layer.

Current code does not change skb->truesize, leaving this burden to
callers if they care enough.

Blindly changing skb->truesize in pskb_expand_head() is not
easy, as some producers might track skb->truesize, for example
in xmit path for back pressure feedback (sk->sk_wmem_alloc)

We can detect the cases where it should be safe to change
skb->truesize :

1) skb is not attached to a socket.
2) If it is attached to a socket, destructor is sock_edemux()

My audit gave only two callers doing their own skb->truesize
manipulation.

I had to remove skb parameter in sock_edemux macro when
CONFIG_INET is not set to avoid a compile error.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 12:03:29 -05:00
Eric Dumazet e89df81317 netlink: do not enter direct reclaim from netlink_trim()
In commit d35c99ff77 ("netlink: do not enter direct reclaim from
netlink_dump()") we made sure to not trigger expensive memory reclaim.

Problem is that a bit later, netlink_trim() might be called and
trigger memory reclaim.

netlink_trim() should be best effort, and really as fast as possible.
Under memory pressure, it is fine to not trim this skb.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 13:39:35 -05:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
WANG Cong efa172f428 netlink: use blocking notifier
netlink_chain is called in ->release(), which is apparently
a process context, so we don't have to use an atomic notifier
here.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10 17:25:58 -05:00
David S. Miller c63d352f05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-06 21:33:19 -05:00
Herbert Xu ed5d7788a9 netlink: Do not schedule work from sk_destruct
It is wrong to schedule a work from sk_destruct using the socket
as the memory reserve because the socket will be freed immediately
after the return from sk_destruct.

Instead we should do the deferral prior to sk_free.

This patch does just that.

Fixes: 707693c8a4 ("netlink: Call cb->done from a worker thread")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05 19:43:42 -05:00
David S. Miller 2745529ac7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Couple conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the
   RX tail pointer properly overlapped with some changes
   to support variable sized rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
   overlapping with a reorganization of the driver to support
   ACPI, OF, as well as PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the
   stmmac driver, meanwhile a lot of this code was cleaned up
   and reorganized in 'net-next'.

4) The cls_flower classifier obtained a helper function in
   'net-next' called __fl_delete() and this overlapped with
   Daniel Borkamann's bug fix to use RCU for object destruction
   in 'net'.  It also overlapped with Jiri's change to guard
   the rhashtable_remove_fast() call with a check against
   tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some
   unrelated changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head()
   bug fix in 'net' overlapped with a large reorganization of
   the same code in 'net-next'.  Since the 'net-next' code no
   longer had the bug in question, there was nothing to do
   other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 12:29:53 -05:00
Herbert Xu 707693c8a4 netlink: Call cb->done from a worker thread
The cb->done interface expects to be called in process context.
This was broken by the netlink RCU conversion.  This patch fixes
it by adding a worker struct to make the cb->done call where
necessary.

Fixes: 21e4902aea ("netlink: Lockless lookup with RCU grace...")
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 19:48:38 -05:00
David S. Miller bb598c1b8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of bug fixes in 'net' overlapping other changes in
'net-next-.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 10:54:36 -05:00
WANG Cong 00ffc1ba02 genetlink: fix a memory leak on error path
In __genl_register_family(), when genl_validate_assign_mc_groups()
fails, we forget to free the memory we possibly allocate for
family->attrbuf.

Note, some callers call genl_unregister_family() to clean up
on error path, it doesn't work because the family is inserted
to the global list in the nearly last step.

Cc: Jakub Kicinski <kubakici@wp.pl>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-03 16:52:29 -04:00
Eric Dumazet 93636d1f1f netlink: netlink_diag_dump() runs without locks
A recent commit removed locking from netlink_diag_dump() but forgot
one error case.

=====================================
[ BUG: bad unlock balance detected! ]
4.9.0-rc3+ #336 Not tainted
-------------------------------------
syz-executor/4018 is trying to release lock ([   36.220068] nl_table_lock
) at:
[<ffffffff82dc8683>] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
but there are no more locks to release!

other info that might help us debug this:
3 locks held by syz-executor/4018:
 #0: [   36.220068]  (
sock_diag_mutex[   36.220068] ){+.+.+.}
, at: [   36.220068] [<ffffffff82c3873b>] sock_diag_rcv+0x1b/0x40
 #1: [   36.220068]  (
sock_diag_table_mutex[   36.220068] ){+.+.+.}
, at: [   36.220068] [<ffffffff82c38e00>] sock_diag_rcv_msg+0x140/0x3a0
 #2: [   36.220068]  (
nlk->cb_mutex[   36.220068] ){+.+.+.}
, at: [   36.220068] [<ffffffff82db6600>] netlink_dump+0x50/0xac0

stack backtrace:
CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff8800645df688 ffffffff81b46934 ffffffff84eb3e78 ffff88006ad85800
 ffffffff82dc8683 ffffffff84eb3e78 ffff8800645df6b8 ffffffff812043ca
 dffffc0000000000 ffff88006ad85ff8 ffff88006ad85fd0 00000000ffffffff
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff81b46934>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [<ffffffff812043ca>] print_unlock_imbalance_bug+0x17a/0x1a0
kernel/locking/lockdep.c:3388
 [<     inline     >] __lock_release kernel/locking/lockdep.c:3512
 [<ffffffff8120cfd8>] lock_release+0x8e8/0xc60 kernel/locking/lockdep.c:3765
 [<     inline     >] __raw_read_unlock ./include/linux/rwlock_api_smp.h:225
 [<ffffffff83fc001a>] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
 [<ffffffff82dc8683>] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
 [<ffffffff82db6947>] netlink_dump+0x397/0xac0 net/netlink/af_netlink.c:2110

Fixes: ad20207432 ("netlink: Use rhashtable walk interface in diag dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-03 16:16:51 -04:00
Wei Yongjun 22ca904ad7 genetlink: fix error return code in genl_register_family()
Fix to return a negative error code from the idr_alloc() error handling
case instead of 0, as done elsewhere in this function.

Also fix the return value check of idr_alloc() since idr_alloc return
negative errors on failure, not zero.

Fixes: 2ae0f17df1 ("genetlink: use idr to track families")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01 12:13:13 -04:00
pravin shelar 0e82c76359 genetlink: Fix generic netlink family unregister
This patch fixes a typo in unregister operation.

Following crash is fixed by this patch. It can be easily reproduced
by repeating modprobe and rmmod module that uses genetlink.

[  261.446686] BUG: unable to handle kernel paging request at ffffffffa0264088
[  261.448921] IP: [<ffffffff813cb70e>] strcmp+0xe/0x30
[  261.450494] PGD 1c09067
[  261.451266] PUD 1c0a063
[  261.452091] PMD 8068d5067
[  261.452525] PTE 0
[  261.453164]
[  261.453618] Oops: 0000 [#1] SMP
[  261.454577] Modules linked in: openvswitch(+) ...
[  261.480753] RIP: 0010:[<ffffffff813cb70e>]  [<ffffffff813cb70e>] strcmp+0xe/0x30
[  261.483069] RSP: 0018:ffffc90003c0bc28  EFLAGS: 00010282
[  261.510145] Call Trace:
[  261.510896]  [<ffffffff816f10ca>] genl_family_find_byname+0x5a/0x70
[  261.512819]  [<ffffffff816f2319>] genl_register_family+0xb9/0x630
[  261.514805]  [<ffffffffa02840bc>] dp_init+0xbc/0x120 [openvswitch]
[  261.518268]  [<ffffffff8100217d>] do_one_initcall+0x3d/0x160
[  261.525041]  [<ffffffff811808a9>] do_init_module+0x60/0x1f1
[  261.526754]  [<ffffffff8110687f>] load_module+0x22af/0x2860
[  261.530144]  [<ffffffff81107026>] SYSC_finit_module+0x96/0xd0
[  261.531901]  [<ffffffff8110707e>] SyS_finit_module+0xe/0x10
[  261.533605]  [<ffffffff8100391e>] do_syscall_64+0x6e/0x180
[  261.535284]  [<ffffffff817c2faf>] entry_SYSCALL64_slow_path+0x25/0x25
[  261.546512] RIP  [<ffffffff813cb70e>] strcmp+0xe/0x30
[  261.550198] ---[ end trace 76505a814dd68770 ]---

Fixes: 2ae0f17df1 ("genetlink: use idr to track families").

Reported-by: Jarno Rajahalme <jarno@ovn.org>
CC: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 20:58:15 -04:00
Johannes Berg 56989f6d85 genetlink: mark families as __ro_after_init
Now genl_register_family() is the only thing (other than the
users themselves, perhaps, but I didn't find any doing that)
writing to the family struct.

In all families that I found, genl_register_family() is only
called from __init functions (some indirectly, in which case
I've add __init annotations to clarifly things), so all can
actually be marked __ro_after_init.

This protects the data structure from accidental corruption.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:16:09 -04:00
Johannes Berg 2ae0f17df1 genetlink: use idr to track families
Since generic netlink family IDs are small integers, allocated
densely, IDR is an ideal match for lookups. Replace the existing
hand-written hash-table with IDR for allocation and lookup.

This lets the families only be written to once, during register,
since the list_head can be removed and removal of a family won't
cause any writes.

It also slightly reduces the code size (by about 1.3k on x86-64).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:16:09 -04:00
Johannes Berg 489111e5c2 genetlink: statically initialize families
Instead of providing macros/inline functions to initialize
the families, make all users initialize them statically and
get rid of the macros.

This reduces the kernel code size by about 1.6k on x86-64
(with allyesconfig).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:16:09 -04:00
Johannes Berg a07ea4d994 genetlink: no longer support using static family IDs
Static family IDs have never really been used, the only
use case was the workaround I introduced for those users
that assumed their family ID was also their multicast
group ID.

Additionally, because static family IDs would never be
reserved by the generic netlink code, using a relatively
low ID would only work for built-in families that can be
registered immediately after generic netlink is started,
which is basically only the control family (apart from
the workaround code, which I also had to add code for so
it would reserve those IDs)

Thus, anything other than GENL_ID_GENERATE is flawed and
luckily not used except in the cases I mentioned. Move
those workarounds into a few lines of code, and then get
rid of GENL_ID_GENERATE entirely, making it more robust.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:16:09 -04:00