1
0
Fork 0
Commit Graph

64 Commits (redonkable)

Author SHA1 Message Date
Ursula Braun 3f59cf41fb net/smc: no shutdown in state SMC_LISTEN
commit caa21e19e0 upstream.

Invoking shutdown for a socket in state SMC_LISTEN does not make
sense. Nevertheless programs like syzbot fuzzing the kernel may
try to do this. For SMC this means a socket refcounting problem.
This patch makes sure a shutdown call for an SMC socket in state
SMC_LISTEN simply returns with -ENOTCONN.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24 13:09:22 +02:00
Stefan Raspl 194feed06b smc: fix sendpage() call
[ Upstream commit bda27ff5c4 ]

The sendpage() call grabs the sock lock before calling the default
implementation - which tries to grab it once again.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21 04:02:53 +09:00
Ursula Braun 2a0bc4ad7c net/smc: pay attention to MAX_ORDER for CQ entries
[ Upstream commit c9f4c6cf53 ]

smc allocates a certain number of CQ entries for used RoCE devices. For
mlx5 devices the chosen constant number results in a large allocation
causing this warning:

[13355.124656] WARNING: CPU: 3 PID: 16535 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0x2be/0x10c0
[13355.124657] Modules linked in: smc_diag(O) smc(O) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter mlx5_ib ib_core sunrpc mlx5_core s390_trng rng_core ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common ptp pps_core eadm_sch dm_multipath dm_mod vhost_net tun vhost tap sch_fq_codel kvm ip_tables x_tables autofs4 [last unloaded: smc]
[13355.124672] CPU: 3 PID: 16535 Comm: kworker/3:0 Tainted: G           O    4.14.0uschi #1
[13355.124673] Hardware name: IBM 3906 M04 704 (LPAR)
[13355.124675] Workqueue: events smc_listen_work [smc]
[13355.124677] task: 00000000e2f22100 task.stack: 0000000084720000
[13355.124678] Krnl PSW : 0704c00180000000 000000000029da76 (__alloc_pages_nodemask+0x2be/0x10c0)
[13355.124681]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[13355.124682] Krnl GPRS: 0000000000000000 00550e00014080c0 0000000000000000 0000000000000001
[13355.124684]            000000000029d8b6 00000000f3bfd710 0000000000000000 00000000014080c0
[13355.124685]            0000000000000009 00000000ec277a00 0000000000200000 0000000000000000
[13355.124686]            0000000000000000 00000000000001ff 000000000029d8b6 0000000084723720
[13355.124708] Krnl Code: 000000000029da6a: a7110200		tmll	%r1,512
                          000000000029da6e: a774ff29		brc	7,29d8c0
                         #000000000029da72: a7f40001		brc	15,29da74
                         >000000000029da76: a7f4ff25		brc	15,29d8c0
                          000000000029da7a: a7380000		lhi	%r3,0
                          000000000029da7e: a7f4fef1		brc	15,29d860
                          000000000029da82: 5820f0c4		l	%r2,196(%r15)
                          000000000029da86: a53e0048		llilh	%r3,72
[13355.124720] Call Trace:
[13355.124722] ([<000000000029d8b6>] __alloc_pages_nodemask+0xfe/0x10c0)
[13355.124724]  [<000000000013bd1e>] s390_dma_alloc+0x6e/0x148
[13355.124733]  [<000003ff802eeba6>] mlx5_dma_zalloc_coherent_node+0x8e/0xe0 [mlx5_core]
[13355.124740]  [<000003ff802eee18>] mlx5_buf_alloc_node+0x70/0x108 [mlx5_core]
[13355.124744]  [<000003ff804eb410>] mlx5_ib_create_cq+0x558/0x898 [mlx5_ib]
[13355.124749]  [<000003ff80407d40>] ib_create_cq+0x48/0x88 [ib_core]
[13355.124751]  [<000003ff80109fba>] smc_ib_setup_per_ibdev+0x52/0x118 [smc]
[13355.124753]  [<000003ff8010bcb6>] smc_conn_create+0x65e/0x728 [smc]
[13355.124755]  [<000003ff801081a2>] smc_listen_work+0x2d2/0x540 [smc]
[13355.124756]  [<0000000000162c66>] process_one_work+0x1be/0x440
[13355.124758]  [<0000000000162f40>] worker_thread+0x58/0x458
[13355.124759]  [<0000000000169e7e>] kthread+0x14e/0x168
[13355.124760]  [<00000000009ce8be>] kernel_thread_starter+0x6/0xc
[13355.124762]  [<00000000009ce8b8>] kernel_thread_starter+0x0/0xc
[13355.124762] Last Breaking-Event-Address:
[13355.124764]  [<000000000029da72>] __alloc_pages_nodemask+0x2ba/0x10c0
[13355.124764] ---[ end trace 34be38b581c0b585 ]---

This patch reduces the smc constant for the maximum number of allocated
completion queue entries SMC_MAX_CQE by 2 to avoid high round up values
in the mlx5 code, and reduces the number of allocated completion queue
entries even more, if the final allocation for an mlx5 device hits the
MAX_ORDER limit.

Reported-by: Ihnken Menssen <menssen@de.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:52:32 +02:00
Karsten Graul 953a64ca33 net/smc: use link_id of server in confirm link reply
[ Upstream commit 2be922f316 ]

The CONFIRM LINK reply message must contain the link_id sent
by the server. And set the link_id explicitly when
initializing the link.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:52:07 +02:00
Eric Biggers 2cedbdda01 net/smc: check for missing nlattrs in SMC_PNETID messages
[ Upstream commit d49baa7e12 ]

It's possible to crash the kernel in several different ways by sending
messages to the SMC_PNETID generic netlink family that are missing the
expected attributes:

- Missing SMC_PNETID_NAME => null pointer dereference when comparing
  names.
- Missing SMC_PNETID_ETHNAME => null pointer dereference accessing
  smc_pnetentry::ndev.
- Missing SMC_PNETID_IBNAME => null pointer dereference accessing
  smc_pnetentry::smcibdev.
- Missing SMC_PNETID_IBPORT => out of bounds array access to
  smc_ib_device::pattr[-1].

Fix it by validating that all expected attributes are present and that
SMC_PNETID_IBPORT is nonzero.

Reported-by: syzbot+5cd61039dc9b8bfa6e47@syzkaller.appspotmail.com
Fixes: 6812baabf2 ("smc: establish pnet table management")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25 16:17:24 +02:00
Ursula Braun 07cb0b54d7 net/smc: fix shutdown in state SMC_LISTEN
[ Upstream commit 1255fcb2a6 ]

Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket
crashes, because
   commit 127f497058 ("net/smc: release clcsock from tcp_listen_worker")
releases the internal clcsock in smc_close_active() and sets smc->clcsock
to NULL.
For SHUT_RD the smc_close_active() call is removed.
For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the
clcsock is already released.

Fixes: 127f497058 ("net/smc: release clcsock from tcp_listen_worker")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-29 11:33:11 +02:00
Davide Caratti bba05fb6a1 net/smc: fix NULL pointer dereference on sock_create_kern() error path
commit a5dcb73b96 upstream.

when sock_create_kern(..., a) returns an error, 'a' might not be a valid
pointer, so it shouldn't be dereferenced to read a->sk->sk_sndbuf and
and a->sk->sk_rcvbuf; not doing that caused the following crash:

general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4254 Comm: syzkaller919713 Not tainted 4.16.0-rc1+ #18
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:smc_create+0x14e/0x300 net/smc/af_smc.c:1410
RSP: 0018:ffff8801b06afbc8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8801b63457c0 RCX: ffffffff85a3e746
RDX: 0000000000000004 RSI: 00000000ffffffff RDI: 0000000000000020
RBP: ffff8801b06afbf0 R08: 00000000000007c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8801b6345c08 R14: 00000000ffffffe9 R15: ffffffff8695ced0
FS:  0000000001afb880(0000) GS:ffff8801db200000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000040 CR3: 00000001b0721004 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  __sock_create+0x4d4/0x850 net/socket.c:1285
  sock_create net/socket.c:1325 [inline]
  SYSC_socketpair net/socket.c:1409 [inline]
  SyS_socketpair+0x1c0/0x6f0 net/socket.c:1366
  do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x4404b9
RSP: 002b:00007fff44ab6908 EFLAGS: 00000246 ORIG_RAX: 0000000000000035
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004404b9
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000002b
RBP: 00007fff44ab6910 R08: 0000000000000002 R09: 00007fff44003031
R10: 0000000020000040 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
Code: 48 c1 ea 03 80 3c 02 00 0f 85 b3 01 00 00 4c 8b a3 48 04 00 00 48
b8
00 00 00 00 00 fc ff df 49 8d 7c 24 20 48 89 fa 48 c1 ea 03 <80> 3c 02
00
0f 85 82 01 00 00 4d 8b 7c 24 20 48 b8 00 00 00 00
RIP: smc_create+0x14e/0x300 net/smc/af_smc.c:1410 RSP: ffff8801b06afbc8

Fixes: cd6851f303 smc: remote memory buffers (RMBs)
Reported-and-tested-by: syzbot+aa0227369be2dcc26ebe@syzkaller.appspotmail.com
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-15 10:54:28 +01:00
Ursula Braun 4d0d1bc65b net/smc: use sk_rcvbuf as start for rmb creation
[ Upstream commit 4e1061f4a2 ]

Commit 3e034725c0 ("net/smc: common functions for RMBs and send buffers")
merged handling of SMC receive and send buffers. It introduced sk_buf_size
as merged start value for size determination. But since sk_buf_size is not
used at all, sk_sndbuf is erroneously used as start for rmb creation.
This patch makes sure, sk_buf_size is really used as intended, and
sk_rcvbuf is used as start value for rmb creation.

Fixes: 3e034725c0 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:53:05 +01:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Ursula Braun 8c96feeeb3 net/smc: no close wait in case of process shut down
Usually socket closing is delayed if there is still data available in
the send buffer to be transmitted. If a process is killed, the delay
should be avoided.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun 18e537cd58 net/smc: introduce a delay
The number of outstanding work requests is limited. If all work
requests are in use, tx processing is postponed to another scheduling
of the tx worker. Switch to a delayed worker to have a gap for tx
completion queue events before the next retry.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun bfbedfd383 net/smc: terminate link group if out-of-sync is received
An out-of-sync condition can just be detected by the client.
If the server receives a CLC DECLINE message indicating an out-of-sync
condition for the link groups, the server must clean up the out-of-sync
link group.
There is no need for an extra third parameter in smc_clc_send_decline().

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun 5bc11ddbdf net/smc: longer delay for client link group removal
Client link group creation always follows the server linkgroup creation.
If peer creates a new server link group, client has to create a new
client link group. If peer reuses a server link group for a new
connection, client has to reuse its client link group as well. This
patch introduces a longer delay for client link group removal to make
sure this link group still exists, once the peer decides to reuse a
server link group. This avoids out-of-sync conditions for link groups.
If already scheduled, modify the delay.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun 8301fa44b4 net/smc: adapt send request completion notification
The solicited flag is meaningful for the receive completion queue.
Ask for next work completion of any type on the send queue.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun a6832c3acd net/smc: adjust net_device refcount
smc_pnet_fill_entry() uses dev_get_by_name() adding a refcount to ndev.
The following smc_pnet_enter() has to reduce the refcount if the entry
to be added exists already in the pnet table.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:02 -07:00
Ursula Braun 731b008560 net/smc: take RCU read lock for routing cache lookup
smc_netinfo_by_tcpsk() looks up the routing cache. Such a lookup requires
protection by an RCU read lock.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:02 -07:00
Hans Wippel 846e344eb7 net/smc: add receive timeout check
The SMC receive function currently lacks a timeout check under the
condition that no data were received and no data are available. This
patch adds such a check.

Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:02 -07:00
Hans Wippel 09579ac803 net/smc: add missing dev_put
In the infiniband part, SMC currently uses get_netdev which calls
dev_hold on the returned net device. However, the SMC code never calls
dev_put on that net device resulting in a wrong reference count.

This patch adds a dev_put after the usage of the net device to fix the
issue.

Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:02 -07:00
Ursula Braun 10428dd835 net/smc: synchronize buffer usage with device
Usage of send buffer "sndbuf" is synced
(a) before filling sndbuf for cpu access
(b) after filling sndbuf for device access

Usage of receive buffer "RMB" is synced
(a) before reading RMB content for cpu access
(b) after reading RMB content for device access

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00
Ursula Braun b33982c3a6 net/smc: cleanup function __smc_buf_create()
Split function __smc_buf_create() for better readability.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00
Ursula Braun 3e034725c0 net/smc: common functions for RMBs and send buffers
Creation and deletion of SMC receive and send buffers shares a high
amount of common code . This patch introduces common functions to get
rid of duplicate code.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00
Ursula Braun 9d8fb61734 net/smc: introduce sg-logic for send buffers
SMC send buffers are processed the same way as RMBs. Since RMBs have
been converted to sg-logic, do the same for send buffers.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00
Ursula Braun d5b361b0dc net/smc: remove Kconfig warning
Now separate memory regions are created and registered for separate
RMBs. The unsafe_global_rkey of the protection domain is no longer
used. Thus the exposing memory warning can be removed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00
Ursula Braun 652a1e41ec net/smc: register RMB-related memory region
A memory region created for a new RMB must be registered explicitly,
before the peer can make use of it for remote DMA transfer.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00
Ursula Braun 897e1c2457 net/smc: use separate memory regions for RMBs
SMC currently uses the unsafe_global_rkey of the protection domain,
which exposes all memory for remote reads and writes once a connection
is established. This patch introduces separate memory regions with
separate rkeys for every RMB. Now the unsafe_global_rkey of the
protection domain is no longer needed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00
Ursula Braun a3fe3d01bd net/smc: introduce sg-logic for RMBs
The follow-on patch makes use of ib_map_mr_sg() when introducing
separate memory regions for RMBs. This function is based on
scatterlists; thus this patch introduces scatterlists for RMBs.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:57 -07:00
Ursula Braun c45abf31e7 net/smc: shorten local bufsize variables
Initiate the coming rework of SMC buffer handling with this
small code cleanup. No functional changes here.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:57 -07:00
Ursula Braun 977bb32440 net/smc: serialize connection creation in all cases
If a link group for a new server connection exists already, the mutex
serializing the determination of link groups is given up early.
The coming registration of memory regions benefits from the serialization
as well, if the mutex is held till connection creation is finished.
This patch postpones the unlocking of the link group creation mutex.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:57 -07:00
Christoph Hellwig 19a0f7e37c net/smc: Add warning about remote memory exposure
The driver explicitly bypasses APIs to register all memory once a
connection is made, and thus allows remote access to memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 14:49:43 -04:00
Ursula Braun 263eec9b2a smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY
Currently, SMC enables remote access to physical memory when a user
has successfully configured and established an SMC-connection until ten
minutes after the last SMC connection is closed. Because this is considered
a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
such a case.

This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
This improves user awareness, but does not remove the security risk itself.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 14:49:42 -04:00
Linus Torvalds de4d195308 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
2017-05-10 10:30:46 -07:00
Dasaratharaman Chandramouli 44c58487d5 IB/core: Define 'ib' and 'roce' rdma_ah_attr types
rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when its first
created. This ensures that calls such as modify_ah
dont modify the type of the address handle attribute.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli d8966fcd4c IB/core: Use rdma_ah_attr accessor functions
Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Ingo Molnar 58d30c36d4 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Documentation updates.

 - Miscellaneous fixes.

 - Parallelize SRCU callback handling (plus overlapping patches).

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-23 11:12:44 +02:00
Paul E. McKenney 5f0d5a3ae7 mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU
A group of Linux kernel hackers reported chasing a bug that resulted
from their assumption that SLAB_DESTROY_BY_RCU provided an existence
guarantee, that is, that no block from such a slab would be reallocated
during an RCU read-side critical section.  Of course, that is not the
case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
slab of blocks.

However, there is a phrase for this, namely "type safety".  This commit
therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
to avoid future instances of this sort of confusion.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[ paulmck: Add comments mentioning the old name, as requested by Eric
  Dumazet, in order to help people familiar with the old name find
  the new one. ]
Acked-by: David Rientjes <rientjes@google.com>
2017-04-18 11:42:36 -07:00
Ursula Braun 2c9c16825e net/smc: do not use IB_SEND_INLINE together with mapped data
smc specifies IB_SEND_INLINE for IB_WR_SEND ib_post_send calls, but
provides a mapped buffer to be sent. This is inconsistent, since
IB_SEND_INLINE works without mapped buffer. Problem has not been
detected in the past, because tests had been limited to Connect X3 cards
from Mellanox, whose mlx4 driver just ignored the IB_SEND_INLINE flag.
For now, the IB_SEND_INLINE flag is removed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Ursula Braun 288c83902a net/smc: destruct non-accepted sockets
Make sure sockets never accepted are removed cleanly.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Ursula Braun f5227cd9f1 net/smc: remove duplicate unhash
unhash is already called in sock_put_work. Remove the second call.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Ursula Braun a98bf8c0bc net/smc: guarantee ConnClosed send after shutdown SHUT_WR
State SMC_CLOSED should be reached only, if ConnClosed has been sent to
the peer. If ConnClosed is received from the peer, a socket with
shutdown SHUT_WR done, switches errorneously to state SMC_CLOSED, which
means the peer socket is dangling. The local SMC socket is supposed to
switch to state APPFINCLOSEWAIT to make sure smc_close_final() is called
during socket close.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Ursula Braun 46c28dbd4c net/smc: no socket state changes in tasklet context
Several state changes occur during SMC socket closing. Currently
state changes triggered locally occur in process context with
lock_sock() taken while state changes triggered by peer occur in
tasklet context with bh_lock_sock() taken. bh_lock_sock() does not
wait till a lock_sock(() task in process context is finished. This
may lead to races in socket state transitions resulting in dangling
SMC-sockets, or it may lead to duplicate SMC socket freeing.
This patch introduces a closing worker to run all state changes under
lock_sock().

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Ursula Braun 90e9517ed9 net/smc: always call the POLL_IN part of sk_wake_async
Wake up reading file descriptors for a closing socket as well, otherwise
some socket applications may stall.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Ursula Braun 90cacb2ea6 net/smc: guarantee reset of write_blocked for heavy workload
If peer indicates write_blocked, the cursor state of the received data
should be send to the peer immediately (in smc_tx_consumer_update()).
Afterwards the write_blocked indicator is cleared.

If there is no free slot for another write request, sending is postponed
to worker smc_tx_work, and the write_blocked indicator is not cleared.
Therefore another clearing check is needed in smc_tx_work().

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Ursula Braun 5da7e4d355 net/smc: return active RoCE port only
SMC requires an active ib port on the RoCE device.
smc_pnet_find_roce_resource() determines the matching RoCE device port
according to the configured PNET table. Do not return the found
RoCE device port, if it is not flagged active.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Ursula Braun 249633a443 net/smc: remove useless smc_ib_devices_list check
The global event handler is created only, if the ib_device has already
been used by at least one link group. It is guaranteed that there exists
the corresponding entry in the smc_ib_devices list. Get rid of this
superfluous check.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Ursula Braun 3c22e8f320 net/smc: get rid of old comment
This patch removes an outdated comment.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 23:01:14 -04:00
Florian Westphal 282ccf6efb drivers: add explicit interrupt.h includes
These files all use functions declared in interrupt.h, but currently rely
on implicit inclusion of this file (via netns/xfrm.h).

That won't work anymore when the flow cache is removed so include that
header where needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30 11:05:34 -07:00
David Howells cdfbabfb2f net: Work around lockdep limitation in sockets that use sockets
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_rcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:23:27 -08:00
Ingo Molnar c3edc4010e sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h>
task_struct::signal and task_struct::sighand are pointers, which would normally make it
straightforward to not define those types in sched.h.

That is not so, because the types are accompanied by a myriad of APIs (macros and inline
functions) that dereference them.

Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.

With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore,
trying to put accessors into sched.h as a test fails the following way:

  ./include/linux/sched.h: In function ‘test_signal_types’:
  ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
                    ^

This reduces the size and complexity of sched.h significantly.

Update all headers and .c code that relied on getting the signal handling
functionality from <linux/sched.h> to include <linux/sched/signal.h>.

The list of affected files in the preparatory patch was partly generated by
grepping for the APIs, and partly by doing coverage build testing, both
all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
cross-architecture builds.

Nevertheless some (trivial) build breakage is still expected related to rare
Kconfig combinations and in-flight patches to various kernel code, but most
of it should be handled by this patch.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-03 01:43:37 +01:00
Dan Carpenter cdaf25dfc0 smc: some potential use after free bugs
Say we got really unlucky and these failed on the last iteration, then
it could lead to a use after free bug.

Fixes: cd6851f303 ("smc: remote memory buffers (RMBs)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 16:37:55 -05:00
Ursula Braun 143c017108 smc: ETH_ALEN as memcpy length for mac addresses
When creating an SMC connection, there is a CLC (connection layer control)
handshake to prepare for RDMA traffic. The corresponding code is part of
commit 0cfdd8f92c ("smc: connection and link group creation").
Mac addresses to be exchanged in the handshake are copied with a wrong
length of 12 instead of 6 bytes. Following code overwrites the wrongly
copied code, but nevertheless the correct length should already be used for
the preceding mac address copying. Use ETH_ALEN for the memcpy length with
mac addresses.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: 0cfdd8f92c ("smc: connection and link group creation")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 09:47:01 -05:00