Commit graph

37 commits

Author SHA1 Message Date
Daniel Borkmann c16ee04c9b ulp: remove uid and user_visible members
They are not used anymore and therefore should be removed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-20 23:13:32 -07:00
John Fastabend 924ad65ed0 tls: replace poll implementation with read hook
Instead of re-implementing poll routine use the poll callback to
trigger read from kTLS, we reuse the stream_memory_read callback
which is simpler and achieves the same. This helps to align sockmap
and kTLS so we can more easily embed BPF in kTLS.

Joint work with Daniel.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 12:23:19 -07:00
Vakul Garg 9932a29ab1 net/tls: Fixed race condition in async encryption
On processors with multi-engine crypto accelerators, it is possible that
multiple records get encrypted in parallel and their encryption
completion is notified to different cpus in multicore processor. This
leads to the situation where tls_encrypt_done() starts executing in
parallel on different cores. In current implementation, encrypted
records are queued to tx_ready_list in tls_encrypt_done(). This requires
addition to linked list 'tx_ready_list' to be protected. As
tls_decrypt_done() could be executing in irq content, it is not possible
to protect linked list addition operation using a lock.

To fix the problem, we remove linked list addition operation from the
irq context. We do tx_ready_list addition/removal operation from
application context only and get rid of possible multiple access to
the linked list. Before starting encryption on the record, we add it to
the tail of tx_ready_list. To prevent tls_tx_records() from transmitting
it, we mark the record with a new flag 'tx_ready' in 'struct tls_rec'.
When record encryption gets completed, tls_encrypt_done() has to only
update the 'tx_ready' flag to true & linked list add operation is not
required.

The changed logic brings some other side benefits. Since the records
are always submitted in tls sequence number order for encryption, the
tx_ready_list always remains sorted and addition of new records to it
does not have to traverse the linked list.

Lastly, we renamed tx_ready_list in 'struct tls_sw_context_tx' to
'tx_list'. This is because now, the some of the records at the tail are
not ready to transmit.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24 12:24:24 -07:00
Vakul Garg a42055e8d2 net/tls: Add support for async encryption of records for performance
In current implementation, tls records are encrypted & transmitted
serially. Till the time the previously submitted user data is encrypted,
the implementation waits and on finish starts transmitting the record.
This approach of encrypt-one record at a time is inefficient when
asynchronous crypto accelerators are used. For each record, there are
overheads of interrupts, driver softIRQ scheduling etc. Also the crypto
accelerator sits idle most of time while an encrypted record's pages are
handed over to tcp stack for transmission.

This patch enables encryption of multiple records in parallel when an
async capable crypto accelerator is present in system. This is achieved
by allowing the user space application to send more data using sendmsg()
even while previously issued data is being processed by crypto
accelerator. This requires returning the control back to user space
application after submitting encryption request to accelerator. This
also means that zero-copy mode of encryption cannot be used with async
accelerator as we must be done with user space application buffer before
returning from sendmsg().

There can be multiple records in flight to/from the accelerator. Each of
the record is represented by 'struct tls_rec'. This is used to store the
memory pages for the record.

After the records are encrypted, they are added in a linked list called
tx_ready_list which contains encrypted tls records sorted as per tls
sequence number. The records from tx_ready_list are transmitted using a
newly introduced function called tls_tx_records(). The tx_ready_list is
polled for any record ready to be transmitted in sendmsg(), sendpage()
after initiating encryption of new tls records. This achieves parallel
encryption and transmission of records when async accelerator is
present.

There could be situation when crypto accelerator completes encryption
later than polling of tx_ready_list by sendmsg()/sendpage(). Therefore
we need a deferred work context to be able to transmit records from
tx_ready_list. The deferred work context gets scheduled if applications
are not sending much data through the socket. If the applications issue
sendmsg()/sendpage() in quick succession, then the scheduling of
tx_work_handler gets cancelled as the tx_ready_list would be polled from
application's context itself. This saves scheduling overhead of deferred
work.

The patch also brings some side benefit. We are able to get rid of the
concept of CLOSED record. This is because the records once closed are
either encrypted and then placed into tx_ready_list or if encryption
fails, the socket error is set. This simplifies the kernel tls
sendpath. However since tls_device.c is still using macros, accessory
functions for CLOSED records have been retained.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:17:34 -07:00
Sabrina Dubroca c844eb46b7 tls: clear key material from kernel memory when do_tls_setsockopt_conf fails
Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:03:47 -07:00
Sabrina Dubroca 86029d10af tls: zero the crypto information from tls_context before freeing
This contains key material in crypto_send_aes_gcm_128 and
crypto_recv_aes_gcm_128.

Introduce union tls_crypto_context, and replace the two identical
unions directly embedded in struct tls_context with it. We can then
use this union to clean up the memory in the new tls_ctx_free()
function.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:03:47 -07:00
John Fastabend 67db7cd249 tls: possible hang when do_tcp_sendpages hits sndbuf is full case
Currently, the lower protocols sk_write_space handler is not called if
TLS is sending a scatterlist via  tls_push_sg. However, normally
tls_push_sg calls do_tcp_sendpage, which may be under memory pressure,
that in turn may trigger a wait via sk_wait_event. Typically, this
happens when the in-flight bytes exceed the sdnbuf size. In the normal
case when enough ACKs are received sk_write_space() will be called and
the sk_wait_event will be woken up allowing it to send more data
and/or return to the user.

But, in the TLS case because the sk_write_space() handler does not
wake up the events the above send will wait until the sndtimeo is
exceeded. By default this is MAX_SCHEDULE_TIMEOUT so it look like a
hang to the user (especially this impatient user). To fix this pass
the sk_write_space event to the lower layers sk_write_space event
which in the TCP case will wake any pending events.

I observed the above while integrating sockmap and ktls. It
initially appeared as test_sockmap (modified to use ktls) occasionally
hanging. To reliably reproduce this reduce the sndbuf size and stress
the tls layer by sending many 1B sends. This results in every byte
needing a header and each byte individually being sent to the crypto
layer.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-22 21:57:14 +02:00
Daniel Borkmann 037b0b86ec tcp, ulp: add alias for all ulp modules
Lets not turn the TCP ULP lookup into an arbitrary module loader as
we only intend to load ULP modules through this mechanism, not other
unrelated kernel modules:

  [root@bar]# cat foo.c
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <linux/tcp.h>
  #include <linux/in.h>

  int main(void)
  {
      int sock = socket(PF_INET, SOCK_STREAM, 0);
      setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
      return 0;
  }

  [root@bar]# gcc foo.c -O2 -Wall
  [root@bar]# lsmod | grep sctp
  [root@bar]# ./a.out
  [root@bar]# lsmod | grep sctp
  sctp                 1077248  4
  libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
  [root@bar]#

Fix it by adding module alias to TCP ULP modules, so probing module
via request_module() will be limited to tcp-ulp-[name]. The existing
modules like kTLS will load fine given tcp-ulp-tls alias, but others
will fail to load:

  [root@bar]# lsmod | grep sctp
  [root@bar]# ./a.out
  [root@bar]# lsmod | grep sctp
  [root@bar]#

Sockmap is not affected from this since it's either built-in or not.

Fixes: 734942cc4e ("tcp: ULP infrastructure")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-16 14:58:07 -07:00
Boris Pismenny 4799ac81e5 tls: Add rx inline crypto offload
This patch completes the generic infrastructure to offload TLS crypto to a
network device. It enables the kernel to skip decryption and
authentication of some skbs marked as decrypted by the NIC. In the fast
path, all packets received are decrypted by the NIC and the performance
is comparable to plain TCP.

This infrastructure doesn't require a TCP offload engine. Instead, the
NIC only decrypts packets that contain the expected TCP sequence number.
Out-Of-Order TCP packets are provided unmodified. As a result, at the
worst case a received TLS record consists of both plaintext and ciphertext
packets. These partially decrypted records must be reencrypted,
only to be decrypted.

The notable differences between SW KTLS Rx and this offload are as
follows:
1. Partial decryption - Software must handle the case of a TLS record
that was only partially decrypted by HW. This can happen due to packet
reordering.
2. Resynchronization - tls_read_size calls the device driver to
resynchronize HW after HW lost track of TLS record framing in
the TCP stream.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:13:11 -07:00
Linus Torvalds a11e1d432b Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL
The poll() changes were not well thought out, and completely
unexplained.  They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.

Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead.  That gets rid of one of the new indirections.

But that doesn't fix the new complexity that is completely unwarranted
for the regular case.  The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.

[ This revert is a revert of about 30 different commits, not reverted
  individually because that would just be unnecessarily messy  - Linus ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 10:40:47 -07:00
Daniel Borkmann f6fadff33e tls: fix NULL pointer dereference on poll
While hacking on kTLS, I ran into the following panic from an
unprivileged netserver / netperf TCP session:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
  Oops: 0010 [#1] SMP KASAN PTI
  CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  RIP: 0010:          (null)
  Code: Bad RIP value.
  RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
  RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
  RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
  RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
  R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
  R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
  FS:  00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? tls_sw_poll+0xa4/0x160 [tls]
   ? sock_poll+0x20a/0x680
   ? do_select+0x77b/0x11a0
   ? poll_schedule_timeout.constprop.12+0x130/0x130
   ? pick_link+0xb00/0xb00
   ? read_word_at_a_time+0x13/0x20
   ? vfs_poll+0x270/0x270
   ? deref_stack_reg+0xad/0xe0
   ? __read_once_size_nocheck.constprop.6+0x10/0x10
  [...]

Debugging further, it turns out that calling into ctx->sk_poll() is
invalid since sk_poll itself is NULL which was saved from the original
TCP socket in order for tls_sw_poll() to invoke it.

Looks like the recent conversion from poll to poll_mask callback started
in 1525242310 ("net: add support for ->poll_mask in proto_ops") missed
to eventually convert kTLS, too: TCP's ->poll was converted over to the
->poll_mask in commit 2c7d3daceb ("net/tcp: convert to ->poll_mask")
and therefore kTLS wrongly saved the ->poll old one which is now NULL.

Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN |
POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in
tcp_poll_mask() as well that is mangled here.

Fixes: 2c7d3daceb ("net/tcp: convert to ->poll_mask")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Watson <davejwatson@fb.com>
Tested-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-11 16:29:54 -07:00
David S. Miller b2d6cee117 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The bpf syscall and selftests conflicts were trivial
overlapping changes.

The r8169 change involved moving the added mdelay from 'net' into a
different function.

A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts.  I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 20:53:22 -04:00
Andre Tomt 080324c36a net/tls: Fix connection stall on partial tls record
In the case of writing a partial tls record we forgot to clear the
ctx->in_tcp_sendpages flag, causing some connections to stall.

Fixes: c212d2c7fc ("net/tls: Don't recursively call push_record during tls_write_space callbacks")
Signed-off-by: Andre Tomt <andre@tomt.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07 23:47:58 -04:00
Eric Dumazet 98f0a39529 tls: fix use after free in tls_sk_proto_close
syzbot reported a use-after-free in tls_sk_proto_close

Add a boolean value to cleanup a bit this function.

BUG: KASAN: use-after-free in tls_sk_proto_close+0x8ab/0x9c0 net/tls/tls_main.c:297
Read of size 1 at addr ffff8801ae40a858 by task syz-executor363/4503

CPU: 0 PID: 4503 Comm: syz-executor363 Not tainted 4.17.0-rc3+ #34
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
 tls_sk_proto_close+0x8ab/0x9c0 net/tls/tls_main.c:297
 inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460
 sock_release+0x96/0x1b0 net/socket.c:594
 sock_close+0x16/0x20 net/socket.c:1149
 __fput+0x34d/0x890 fs/file_table.c:209
 ____fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x1e4/0x290 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x1aee/0x2730 kernel/exit.c:865
 do_group_exit+0x16f/0x430 kernel/exit.c:968
 get_signal+0x886/0x1960 kernel/signal.c:2469
 do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810
 exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4457b9
RSP: 002b:00007fdf4d766da8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00000000006dac3c RCX: 00000000004457b9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000006dac3c
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dac38
R13: 3692738801137283 R14: 6bf92c39443c4c1d R15: 0000000000000006

Allocated by task 4498:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
 kmalloc include/linux/slab.h:512 [inline]
 kzalloc include/linux/slab.h:701 [inline]
 create_ctx net/tls/tls_main.c:521 [inline]
 tls_init+0x1f9/0xb00 net/tls/tls_main.c:633
 tcp_set_ulp+0x1bc/0x520 net/ipv4/tcp_ulp.c:153
 do_tcp_setsockopt.isra.39+0x44a/0x2600 net/ipv4/tcp.c:2588
 tcp_setsockopt+0xc1/0xe0 net/ipv4/tcp.c:2893
 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039
 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4503:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xd9/0x260 mm/slab.c:3813
 tls_sw_free_resources+0x2a3/0x360 net/tls/tls_sw.c:1037
 tls_sk_proto_close+0x67c/0x9c0 net/tls/tls_main.c:288
 inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460
 sock_release+0x96/0x1b0 net/socket.c:594
 sock_close+0x16/0x20 net/socket.c:1149
 __fput+0x34d/0x890 fs/file_table.c:209
 ____fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x1e4/0x290 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x1aee/0x2730 kernel/exit.c:865
 do_group_exit+0x16f/0x430 kernel/exit.c:968
 get_signal+0x886/0x1960 kernel/signal.c:2469
 do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810
 exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801ae40a800
 which belongs to the cache kmalloc-256 of size 256
The buggy address is located 88 bytes inside of
 256-byte region [ffff8801ae40a800, ffff8801ae40a900)
The buggy address belongs to the page:
page:ffffea0006b90280 count:1 mapcount:0 mapping:ffff8801ae40a080 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801ae40a080 0000000000000000 000000010000000c
raw: ffffea0006bea9e0 ffffea0006bc94a0 ffff8801da8007c0 0000000000000000
page dumped because: kasan: bad access detected

Fixes: dd0bed1665 ("tls: support for Inline tls record")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Atul Gupta <atul.gupta@chelsio.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Ilya Lesokhin <ilyal@mellanox.com>
Cc: Aviad Yehezkel <aviadye@mellanox.com>
Cc: Dave Watson <davejwatson@fb.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07 23:40:26 -04:00
David S. Miller a7b15ab887 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Overlapping changes in selftests Makefile.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-04 09:58:56 -04:00
Dave Watson c212d2c7fc net/tls: Don't recursively call push_record during tls_write_space callbacks
It is reported that in some cases, write_space may be called in
do_tcp_sendpages, such that we recursively invoke do_tcp_sendpages again:

[  660.468802]  ? do_tcp_sendpages+0x8d/0x580
[  660.468826]  ? tls_push_sg+0x74/0x130 [tls]
[  660.468852]  ? tls_push_record+0x24a/0x390 [tls]
[  660.468880]  ? tls_write_space+0x6a/0x80 [tls]
...

tls_push_sg already does a loop over all sending sg's, so ignore
any tls_write_space notifications until we are done sending.
We then have to call the previous write_space to wake up
poll() waiters after we are done with the send loop.

Reported-by: Andre Tomt <andre@tomt.net>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 18:57:52 -04:00
Ilya Lesokhin e8f6979981 net/tls: Add generic NIC offload infrastructure
This patch adds a generic infrastructure to offload TLS crypto to a
network device. It enables the kernel TLS socket to skip encryption
and authentication operations on the transmit side of the data path.
Leaving those computationally expensive operations to the NIC.

The NIC offload infrastructure builds TLS records and pushes them to
the TCP layer just like the SW KTLS implementation and using the same
API.
TCP segmentation is mostly unaffected. Currently the only exception is
that we prevent mixed SKBs where only part of the payload requires
offload. In the future we are likely to add a similar restriction
following a change cipher spec record.

The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. The offloaded implementation builds "plaintext TLS record", those
records contain plaintext instead of ciphertext and place holder bytes
instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence
number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
TLS socket, we can use the tls NIC offload infrastructure to obtain
enough context to encrypt the payload of the SKB.
A TLS record is released when the last byte of the record is ack'ed,
this is done through the new icsk_clean_acked callback.

The infrastructure should be extendable to support various NIC offload
implementations.  However it is currently written with the
implementation below in mind:
The NIC assumes that packets from each offloaded stream are sent as
plaintext and in-order. It keeps track of the TLS records in the TCP
stream. When a packet marked for offload is transmitted, the NIC
encrypts the payload in-place and puts authentication tags in the
relevant place holders.

The responsibility for handling out-of-order packets (i.e. TCP
retransmission, qdisc drops) falls on the netdev driver.

The netdev driver keeps track of the expected TCP SN from the NIC's
perspective.  If the next packet to transmit matches the expected TCP
SN, the driver advances the expected TCP SN, and transmits the packet
with TLS offload indication.

If the next packet to transmit does not match the expected TCP SN. The
driver calls the TLS layer to obtain the TLS record that includes the
TCP of the packet for transmission. Using this TLS record, the driver
posts a work entry on the transmit queue to reconstruct the NIC TLS
state required for the offload of the out-of-order packet. It updates
the expected TCP SN accordingly and transmits the now in-order packet.
The same queue is used for packet transmission and TLS context
reconstruction to avoid the need for flushing the transmit queue before
issuing the context reconstruction request.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 09:42:47 -04:00
Boris Pismenny f66de3ee2c net/tls: Split conf to rx + tx
In TLS inline crypto, we can have one direction in software
and another in hardware. Thus, we split the TLS configuration to separate
structures for receive and transmit.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 09:42:47 -04:00
Atul Gupta dd0bed1665 tls: support for Inline tls record
Facility to register Inline TLS drivers to net/tls. Setup
TLS_HW_RECORD prot to listen on offload device.

Cases handled
- Inline TLS device exists, setup prot for TLS_HW_RECORD
- Atleast one Inline TLS exists, sets TLS_HW_RECORD.
- If non-inline device establish connection, move to TLS_SW_TX

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31 23:37:32 -04:00
Dave Watson c46234ebb4 tls: RX path for ktls
Add rx path for tls software implementation.

recvmsg, splice_read, and poll implemented.

An additional sockopt TLS_RX is added, with the same interface as
TLS_TX.  Either TLX_RX or TLX_TX may be provided separately, or
together (with two different setsockopt calls with appropriate keys).

Control messages are passed via CMSG in a similar way to transmit.
If no cmsg buffer is passed, then only application data records
will be passed to userspace, and EIO is returned for other types of
alerts.

EBADMSG is passed for decryption errors, and EMSGSIZE is passed for
framing too big, and EBADMSG for framing too small (matching openssl
semantics). EINVAL is returned for TLS versions that do not match the
original setsockopt call.  All are unrecoverable.

strparser is used to parse TLS framing.   Decryption is done directly
in to userspace buffers if they are large enough to support it, otherwise
sk_cow_data is called (similar to ipsec), and buffers are decrypted in
place and copied.  splice_read always decrypts in place, since no
buffers are provided to decrypt in to.

sk_poll is overridden, and only returns POLLIN if a full TLS message is
received.  Otherwise we wait for strparser to finish reading a full frame.
Actual decryption is only done during recvmsg or splice_read calls.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:25:54 -04:00
Dave Watson 583715853a tls: Refactor variable names
Several config variables are prefixed with tx, drop the prefix
since these will be used for both tx and rx.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:25:54 -04:00
Dave Watson dbe425599b tls: Move cipher info to a separate struct
Separate tx crypto parameters to a separate cipher_context struct.
The same parameters will be used for rx using the same struct.

tls_advance_record_sn is modified to only take the cipher info.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:25:53 -04:00
Boris Pismenny c113187d38 tls: Use correct sk->sk_prot for IPV6
The tls ulp overrides sk->prot with a new tls specific proto structs.
The tls specific structs were previously based on the ipv4 specific
tcp_prot sturct.
As a result, attaching the tls ulp to an ipv6 tcp socket replaced
some ipv6 callback with the ipv4 equivalents.

This patch adds ipv6 tls proto structs and uses them when
attached to ipv6 sockets.

Fixes: 3c4d755915 ('tls: kernel TLS support')
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:41:48 -05:00
Boris Pismenny c410c1966f tls: getsockopt return record sequence number
Return the TLS record sequence number in getsockopt.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14 15:05:19 -05:00
Boris Pismenny 257082e6ae tls: reset the crypto info if copy_from_user fails
copy_from_user could copy some partial information, as a result
TLS_CRYPTO_INFO_READY(crypto_info) could be true while crypto_info is
using uninitialzed data.

This patch resets crypto_info when copy_from_user fails.

fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14 15:05:19 -05:00
Boris Pismenny a1dfa6812b tls: retrun the correct IV in getsockopt
Current code returns four bytes of salt followed by four bytes of IV.
This patch returns all eight bytes of IV.

fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14 15:05:19 -05:00
John Fastabend b11a632c44 net: add a UID to use for ULP socket assignment
Create a UID field and enum that can be used to assign ULPs to
sockets. This saves a set of string comparisons if the ULP id
is known.

For sockmap, which is added in the next patches, a ULP is used to
hook into TCP sockets close state. In this case the ULP being added
is done at map insert time and the ULP is known and done on the kernel
side. In this case the named lookup is not needed. Because we don't
want to expose psock internals to user space socket options a user
visible flag is also added. For TLS this is set for BPF it will be
cleared.

Alos remove pr_notice, user gets an error code back and should check
that rather than rely on logs.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-06 11:39:31 +01:00
Sabrina Dubroca 6db959c82e tls: reset crypto_info when do_tls_setsockopt_tx fails
The current code copies directly from userspace to ctx->crypto_send, but
doesn't always reinitialize it to 0 on failure. This causes any
subsequent attempt to use this setsockopt to fail because of the
TLS_CRYPTO_INFO_READY check, eventhough crypto_info is not actually
ready.

This should result in a correctly set up socket after the 3rd call, but
currently it does not:

    size_t s = sizeof(struct tls12_crypto_info_aes_gcm_128);
    struct tls12_crypto_info_aes_gcm_128 crypto_good = {
        .info.version = TLS_1_2_VERSION,
        .info.cipher_type = TLS_CIPHER_AES_GCM_128,
    };

    struct tls12_crypto_info_aes_gcm_128 crypto_bad_type = crypto_good;
    crypto_bad_type.info.cipher_type = 42;

    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_bad_type, s);
    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s - 1);
    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s);

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 16:16:04 -05:00
Sabrina Dubroca 877d17c79b tls: return -EBUSY if crypto_info is already set
do_tls_setsockopt_tx returns 0 without doing anything when crypto_info
is already set. Silent failure is confusing for users.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 16:16:03 -05:00
Ilya Lesokhin d91c3e17f7 net/tls: Only attach to sockets in ESTABLISHED state
Calling accept on a TCP socket with a TLS ulp attached results
in two sockets that share the same ulp context.
The ulp context is freed while a socket is destroyed, so
after one of the sockets is released, the second second will
trigger a use after free when it tries to access the ulp context
attached to it.
We restrict the TLS ulp to sockets in ESTABLISHED state
to prevent the scenario above.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Reported-by: syzbot+904e7cd6c5c741609228@syzkaller.appspotmail.com
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 16:05:28 -05:00
Ilya Lesokhin ee181e5201 tls: don't override sk_write_space if tls_set_sw_offload fails.
If we fail to enable tls in the kernel we shouldn't override
the sk_write_space callback

Fixes: 3c4d755915 ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14 16:26:35 +09:00
Ilya Lesokhin 196c31b4b5 tls: Avoid copying crypto_info again after cipher_type check.
Avoid copying crypto_info again after cipher_type check
to avoid a TOCTOU exploits.
The temporary array on the stack is removed as we don't really need it

Fixes: 3c4d755915 ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14 16:26:34 +09:00
Ilya Lesokhin ff45d820a2 tls: Fix TLS ulp context leak, when TLS_TX setsockopt is not used.
Previously the TLS ulp context would leak if we attached a TLS ulp
to a socket but did not use the TLS_TX setsockopt,
or did use it but it failed.
This patch solves the issue by overriding prot[TLS_BASE_TX].close
and fixing tls_sk_proto_close to work properly
when its called with ctx->tx_conf == TLS_BASE_TX.
This patch also removes ctx->free_resources as we can use ctx->tx_conf
to obtain the relevant information.

Fixes: 3c4d755915 ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14 16:26:34 +09:00
Ilya Lesokhin 6d88207fcf tls: Add function to update the TLS socket configuration
The tx configuration is now stored in ctx->tx_conf.
And sk->sk_prot is updated trough a function
This will simplify things when we add rx
and support for different possible
tx and rx cross configurations.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14 16:26:34 +09:00
Matthias Rosenfelder 5a3b886c3c TLS: Fix length check in do_tls_getsockopt_tx()
copy_to_user() copies the struct the pointer is pointing to, but the
length check compares against sizeof(pointer) and not sizeof(struct).
On 32-bit the size is probably the same, so it might have worked
accidentally.

Signed-off-by: Matthias Rosenfelder <mrosenfelder.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-06 10:58:19 +01:00
Dan Carpenter ac55cd6193 tls: return -EFAULT if copy_to_user() fails
The copy_to_user() function returns the number of bytes remaining but we
want to return -EFAULT here.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23 14:19:27 -04:00
Dave Watson 3c4d755915 tls: kernel TLS support
Software implementation of transport layer security, implemented using ULP
infrastructure.  tcp proto_ops are replaced with tls equivalents of sendmsg and
sendpage.

Only symmetric crypto is done in the kernel, keys are passed by setsockopt
after the handshake is complete.  All control messages are supported via CMSG
data - the actual symmetric encryption is the same, just the message type needs
to be passed separately.

For user API, please see Documentation patch.

Pieces that can be shared between hw and sw implementation
are in tls_main.c

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 12:12:40 -04:00