1
0
Fork 0
Commit Graph

383 Commits (redonkable)

Author SHA1 Message Date
Jeremy Cline 45c8178cf6 net: socket: fix potential spectre v1 gadget in socketcall
commit c8e8cd579b upstream.

'call' is a user-controlled value, so sanitize the array index after the
bounds check to avoid speculating past the bounds of the 'nargs' array.

Found with the help of Smatch:

net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue
'nargs' [r] (local cap)

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-06 16:20:48 +02:00
Cong Wang 91717ffc90 socket: close race condition between sock_close() and sockfs_setattr()
[ Upstream commit 6d8c50dcb0 ]

fchownat() doesn't even hold refcnt of fd until it figures out
fd is really needed (otherwise is ignored) and releases it after
it resolves the path. This means sock_close() could race with
sockfs_setattr(), which leads to a NULL pointer dereference
since typically we set sock->sk to NULL in ->release().

As pointed out by Al, this is unique to sockfs. So we can fix this
in socket layer by acquiring inode_lock in sock_close() and
checking against NULL in sockfs_setattr().

sock_release() is called in many places, only the sock_close()
path matters here. And fortunately, this should not affect normal
sock_close() as it is only called when the last fd refcnt is gone.
It only affects sock_close() with a parallel sockfs_setattr() in
progress, which is not common.

Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Reported-by: shankarapailoor <shankarapailoor@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26 08:06:28 +08:00
Levin, Alexander (Sasha Levin) 2abfcdf8e7 kmemcheck: remove annotations
commit 4950276672 upstream.

Patch series "kmemcheck: kill kmemcheck", v2.

As discussed at LSF/MM, kill kmemcheck.

KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow).  KASan is already upstream.

We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).

The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.

Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.

This patch (of 4):

Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
  Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22 15:42:23 +01:00
Alexei Starovoitov 6fde36d5ce bpf: introduce BPF_JIT_ALWAYS_ON config
[ upstream commit 290af86629 ]

The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."

To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)

v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
  It will be sent when the trees are merged back to net-next

Considered doing:
  int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 14:03:49 +01:00
John Fastabend db5980d804 net: fixes for skb_send_sock
A couple fixes to new skb_send_sock infrastructure. However, no users
currently exist for this code (adding user in next handful of patches)
so it should not be possible to trigger a panic with existing in-kernel
code.

Fixes: 306b13eb3c ("proto_ops: Add locked held versions of sendmsg and sendpage")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:52 -07:00
Tom Herbert 306b13eb3c proto_ops: Add locked held versions of sendmsg and sendpage
Add new proto_ops sendmsg_locked and sendpage_locked that can be
called when the socket lock is already held. Correspondingly, add
kernel_sendmsg_locked and kernel_sendpage_locked as front end
functions.

These functions will be used in zero proxy so that we can take
the socket lock in a ULP sendmsg/sendpage and then directly call the
backend transport proto_ops functions.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:26:18 -07:00
David S. Miller 29fda25a2d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two minor conflicts in virtio_net driver (bug fix overlapping addition
of a helper) and MAINTAINERS (new driver edit overlapping revamp of
PHY entry).

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 10:07:50 -07:00
stephen hemminger 614d79c09e socket: fix set not used warning
The variable owned_by_user is always set, but only used
when kernel is configured with LOCKDEP enabled.

Get rid of the warning by moving the code to put the call
to owned_by_user into the the rcu_protected call.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25 12:31:37 -07:00
Paolo Abeni 864d966424 net/socket: fix type in assignment and trim long line
The commit ffb07550c7 ("copy_msghdr_from_user(): get rid of
field-by-field copyin") introduce a new sparse warning:

net/socket.c:1919:27: warning: incorrect type in assignment (different address spaces)
net/socket.c:1919:27:    expected void *msg_control
net/socket.c:1919:27:    got void [noderef] <asn:1>*[addressable] msg_control

and a line above 80 chars, let's fix them

Fixes: ffb07550c7 ("copy_msghdr_from_user(): get rid of field-by-field copyin")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 14:17:01 -07:00
Linus Torvalds 2173bd0631 Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull network field-by-field copy-in updates from Al Viro:
 "This part of the misc compat queue was held back for review from
  networking folks and since davem has jus ACKed those..."

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get_compat_bpf_fprog(): don't copyin field-by-field
  get_compat_msghdr(): get rid of field-by-field copyin
  copy_msghdr_from_user(): get rid of field-by-field copyin
2017-07-15 11:06:17 -07:00
Linus Torvalds 3bad2f1c67 Merge branch 'work.misc-set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc user access cleanups from Al Viro:
 "The first pile is assorted getting rid of cargo-culted access_ok(),
  cargo-culted set_fs() and field-by-field copyouts.

  The same description applies to a lot of stuff in other branches -
  this is just the stuff that didn't fit into a more specific topical
  branch"

* 'work.misc-set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Switch flock copyin/copyout primitives to copy_{from,to}_user()
  fs/fcntl: return -ESRCH in f_setown when pid/pgid can't be found
  fs/fcntl: f_setown, avoid undefined behaviour
  fs/fcntl: f_setown, allow returning error
  lpfc debugfs: get rid of pointless access_ok()
  adb: get rid of pointless access_ok()
  isdn: get rid of pointless access_ok()
  compat statfs: switch to copy_to_user()
  fs/locks: don't mess with the address limit in compat_fcntl64
  nfsd_readlink(): switch to vfs_get_link()
  drbd: ->sendpage() never needed set_fs()
  fs/locks: pass kernel struct flock to fcntl_getlk/setlk
  fs: locks: Fix some troubles at kernel-doc comments
2017-07-05 13:13:32 -07:00
Al Viro ffb07550c7 copy_msghdr_from_user(): get rid of field-by-field copyin
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-04 13:14:33 -04:00
Jiri Slaby 393cc3f511 fs/fcntl: f_setown, allow returning error
Allow f_setown to return an error value. We will fail in the next patch
with EINVAL for bad input to f_setown, so tile the path for the later
patch.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-06-14 08:46:36 -04:00
Rosen, Rami 241c4667fc net: socket: fix a typo in sockfd_lookup().
This patch fixes a typo in sockfd_lookup() in net/socket.c.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-22 12:14:04 -04:00
Miroslav Lichvar b50a5c70ff net: allow simultaneous SW and HW transmit timestamping
Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
be looped to the socket's error queue with a software timestamp even
when a hardware transmit timestamp is expected to be provided by the
driver.

Applications using this option will receive two separate messages from
the error queue, one with a software timestamp and the other with a
hardware timestamp. As the hardware timestamp is saved to the shared skb
info, which may happen before the first message with software timestamp
is received by the application, the hardware timestamp is copied to the
SCM_TIMESTAMPING control message only when the skb has no software
timestamp or it is an incoming packet.

While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
there are no other users.

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21 13:37:32 -04:00
Miroslav Lichvar aad9c8c470 net: add new control message for incoming HW-timestamped packets
Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
for incoming packets with hardware timestamps. It contains the index of
the real interface which received the packet and the length of the
packet at layer 2.

The index is useful with bonding, bridges and other interfaces, where
IP_PKTINFO doesn't allow applications to determine which PHC made the
timestamp. With the L2 length (and link speed) it is possible to
transpose preamble timestamps to trailer timestamps, which are used in
the NTP protocol.

While this information could be provided by two new socket options
independently from timestamping, it doesn't look like they would be very
useful. With this option any performance impact is limited to hardware
timestamping.

Use dev_get_by_napi_id() to get the device and its index. On kernels
with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero
index will be returned in the control message.

CC: Richard Cochran <richardcochran@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21 13:37:32 -04:00
R. Parameswaran 57240d0078 l2tp: device MTU setup, tunnel socket needs a lock
The MTU overhead calculation in L2TP device set-up
merged via commit b784e7ebfc
needs to be adjusted to lock the tunnel socket while
referencing the sub-data structures to derive the
socket's IP overhead.

Reported-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:01:48 -04:00
R. Parameswaran 113c307593 New kernel function to get IP overhead on a socket.
A new function, kernel_sock_ip_overhead(), is provided
to calculate the cumulative overhead imposed by the IP
Header and IP options, if any, on a socket's payload.
The new function returns an overhead of zero for sockets
that do not belong to the IPv4 or IPv6 address families.
This is used in the L2TP code path to compute the
total outer IP overhead on the L2TP tunnel socket when
calculating the default MTU for Ethernet pseudowires.

Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06 13:43:31 -07:00
Soheil Hassas Yeganeh 4ef1b28694 tcp: mark skbs with SCM_TIMESTAMPING_OPT_STATS
SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled
while packets are collected on the error queue.
So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags
is not enough to safely assume that the skb contains
OPT_STATS data.

Add a bit in sock_exterr_skb to indicate whether the
skb contains opt_stats data.

Fixes: 1c885808e4 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
Reported-by: JongHwan Kim <zzoru007@gmail.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 18:44:17 -07:00
Soheil Hassas Yeganeh 8605330aac tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs
__sock_recv_timestamp can be called for both normal skbs (for
receive timestamps) and for skbs on the error queue (for transmit
timestamps).

Commit 1c885808e4
(tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING)
assumes any skb passed to __sock_recv_timestamp are from
the error queue, containing OPT_STATS in the content of the skb.
This results in accessing invalid memory or generating junk
data.

To fix this, set skb->pkt_type to PACKET_OUTGOING for packets
on the error queue. This is safe because on the receive path
on local sockets skb->pkt_type is never set to PACKET_OUTGOING.
With that, copy OPT_STATS from a packet, only if its pkt_type
is PACKET_OUTGOING.

Fixes: 1c885808e4 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
Reported-by: JongHwan Kim <zzoru007@gmail.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 18:44:17 -07:00
David Howells cdfbabfb2f net: Work around lockdep limitation in sockets that use sockets
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_rcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:23:27 -08:00
Alexander Potapenko 9f138fa609 net: initialize msg.msg_flags in recvfrom
KMSAN reports a use of uninitialized memory in put_cmsg() because
msg.msg_flags in recvfrom haven't been initialized properly.
The flag values don't affect the result on this path, but it's still a
good idea to initialize them explicitly.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 17:21:21 -08:00
Maxime Jayat e623a9e9de net: socket: fix recvmmsg not returning error from sock_error
Commit 34b88a68f2 ("net: Fix use after free in the recvmmsg exit path"),
changed the exit path of recvmmsg to always return the datagrams
variable and modified the error paths to set the variable to the error
code returned by recvmsg if necessary.

However in the case sock_error returned an error, the error code was
then ignored, and recvmmsg returned 0.

Change the error path of recvmmsg to correctly return the error code
of sock_error.

The bug was triggered by using recvmmsg on a CAN interface which was
not up. Linux 4.6 and later return 0 in this case while earlier
releases returned -ENETDOWN.

Fixes: 34b88a68f2 ("net: Fix use after free in the recvmmsg exit path")
Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21 13:35:25 -05:00
David S. Miller 02ac5d1487 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two AF_* families adding entries to the lockdep tables
at the same time.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 14:43:39 -05:00
Tobias Klauser dc647ec88e net: socket: Make unnecessarily global sockfs_setattr() static
Make sockfs_setattr() static as it is not used outside of net/socket.c

This fixes the following GCC warning:
net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes]

Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 11:29:50 -05:00
yuan linyu 1e9116327e net: change init_inodecache() return void
sock_init() call it but not check it's return value,
so change it to void return and add an internal BUG_ON() check.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 12:05:29 -05:00
David S. Miller 76eb75be79 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-05 11:03:07 -05:00
David S. Miller ac4340fc3c net: Assert at build time the assumptions we make about the CMSG header.
It must always be the case that CMSG_ALIGN(sizeof(hdr)) == sizeof(hdr).

Otherwise there are missing adjustments in the various calculations
that parse and build these things.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:24:19 -05:00
Eric Biggers e1a3a60a2e net: socket: don't set sk_uid to garbage value in ->setattr()
->setattr() was recently implemented for socket files to sync the socket
inode's uid to the new 'sk_uid' member of struct sock.  It does this by
copying over the ia_uid member of struct iattr.  However, ia_uid is
actually only valid when ATTR_UID is set in ia_valid, indicating that
the uid is being changed, e.g. by chown.  Other metadata operations such
as chmod or utimes leave ia_uid uninitialized.  Therefore, sk_uid could
be set to a "garbage" value from the stack.

Fix this by only copying the uid over when ATTR_UID is set.

Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01 11:53:34 -05:00
Thomas Gleixner 2456e85535 ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Amit Kushwaha fa1bd57a63 net: socket: removed an unnecessary newline
This patch removes a newline which was added
in socket.c file in net-next

Signed-off-by: Amit Kushwaha <kushwaha.a@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10 17:27:07 -05:00
Amit Kushwaha 846cc1231a net: socket: preferred __aligned(size) for control buffer
This patch cleanup checkpatch.pl warning
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))

Signed-off-by: Amit Kushwaha <kushwaha.a@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 18:20:46 -05:00
Francis Yan 1c885808e4 tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation.  For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.

To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 10:04:25 -05:00
David S. Miller f9aa9dc7d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All conflicts were simple overlapping changes except perhaps
for the Thunder driver.

That driver has a change_mtu method explicitly for sending
a message to the hardware.  If that fails it returns an
error.

Normally a driver doesn't need an ndo_change_mtu method becuase those
are usually just range changes, which are now handled generically.
But since this extra operation is needed in the Thunder driver, it has
to stay.

However, if the message send fails we have to restore the original
MTU before the change because the entire call chain expects that if
an error is thrown by ndo_change_mtu then the MTU did not change.
Therefore code is added to nicvf_change_mtu to remember the original
MTU, and to restore it upon nicvf_update_hw_max_frs() failue.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22 13:27:16 -05:00
Andreas Gruenbacher 4a59015372 xattr: Fix setting security xattrs on sockfs
The IOP_XATTR flag is set on sockfs because sockfs supports getting the
"system.sockprotoname" xattr.  Since commit 6c6ef9f2, this flag is checked for
setxattr support as well.  This is wrong on sockfs because security xattr
support there is supposed to be provided by security_inode_setsecurity.  The
smack security module relies on socket labels (xattrs).

Fix this by adding a security xattr handler on sockfs that returns
-EAGAIN, and by checking for -EAGAIN in setxattr.

We cannot simply check for -EOPNOTSUPP in setxattr because there are
filesystems that neither have direct security xattr support nor support
via security_inode_setsecurity.  A more proper fix might be to move the
call to security_inode_setsecurity into sockfs, but it's not clear to me
if that is safe: we would end up calling security_inode_post_setxattr after
that as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-17 00:00:23 -05:00
David S. Miller bb598c1b8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of bug fixes in 'net' overlapping other changes in
'net-next-.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 10:54:36 -05:00
Soheil Hassas Yeganeh 3023898b7d sock: fix sendmmsg for partial sendmsg
Do not send the next message in sendmmsg for partial sendmsg
invocations.

sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting the data for TCP,
SCTP, and UNIX streams.

For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
of "aefgh" if the first sendmsg invocation sends only the first
byte while the second sendmsg goes through.

Datagram sockets either send the entire datagram or fail, so
this patch affects only sockets of type SOCK_STREAM and
SOCK_SEQPACKET.

Fixes: 228e548e60 ("net: Add sendmmsg socket system call")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:18:12 -05:00
Lorenzo Colitti 86741ec254 net: core: Add a UID field to struct sock.
Protocol sockets (struct sock) don't have UIDs, but most of the
time, they map 1:1 to userspace sockets (struct socket) which do.

Various operations such as the iptables xt_owner match need
access to the "UID of a socket", and do so by following the
backpointer to the struct socket. This involves taking
sk_callback_lock and doesn't work when there is no socket
because userspace has already called close().

Simplify this by adding a sk_uid field to struct sock whose value
matches the UID of the corresponding struct socket. The semantics
are as follows:

1. Whenever sk_socket is non-null: sk_uid is the same as the UID
   in sk_socket, i.e., matches the return value of sock_i_uid.
   Specifically, the UID is set when userspace calls socket(),
   fchown(), or accept().
2. When sk_socket is NULL, sk_uid is defined as follows:
   - For a socket that no longer has a sk_socket because
     userspace has called close(): the previous UID.
   - For a cloned socket (e.g., an incoming connection that is
     established but on which userspace has not yet called
     accept): the UID of the socket it was cloned from.
   - For a socket that has never had an sk_socket: UID 0 inside
     the user namespace corresponding to the network namespace
     the socket belongs to.

Kernel sockets created by sock_create_kern are a special case
of #1 and sk_uid is the user that created them. For kernel
sockets created at network namespace creation time, such as the
per-processor ICMP and TCP sockets, this is the user that created
the network namespace.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-04 14:45:22 -04:00
Andrey Vagin c62cce2cae net: add an ioctl to get a socket network namespace
Each socket operates in a network namespace where it has been created,
so if we want to dump and restore a socket, we have to know its network
namespace.

We have a socket_diag to get information about sockets, it doesn't
report sockets which are not bound or connected.

This patch introduces a new socket ioctl, which is called SIOCGSKNS
and used to get a file descriptor for a socket network namespace.

A task must have CAP_NET_ADMIN in a target network namespace to
use this ioctl.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31 10:56:36 -04:00
Andreas Gruenbacher fd50ecaddf vfs: Remove {get,set,remove}xattr inode operations
These inode operations are no longer used; remove them.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-07 21:48:36 -04:00
Andreas Gruenbacher bba0bd31b1 sockfs: Get rid of getxattr iop
If we allow pseudo-filesystems created with mount_pseudo to have xattr
handlers, we can replace sockfs_getxattr with a sockfs_xattr_get handler
to use the xattr handler name parsing.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-06 22:17:38 -04:00
Andreas Gruenbacher 971df15bd5 sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
The standard return value for unsupported attribute names is
-EOPNOTSUPP, as opposed to undefined but supported attributes
(-ENODATA).

Also, fail for attribute names like "system.sockprotonameXXX" and
simplify the code a bit.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-06 22:17:38 -04:00
Deepa Dinamani 766b9f928b fs: poll/select/recvmmsg: use timespec64 for timeout events
struct timespec is not y2038 safe.  Even though timespec might be
sufficient to represent timeouts, use struct timespec64 here as the plan
is to get rid of all timespec reference in the kernel.

The patch transitions the common functions: poll_select_set_timeout()
and select_estimate_accuracy() to use timespec64.  And, all the syscalls
that use these functions are transitioned in the same patch.

The restart block parameters for poll uses monotonic time.  Use
timespec64 here as well to assign timeout value.  This parameter in the
restart block need not change because this only holds the monotonic
timestamp at which timeout should occur.  And, unsigned long data type
should be big enough for this timestamp.

The system call interfaces will be handled in a separate series.

Compat interfaces need not change as timespec64 is an alias to struct
timespec on a 64 bit system.

Link: http://lkml.kernel.org/r/1461947989-21926-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Linus Torvalds a7fd20d1c4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support SPI based w5100 devices, from Akinobu Mita.

   2) Partial Segmentation Offload, from Alexander Duyck.

   3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.

   4) Allow cls_flower stats offload, from Amir Vadai.

   5) Implement bpf blinding, from Daniel Borkmann.

   6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
      actually using FASYNC these atomics are superfluous.  From Eric
      Dumazet.

   7) Run TCP more preemptibly, also from Eric Dumazet.

   8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
      driver, from Gal Pressman.

   9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.

  10) Improve BPF usage documentation, from Jesper Dangaard Brouer.

  11) Support tunneling offloads in qed, from Manish Chopra.

  12) aRFS offloading in mlx5e, from Maor Gottlieb.

  13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
      Leitner.

  14) Add MSG_EOR support to TCP, this allows controlling packet
      coalescing on application record boundaries for more accurate
      socket timestamp sampling.  From Martin KaFai Lau.

  15) Fix alignment of 64-bit netlink attributes across the board, from
      Nicolas Dichtel.

  16) Per-vlan stats in bridging, from Nikolay Aleksandrov.

  17) Several conversions of drivers to ethtool ksettings, from Philippe
      Reynes.

  18) Checksum neutral ILA in ipv6, from Tom Herbert.

  19) Factorize all of the various marvell dsa drivers into one, from
      Vivien Didelot

  20) Add VF support to qed driver, from Yuval Mintz"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
  Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
  Revert "phy dp83867: Make rgmii parameters optional"
  r8169: default to 64-bit DMA on recent PCIe chips
  phy dp83867: Make rgmii parameters optional
  phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
  bpf: arm64: remove callee-save registers use for tmp registers
  asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
  switchdev: pass pointer to fib_info instead of copy
  net_sched: close another race condition in tcf_mirred_release()
  tipc: fix nametable publication field in nl compat
  drivers: net: Don't print unpopulated net_device name
  qed: add support for dcbx.
  ravb: Add missing free_irq() calls to ravb_close()
  qed: Remove a stray tab
  net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fec-mpc52xx: use phydev from struct net_device
  bpf, doc: fix typo on bpf_asm descriptions
  stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
  net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fs-enet: use phydev from struct net_device
  ...
2016-05-17 16:26:30 -07:00
Soheil Hassas Yeganeh 0a2cf20c3f tcp: remove SKBTX_ACK_TSTAMP since it is redundant
The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when
the timestamp of the TCP acknowledgement should be reported on
error queue. Since accessing skb_shinfo is likely to incur a
cache-line miss at the time of receiving the ack, the
txstamp_ack bit was added in tcp_skb_cb, which is set iff
the SKBTX_ACK_TSTAMP flag is set for an skb. This makes
SKBTX_ACK_TSTAMP flag redundant.

Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit
everywhere.

Note that this frees one bit in shinfo->tx_flags.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:06:10 -04:00
David S. Miller 6c61403dae Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-04-14 00:39:15 -04:00
Al Viro ce23e64013 ->getxattr(): pass dentry and inode as separate arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-11 00:48:00 -04:00
Hannes Frederic Sowa 1e1d04e678 net: introduce lockdep_is_held and update various places to use it
The socket is either locked if we hold the slock spin_lock for
lock_sock_fast and unlock_sock_fast or we own the lock (sk_lock.owned
!= 0). Check for this and at the same time improve that the current
thread/cpu is really holding the lock.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 16:44:14 -04:00
Soheil Hassas Yeganeh c14ac9451c sock: enable timestamping using control messages
Currently, SOL_TIMESTAMPING can only be enabled using setsockopt.
This is very costly when users want to sample writes to gather
tx timestamps.

Add support for enabling SO_TIMESTAMPING via control messages by
using tsflags added in `struct sockcm_cookie` (added in the previous
patches in this series) to set the tx_flags of the last skb created in
a sendmsg. With this patch, the timestamp recording bits in tx_flags
of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg.

Please note that this is only effective for overriding the recording
timestamps flags. Users should enable timestamp reporting (e.g.,
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
socket options and then should ask for SOF_TIMESTAMPING_TX_*
using control messages per sendmsg to sample timestamps for each
write.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04 15:50:30 -04:00