1
0
Fork 0
remarkable-linux/net/ipv4
David Howells cdfbabfb2f net: Work around lockdep limitation in sockets that use sockets
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_rcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:23:27 -08:00
..
netfilter lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
Kconfig Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
Makefile esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
af_inet.c net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
ah4.c IPsec: do not ignore crypto err in ah4 input 2017-01-16 12:57:48 +01:00
arp.c NET: Fix /proc/net/arp for AX.25 2017-02-13 22:15:03 -05:00
cipso_ipv4.c netlabel: out of bound access in cipso_v4_validate() 2017-02-04 19:44:22 -05:00
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
esp4.c esp: Introduce a helper to setup the trailer 2017-01-17 10:23:08 +01:00
esp4_offload.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
fib_frontend.c net: route: add missing nla_policy entry for RTA_MARK attribute 2017-03-01 10:25:56 -08:00
fib_lookup.h ipv4: consider TOS in fib_select_default 2015-07-24 22:46:11 -07:00
fib_rules.c switchdev: remove FIB offload infrastructure 2016-09-28 04:48:00 -04:00
fib_semantics.c ipv4: fib: Notify about nexthop status changes 2017-02-08 15:25:18 -05:00
fib_trie.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
fou.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-30 12:42:58 -04:00
gre_demux.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
gre_offload.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
icmp.c net: for rate-limited ICMP replies save one atomic operation 2017-01-09 15:49:12 -05:00
igmp.c igmp, mld: Fix memory leak in igmpv3/mld_del_delrec() 2017-02-09 16:43:45 -05:00
inet_connection_sock.c net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
inet_diag.c tcp: remove early retransmit 2017-01-13 22:37:16 -05:00
inet_fragment.c net: disable fragment reassembly if high_thresh is zero 2016-06-05 22:56:42 -04:00
inet_hashtables.c inet: kill smallest_size and smallest_port 2017-01-18 13:04:29 -05:00
inet_timewait_sock.c ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob 2016-12-29 11:38:31 -05:00
inetpeer.c net: Add helper function to compare inetpeer addresses 2015-08-28 13:32:36 -07:00
ip_forward.c ipv4: allow local fragmentation in ip_finish_output_gso() 2016-11-03 16:10:26 -04:00
ip_fragment.c net: rename IP_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
ip_gre.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ip_input.c net: VRF: Pass original iif to ip_route_input() 2016-09-16 04:24:07 -04:00
ip_options.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ip_output.c net: rename dst_neigh_output back to neigh_output 2017-02-11 21:25:18 -05:00
ip_sockglue.c ip: fix IP_CHECKSUM handling 2017-02-21 12:23:53 -05:00
ip_tunnel.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
ip_tunnel_core.c lwtunnel: remove device arg to lwtunnel_build_state 2017-01-30 15:14:22 -05:00
ip_vti.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
ipcomp.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
ipconfig.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ipip.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ipmr.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
netfilter.c netfilter: use skb_to_full_sk in ip_route_me_harder 2017-02-28 12:49:36 +01:00
ping.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
proc.c net: add LINUX_MIB_PFMEMALLOCDROP counter 2017-02-02 23:34:19 -05:00
protocol.c net: Export inet_offloads and inet6_offloads 2014-09-19 17:15:31 -04:00
raw.c net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP 2017-02-07 13:07:47 -05:00
raw_diag.c net: ip, raw_diag -- Use jump for exiting from nested loop 2016-11-03 15:25:26 -04:00
route.c ipv4: mask tos for input route 2017-02-26 11:03:38 -05:00
syncookies.c syncookies: use SipHash in place of SHA1 2017-01-09 13:58:57 -05:00
sysctl_net_ipv4.c net: Avoid receiving packets with an l3mdev on unbound UDP sockets 2017-01-30 15:00:58 -05:00
tcp.c tcp: fix potential double free issue for fastopen_req 2017-03-02 14:05:41 -08:00
tcp_bbr.c tcp_bbr: add a state transition diagram and accompanying comment 2016-10-29 17:12:43 -04:00
tcp_bic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_cdg.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
tcp_cong.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-11-22 13:27:16 -05:00
tcp_cubic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_dctcp.c Revert "dctcp: update cwnd on congestion event" 2016-12-06 11:34:24 -05:00
tcp_diag.c net: diag: Fix refcnt leak in error path destroying socket 2016-08-23 23:11:36 -07:00
tcp_fastopen.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-28 10:33:06 -05:00
tcp_highspeed.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_htcp.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_hybla.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_illinois.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_input.c tcp/dccp: block BH for SYN processing 2017-03-01 15:03:31 -08:00
tcp_ipv4.c tcp: fix various issues for sockets morphing to listen state 2017-03-07 13:58:33 -08:00
tcp_lp.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_metrics.c tcp: replace dst_confirm with sk_dst_confirm 2017-02-07 13:07:46 -05:00
tcp_minisocks.c tcp: account for ts offset only if tsecr not zero 2017-02-22 16:35:58 -05:00
tcp_nv.c tcp: add NV congestion control 2016-06-10 23:07:49 -07:00
tcp_offload.c gso: Support partial splitting at the frag_list pointer 2016-09-19 20:59:34 -04:00
tcp_output.c tcp: accommodate sequence number to a peer's shrunk receive window caused by precision loss in window scaling 2017-02-17 15:30:33 -05:00
tcp_probe.c tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" 2017-02-21 13:26:03 -05:00
tcp_rate.c tcp: export data delivery rate 2016-09-21 00:23:00 -04:00
tcp_recovery.c tcp: enable RACK loss detection to trigger recovery 2017-01-13 22:37:16 -05:00
tcp_scalable.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_timer.c tcp: fix various issues for sockets morphing to listen state 2017-03-07 13:58:33 -08:00
tcp_vegas.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_vegas.h tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_veno.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_westwood.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_yeah.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tunnel4.c tunnels: correct conditional build of MPLS and IPv6 2016-07-11 13:27:06 -07:00
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-07 16:29:30 -05:00
udp_diag.c net: inet: diag: expose the socket mark to privileged processes. 2016-09-08 16:13:09 -07:00
udp_impl.h udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
udp_offload.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
udp_tunnel.c net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
udplite.c udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
xfrm4_input.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm4_mode_tunnel.c ipv4: hash net ptr into fragmentation bucket selection 2015-03-25 14:07:04 -04:00
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c xfrm: policy: make policy backend const 2017-02-09 10:22:19 +01:00
xfrm4_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm4_state.c xfrm: remove unused function 2017-01-10 10:57:12 +01:00
xfrm4_tunnel.c