1
0
Fork 0
remarkable-linux/net/ipv4
Paolo Abeni 4ec264d812 netfilter: on sockopt() acquire sock lock only in the required scope
commit 3f34cfae12 upstream.

Syzbot reported several deadlocks in the netfilter area caused by
rtnl lock and socket lock being acquired with a different order on
different code paths, leading to backtraces like the following one:

======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc9+ #212 Not tainted
------------------------------------------------------
syzkaller041579/3682 is trying to acquire lock:
  (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] lock_sock
include/net/sock.h:1463 [inline]
  (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>]
do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167

but task is already holding lock:
  (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.}:
        __mutex_lock_common kernel/locking/mutex.c:756 [inline]
        __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
        rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
        register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607
        tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106
        xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
        check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
        find_check_entry.isra.7+0x935/0xcf0
net/ipv6/netfilter/ip6_tables.c:580
        translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
        do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline]
        do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691
        nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
        nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
        ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
        udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
        sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
        SYSC_setsockopt net/socket.c:1849 [inline]
        SyS_setsockopt+0x189/0x360 net/socket.c:1828
        entry_SYSCALL_64_fastpath+0x29/0xa0

-> #0 (sk_lock-AF_INET6){+.+.}:
        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
        lock_sock_nested+0xc2/0x110 net/core/sock.c:2780
        lock_sock include/net/sock.h:1463 [inline]
        do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
        ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
        udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
        sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
        SYSC_setsockopt net/socket.c:1849 [inline]
        SyS_setsockopt+0x189/0x360 net/socket.c:1828
        entry_SYSCALL_64_fastpath+0x29/0xa0

other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(rtnl_mutex);
                                lock(sk_lock-AF_INET6);
                                lock(rtnl_mutex);
   lock(sk_lock-AF_INET6);

  *** DEADLOCK ***

1 lock held by syzkaller041579/3682:
  #0:  (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

The problem, as Florian noted, is that nf_setsockopt() is always
called with the socket held, even if the lock itself is required only
for very tight scopes and only for some operation.

This patch addresses the issues moving the lock_sock() call only
where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt()
does not need anymore to acquire both locks.

Fixes: 22265a5c3c ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25 11:05:43 +01:00
..
netfilter netfilter: on sockopt() acquire sock lock only in the required scope 2018-02-25 11:05:43 +01:00
Kconfig tcp: Set DEFAULT_TCP_CONG to bbr if DEFAULT_BBR is set 2016-11-28 12:15:00 -05:00
Makefile tcp_bbr: add BBR congestion control 2016-09-21 00:23:01 -04:00
af_inet.c igmp: Fix regression caused by igmp sysctl namespace code. 2017-08-12 19:31:22 -07:00
ah4.c IPsec: do not ignore crypto err in ah4 input 2017-11-15 15:53:15 +01:00
arp.c ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY 2018-01-31 12:55:55 +01:00
cipso_ipv4.c tcp/dccp: fix ireq->opt races 2017-11-18 11:22:22 +01:00
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c ipv4: igmp: guard against silly MTU values 2018-01-02 20:35:10 +01:00
esp4.c esp4: Fix integrity verification when ESN are used 2016-11-30 11:09:39 +01:00
fib_frontend.c ipv4: Fix use-after-free when flushing FIB tables 2018-01-02 20:35:12 +01:00
fib_lookup.h ipv4: consider TOS in fib_select_default 2015-07-24 22:46:11 -07:00
fib_rules.c switchdev: remove FIB offload infrastructure 2016-09-28 04:48:00 -04:00
fib_semantics.c ipv4: fix NULL dereference in free_fib_info_rcu() 2017-08-30 10:21:39 +02:00
fib_trie.c net: Improve handling of failures on link and route dumps 2017-06-07 12:07:44 +02:00
fou.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
gre_demux.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
gre_offload.c gso: fix payload length when gso_size is zero 2017-11-18 11:22:20 +01:00
icmp.c icmp: don't fail on fragment reassembly time exceeded 2017-12-20 10:07:33 +01:00
igmp.c net: igmp: add a missing rcu locking section 2018-02-13 12:35:56 +01:00
inet_connection_sock.c tcp/dccp: fix other lockdep splats accessing ireq_opt 2017-11-18 11:22:22 +01:00
inet_diag.c net: inet: diag: expose the socket mark to privileged processes. 2016-09-08 16:13:09 -07:00
inet_fragment.c Revert "net: use lib/percpu_counter API for fragmentation mem accounting" 2017-09-20 08:19:55 +02:00
inet_hashtables.c soreuseport: fix initialization race 2017-11-18 11:22:22 +01:00
inet_timewait_sock.c timers, net/ipv4/inet: Initialize connection request timers as pinned 2016-07-07 10:35:06 +02:00
inetpeer.c net: Add helper function to compare inetpeer addresses 2015-08-28 13:32:36 -07:00
ip_forward.c ipv4: allow local fragmentation in ip_finish_output_gso() 2016-11-03 16:10:26 -04:00
ip_fragment.c inet: frag: release spinlock before calling icmp_send() 2017-12-25 14:23:39 +01:00
ip_gre.c net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() 2016-09-10 20:53:55 -07:00
ip_input.c net: VRF: Pass original iif to ip_route_input() 2016-09-16 04:24:07 -04:00
ip_options.c net: ipv4: Convert IP network timestamps to be y2038 safe 2016-03-01 17:18:44 -05:00
ip_output.c udp: consistently apply ufo or fragmentation 2017-08-12 19:31:22 -07:00
ip_sockglue.c netfilter: on sockopt() acquire sock lock only in the required scope 2018-02-25 11:05:43 +01:00
ip_tunnel.c ipv4: igmp: guard against silly MTU values 2018-01-02 20:35:10 +01:00
ip_tunnel_core.c net: Specify the owning module for lwtunnel ops 2017-02-04 09:47:11 +01:00
ip_vti.c vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit 2017-10-12 11:51:22 +02:00
ipcomp.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
ipconfig.c net: ipconfig: fix ic_close_devs() use-after-free 2017-12-25 14:23:42 +01:00
ipip.c ipip: only increase err_count for some certain type icmp in ipip_err 2017-11-18 11:22:23 +01:00
ipmr.c ipv4: allow local fragmentation in ip_finish_output_gso() 2016-11-03 16:10:26 -04:00
netfilter.c netfilter: use skb_to_full_sk in ip_route_me_harder 2017-07-05 14:40:28 +02:00
ping.c ping: implement proper locking 2017-05-03 08:36:34 -07:00
proc.c net: Suppress the "Comparison to NULL could be written" warnings 2016-09-30 01:50:45 -04:00
protocol.c net: Export inet_offloads and inet6_offloads 2014-09-19 17:15:31 -04:00
raw.c net: ipv4: fix for a race condition in raw_sendmsg 2018-01-02 20:35:12 +01:00
route.c route: update fnhe_expires for redirect when the fnhe exists 2017-12-14 09:28:22 +01:00
syncookies.c tcp/dccp: fix ireq->opt races 2017-11-18 11:22:22 +01:00
sysctl_net_ipv4.c ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int 2017-08-11 08:49:35 -07:00
tcp.c tcp: release sk_frag.page in tcp_disconnect 2018-02-13 12:35:56 +01:00
tcp_bbr.c tcp_bbr: fix pacing_gain to always be unity when using lt_bw 2018-02-13 12:35:56 +01:00
tcp_bic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_cdg.c tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict 2016-09-21 00:22:59 -04:00
tcp_cong.c tcp: disallow cwnd undo when switching congestion control 2017-06-14 15:05:52 +02:00
tcp_cubic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_dctcp.c dctcp: avoid bogus doubling of cwnd after loss 2016-10-31 15:16:28 -04:00
tcp_diag.c net: diag: Fix refcnt leak in error path destroying socket 2016-08-23 23:11:36 -07:00
tcp_fastopen.c tcp: initialize max window for a new fastopen socket 2017-02-04 09:47:10 +01:00
tcp_highspeed.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_htcp.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_hybla.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_illinois.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_input.c tcp: invalidate rate samples during SACK reneging 2018-01-02 20:35:13 +01:00
tcp_ipv4.c tcp md5sig: Use skb's saddr when replying to an incoming segment 2018-01-02 20:35:11 +01:00
tcp_lp.c tcp: fix wraparound issue in tcp_lp 2017-05-14 14:00:21 +02:00
tcp_metrics.c tcp: make nla_policy const 2016-09-01 14:09:01 -07:00
tcp_minisocks.c tcp/dccp: block bh before arming time_wait timer 2017-12-16 16:25:46 +01:00
tcp_nv.c tcp_nv: fix division by zero in tcpnv_acked() 2017-11-24 08:33:40 +01:00
tcp_offload.c gso: validate gso_type in GSO handlers 2018-01-31 12:55:55 +01:00
tcp_output.c tcp: do not mangle skb->cb[] in tcp_make_synack() 2017-11-24 08:33:40 +01:00
tcp_probe.c tcp: tcp_probe: use spin_lock_bh() 2017-06-17 06:41:49 +02:00
tcp_rate.c tcp: invalidate rate samples during SACK reneging 2018-01-02 20:35:13 +01:00
tcp_recovery.c tcp: do not assume TCP code is non preemptible 2016-05-02 17:02:25 -04:00
tcp_scalable.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_timer.c net: tcp: close sock if net namespace is exiting 2018-01-31 12:55:54 +01:00
tcp_vegas.c tcp: fix under-evaluated ssthresh in TCP Vegas 2017-12-25 14:23:45 +01:00
tcp_vegas.h tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_veno.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_westwood.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_yeah.c tcp: cwnd does not increase in TCP YeAH 2016-09-08 17:16:12 -07:00
tunnel4.c tunnels: correct conditional build of MPLS and IPv6 2016-07-11 13:27:06 -07:00
udp.c soreuseport: fix initialization race 2017-11-18 11:22:22 +01:00
udp_diag.c net: inet: diag: expose the socket mark to privileged processes. 2016-09-08 16:13:09 -07:00
udp_impl.h udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
udp_offload.c gso: validate gso_type in GSO handlers 2018-01-31 12:55:55 +01:00
udp_tunnel.c net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
udplite.c udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
xfrm4_input.c netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
xfrm4_mode_beet.c ipv4: ERROR: code indent should use tabs where possible 2013-12-26 13:43:21 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipv4: hash net ptr into fragmentation bucket selection 2015-03-25 14:07:04 -04:00
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-12 15:52:44 -07:00
xfrm4_protocol.c xfrm4: Remove duplicate semicolon 2014-06-30 07:49:47 +02:00
xfrm4_state.c inet: make no_pmtu_disc per namespace and kill ipv4_config 2013-12-18 16:58:20 -05:00
xfrm4_tunnel.c sit: add IPv4 over IPv4 support 2013-05-31 17:19:05 -07:00