remarkable-linux/net
Pablo Neira Ayuso e53376bef2 netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.

Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().

A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.

CPU1                                    CPU2
____nf_conntrack_find()
                                        nf_ct_put()
                                         destroy_conntrack()
                                        ...
                                        init_conntrack
                                         __nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&ct->use) (use = 2)
                                         if (!l4proto->new(ct, skb, dataoff, timeouts))
                                          nf_conntrack_free(ct); (use = 2 !!!)
                                        ...
                                        __nf_conntrack_alloc (set use = 1)
 if (!nf_ct_key_equal(h, tuple, zone))
  nf_ct_put(ct); (use = 0)
   destroy_conntrack()
                                        /* continue to work with CT */

After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():

<4>[67096.759334] ------------[ cut here ]------------
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] Call Trace:
<4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5

I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Vagin <avagin@parallels.com>
Cc: Florian Westphal <fw@strlen.de>
Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-02-05 17:46:06 +01:00
..
9p
802 neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. 2014-01-16 11:31:58 -08:00
8021q 8021q: Use ether_addr_copy 2014-01-21 18:13:04 -08:00
appletalk net: Fix some fallout from the etner_addr_copy() changes. 2014-01-21 18:57:26 -08:00
atm net: Fix some fallout from the etner_addr_copy() changes. 2014-01-21 18:57:26 -08:00
ax25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-01-18 00:55:41 -08:00
bluetooth net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
bridge bridge: Remove unnecessary vlan_put_tag in br_handle_vlan 2014-01-22 21:29:27 -08:00
caif net: Missing change from the ether_addr_copy() fixups. 2014-01-21 22:54:01 -08:00
can net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
ceph
core net: add and use skb_gso_transport_seglen() 2014-01-26 22:38:23 -08:00
dcb dcb: use __dev_get_by_name instead of dev_get_by_name to find interface 2014-01-14 18:50:46 -08:00
dccp ipv4: introduce hardened ip_no_pmtu_disc mode 2014-01-13 11:22:55 -08:00
decnet net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
dns_resolver
dsa dsa: Use ether_addr_copy 2014-01-21 18:13:05 -08:00
ethernet net: eth_type_trans() should use skb_header_pointer() 2014-01-16 15:30:31 -08:00
hsr
ieee802154 net: 6lowpan: fixup for code movement 2014-01-27 16:43:03 -08:00
ipv4 netfilter: nf_nat_h323: fix crash in nf_ct_unlink_expect_report() 2014-02-05 17:46:05 +01:00
ipv6 net: Fix memory leak if TPROXY used with TCP early demux 2014-01-27 16:22:11 -08:00
ipx net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
irda net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
iucv
key
l2tp ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts 2014-01-21 16:59:19 -08:00
lapb
llc net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-17 14:43:17 -05:00
mac802154
mpls
netfilter netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt 2014-02-05 17:46:06 +01:00
netlabel
netlink net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
netrom net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
openvswitch net: replace macros net_random and net_srandom with direct calls to prandom 2014-01-14 15:15:25 -08:00
packet af_packet: Add Queue mapping mode to af_packet fanout operation 2014-01-22 17:35:50 -08:00
phonet net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rds net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
rose net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rxrpc rxrpc: out of bound read in debug code 2014-01-21 17:02:52 -08:00
sched net: add and use skb_gso_transport_seglen() 2014-01-26 22:38:23 -08:00
sctp sctp: remove macros sctp_bh_[un]lock_sock 2014-01-21 18:41:36 -08:00
sunrpc net: replace macros net_random and net_srandom with direct calls to prandom 2014-01-14 15:15:25 -08:00
tipc net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
unix net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
vmw_vsock net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
wimax
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-17 14:43:17 -05:00
x25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
compat.c
Kconfig
Makefile net: move 6lowpan compression code to separate module 2014-01-15 15:36:38 -08:00
nonet.c
socket.c
sysctl_net.c