KASAN reported this bug:
BUG: KASAN: use-after-free in icmp_packet+0x25/0x50 [nf_conntrack_ipv4] at
addr ffff880002db08c8
Read of size 4 by task lt-nf-queue/19041
Call Trace:
<IRQ> [<ffffffff815eeebb>] dump_stack+0x63/0x88
[<ffffffff813386f8>] kasan_report_error+0x528/0x560
[<ffffffff81338cc8>] kasan_report+0x58/0x60
[<ffffffffa07393f5>] ? icmp_packet+0x25/0x50 [nf_conntrack_ipv4]
[<ffffffff81337551>] __asan_load4+0x61/0x80
[<ffffffffa07393f5>] icmp_packet+0x25/0x50 [nf_conntrack_ipv4]
[<ffffffffa06ecaa0>] nf_conntrack_in+0x550/0x980 [nf_conntrack]
[<ffffffffa06ec550>] ? __nf_conntrack_confirm+0xb10/0xb10 [nf_conntrack]
[ ... ]
The main reason is that we missed to unlink the timeout objects in the
unconfirmed ct lists, so we will access the timeout objects that have
already been freed.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We forget to call nf_ct_l4proto_put when replacing the existing
timeout policy. Acctually, there's no need to get ct l4proto
before doing replace, so we can move it to a later position.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
cttimeout and acct objects are deleted from the list while traversing
it, so use list_for_each_entry is unsafe here.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In general, when we want to delete a netns, cttimeout_net_exit will
be called before ipt_unregister_table, i.e. before ctnl_timeout_put.
But after call kfree_rcu in cttimeout_net_exit, we will still decrease
the timeout object's refcnt in ctnl_timeout_put, this is incorrect,
and will cause a use after free error.
It is easy to reproduce this problem:
# while : ; do
ip netns add xxx
ip netns exec xxx nfct add timeout testx inet icmp timeout 200
ip netns exec xxx iptables -t raw -p icmp -I OUTPUT -j CT --timeout testx
ip netns del xxx
done
=======================================================================
BUG kmalloc-96 (Tainted: G B E ): Poison overwritten
-----------------------------------------------------------------------
INFO: 0xffff88002b5161e8-0xffff88002b5161e8. First byte 0x6a instead of
0x6b
INFO: Allocated in cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
age=104 cpu=0 pid=3330
___slab_alloc+0x4da/0x540
__slab_alloc+0x20/0x40
__kmalloc+0x1c8/0x240
cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
nfnetlink_rcv_msg+0x21a/0x230 [nfnetlink]
[ ... ]
So only when the refcnt decreased to 0, we call kfree_rcu to free the
timeout object. And like nfnetlink_acct do, use atomic_cmpxchg to
avoid race between ctnl_timeout_try_del and ctnl_timeout_put.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Imagine such situation, nf_conntrack_htable_size now is 4096, we are doing
ctnl_untimeout, and iterate on 3000# bucket.
Meanwhile, another user try to reduce hash size to 2048, then all nf_conn
are removed to the new hashtable. When this hash resize operation finished,
we still try to itreate ct begin from 3000# bucket, find nothing to do and
just return.
We may miss unlinking some timeout objects. And later we will end up with
invalid references to timeout object that are already gone.
So when we find that hash resize happened, try to unlink timeout objects
from the 0# bucket again.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We already include netns address in the hash and compare the netns pointers
during lookup, so even if namespaces have overlapping addresses entries
will be spread across the table.
Assuming 64k bucket size, this change saves 0.5 mbyte per namespace on a
64bit system.
NAT bysrc and expectation hash is still per namespace, those will
changed too soon.
Future patch will also make conntrack object slab cache global again.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The spin_unlock call should have been left as-is, revert.
Fixes: b16c29191d ("netfilter: nf_conntrack: use safer way to lock all buckets")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When we need to lock all buckets in the connection hashtable we'd attempt to
lock 1024 spinlocks, which is way more preemption levels than supported by
the kernel. Furthermore, this behavior was hidden by checking if lockdep is
enabled, and if it was - use only 8 buckets(!).
Fix this by using a global lock and synchronize all buckets on it when we
need to lock them all. This is pretty heavyweight, but is only done when we
need to resize the hashtable, and that doesn't happen often enough (or at all).
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The object and module refcounts are updated for each conntrack template,
however, if we delete the iptables rules and we flush the timeout
database, we may end up with invalid references to timeout object that
are just gone.
Resolve this problem by setting the timeout reference to NULL when the
custom timeout entry is removed from our base. This patch requires some
RCU trickery to ensure safe pointer handling.
This handling is similar to what we already do with conntrack helpers,
the idea is to avoid bumping the timeout object reference counter from
the packet path to avoid the cost of atomic ops.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Default timeouts are currently set via proc/sysctl interface, the
typical pattern is a file name like:
/proc/sys/net/netfilter/nf_conntrack_PROTOCOL_timeout_STATE
This results in one entry per default protocol state timeout.
This patch simplifies this by allowing to set default protocol
timeouts via cttimeout netlink interface.
This should allow us to get rid of the existing proc/sysctl code
in the midterm.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These are the only calls under net/ that do not check nla_parse_nested()
for its error code, but simply continue execution. If parsing of netlink
attributes fails, we should return with an error instead of continuing.
In nearly all of these calls we have a policy attached, that is being
type verified during nla_parse_nested(), which we would miss checking
for otherwise.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix broken incomplete object dumping if the list of objects does not
fit into one single netlink message.
Reported-by: Gabriel Lazar <Gabriel.Lazar@com.utcluj.ro>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Chen Gang reports:
the length of nla_data(cda[CTA_TIMEOUT_NAME]) is not limited in server side.
And indeed, its used to strcpy to a fixed-sized buffer.
Fortunately, nfnetlink users need CAP_NET_ADMIN.
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.
I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.
I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds namespace support for cttimeout.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces nf_conntrack_l4proto_find_get() and
nf_conntrack_l4proto_put() to fix module dependencies between
timeout objects and l4-protocol conntrack modules.
Thus, we make sure that the module cannot be removed if it is
used by any of the cttimeout objects.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch allows you to attach the timeout policy via the
CT target, it adds a new revision of the target to ensure
backward compatibility. Moreover, it also contains the glue
code to stick the timeout object defined via nfnetlink_cttimeout
to the given flow.
Example usage (it requires installing the nfct tool and
libnetfilter_cttimeout):
1) create the timeout policy:
nfct timeout add tcp-policy0 inet tcp \
established 1000 close 10 time_wait 10 last_ack 10
2) attach the timeout policy to the packet:
iptables -I PREROUTING -t raw -p tcp -j CT --timeout tcp-policy0
You have to install the following user-space software:
a) libnetfilter_cttimeout:
git://git.netfilter.org/libnetfilter_cttimeout
b) nfct:
git://git.netfilter.org/nfct
You also have to get iptables with -j CT --timeout support.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds the timeout extension, which allows you to attach
specific timeout policies to flows.
This extension is only used by the template conntrack.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds the infrastructure to add fine timeout tuning
over nfnetlink. Now you can use the NFNL_SUBSYS_CTNETLINK_TIMEOUT
subsystem to create/delete/dump timeout objects that contain some
specific timeout policy for one flow.
The follow up patches will allow you attach timeout policy object
to conntrack via the CT target and the conntrack extension
infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>