Commit graph

265 commits

Author SHA1 Message Date
Pablo Neira Ayuso 7c62234547 netfilter: nfnetlink_queue: fix compilation with NF_CONNTRACK disabled
In "9cb0176 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink"
the compilation with NF_CONNTRACK disabled is broken. This patch fixes this
issue.

I have moved the conntrack part into nfnetlink_queue_ct.c to avoid
peppering the entire nfnetlink_queue.c code with ifdefs.

I also needed to rename nfnetlink_queue.c to nfnetlink_queue_pkt.c
to update the net/netfilter/Makefile to support conditional compilation
of the conntrack integration.

This patch also adds CONFIG_NETFILTER_QUEUE_CT in case you want to explicitly
disable the integration between nf_conntrack and nfnetlink_queue.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-19 04:44:57 +02:00
Pablo Neira Ayuso fd7462de46 netfilter: ctnetlink: fix NULL dereference while trying to change helper
The patch 1afc56794e: "netfilter: nf_ct_helper: implement variable
length helper private data" from Jun 7, 2012, leads to the following
Smatch complaint:

net/netfilter/nf_conntrack_netlink.c:1231 ctnetlink_change_helper()
         error: we previously assumed 'help->helper' could be null (see line 1228)

This NULL dereference can be triggered with the following sequence:

1) attach the helper for first time when the conntrack is created.
2) remove the helper module or detach the helper from the conntrack
   via ctnetlink.
3) attach helper again (the same or different one, no matter) to the
   that existing conntrack again via ctnetlink.

This patch fixes the problem by removing the use case that allows you
to re-assign again a helper for one conntrack entry via ctnetlink since
I cannot find any practical use for it.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-19 00:18:38 +02:00
Pablo Neira Ayuso ae243bee39 netfilter: ctnetlink: add CTA_HELP_INFO attribute
This attribute can be used to modify and to dump the internal
protocol information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:09:15 +02:00
Pablo Neira Ayuso 8c88f87cb2 netfilter: nfnetlink_queue: add NAT TCP sequence adjustment if packet mangled
User-space programs that receive traffic via NFQUEUE may mangle packets.
If NAT is enabled, this usually puzzles sequence tracking, leading to
traffic disruptions.

With this patch, nfnl_queue will make the corresponding NAT TCP sequence
adjustment if:

1) The packet has been mangled,
2) the NFQA_CFG_F_CONNTRACK flag has been set, and
3) NAT is detected.

There are some records on the Internet complaning about this issue:
http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables

By now, we only support TCP since we have no helpers for DCCP or SCTP.
Better to add this if we ever have some helper over those layer 4 protocols.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:09:08 +02:00
Pablo Neira Ayuso 9cb0176654 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink
This patch allows you to include the conntrack information together
with the packet that is sent to user-space via NFQUEUE.

Previously, there was no integration between ctnetlink and
nfnetlink_queue. If you wanted to access conntrack information
from your libnetfilter_queue program, you required to query
ctnetlink from user-space to obtain it. Thus, delaying the packet
processing even more.

Including the conntrack information is optional, you can set it
via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:09:02 +02:00
Pablo Neira Ayuso 1afc56794e netfilter: nf_ct_helper: implement variable length helper private data
This patch uses the new variable length conntrack extensions.

Instead of using union nf_conntrack_help that contain all the
helper private data information, we allocate variable length
area to store the private helper data.

This patch includes the modification of all existing helpers.
It also includes a couple of include header to avoid compilation
warnings.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:08:55 +02:00
Kelvie Wong 9768e1ace4 netfilter: nf_ct_expect: partially implement ctnetlink_change_expect
This refreshes the "timeout" attribute in existing expectations if one is
given.

The use case for this would be for userspace helpers to extend the lifetime
of the expectation when requested, as this is not possible right now
without deleting/recreating the expectation.

I use this specifically for forwarding DCERPC traffic through:

DCERPC has a port mapper daemon that chooses a (seemingly) random port for
future traffic to go to. We expect this traffic (with a reasonable
timeout), but sometimes the port mapper will tell the client to continue
using the same port. This allows us to extend the expectation accordingly.

Signed-off-by: Kelvie Wong <kelvie@ieee.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:40:59 +02:00
David S. Miller cc1eb43134 nf_conntrack_netlink: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:57:48 -04:00
David S. Miller 4da0bd7365 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-18 23:29:41 -04:00
Pablo Neira Ayuso a16a1647fa netfilter: ctnetlink: fix race between delete and timeout expiration
Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.

After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.

Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-17 01:47:08 -07:00
David S. Miller b2d3298e09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-09 14:34:20 -08:00
Hans Schillstrom 3b988ece9b netfilter: ctnetlink: fix lockep splats
net/netfilter/nf_conntrack_proto.c:70 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
3 locks held by conntrack/3235:
nfnl_lock+0x17/0x20
netlink_dump+0x32/0x240
ctnetlink_dump_table+0x3e/0x170 [nf_conntrack_netlink]

stack backtrace:
Pid: 3235, comm: conntrack Tainted: G W  3.2.0+ #511
Call Trace:
[<ffffffff8108ce45>] lockdep_rcu_suspicious+0xe5/0x100
[<ffffffffa00ec6e1>] __nf_ct_l4proto_find+0x81/0xb0 [nf_conntrack]
[<ffffffffa0115675>] ctnetlink_fill_info+0x215/0x5f0 [nf_conntrack_netlink]
[<ffffffffa0115dc1>] ctnetlink_dump_table+0xd1/0x170 [nf_conntrack_netlink]
[<ffffffff815fbdbf>] netlink_dump+0x7f/0x240
[<ffffffff81090f9d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff815fd34f>] netlink_dump_start+0xdf/0x190
[<ffffffffa0111490>] ? ctnetlink_change_nat_seq_adj+0x160/0x160 [nf_conntrack_netlink]
[<ffffffffa0115cf0>] ? ctnetlink_get_conntrack+0x2a0/0x2a0 [nf_conntrack_netlink]
[<ffffffffa0115ad9>] ctnetlink_get_conntrack+0x89/0x2a0 [nf_conntrack_netlink]
[<ffffffff81603a47>] nfnetlink_rcv_msg+0x467/0x5f0
[<ffffffff81603a7c>] ? nfnetlink_rcv_msg+0x49c/0x5f0
[<ffffffff81603922>] ? nfnetlink_rcv_msg+0x342/0x5f0
[<ffffffff81071b21>] ? get_parent_ip+0x11/0x50
[<ffffffff816035e0>] ? nfnetlink_subsys_register+0x60/0x60
[<ffffffff815fed49>] netlink_rcv_skb+0xa9/0xd0
[<ffffffff81603475>] nfnetlink_rcv+0x15/0x20
[<ffffffff815fe70e>] netlink_unicast+0x1ae/0x1f0
[<ffffffff815fea16>] netlink_sendmsg+0x2c6/0x320
[<ffffffff815b2a87>] sock_sendmsg+0x117/0x130
[<ffffffff81125093>] ? might_fault+0x53/0xb0
[<ffffffff811250dc>] ? might_fault+0x9c/0xb0
[<ffffffff81125093>] ? might_fault+0x53/0xb0
[<ffffffff815b5991>] ? move_addr_to_kernel+0x71/0x80
[<ffffffff815b644e>] sys_sendto+0xfe/0x130
[<ffffffff815b5c94>] ? sys_bind+0xb4/0xd0
[<ffffffff817a8a0e>] ? retint_swapgs+0xe/0x13
[<ffffffff817afcd2>] system_call_fastpath+0x16/0x1b

Reported-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
2012-03-07 17:41:10 +01:00
Pablo Neira Ayuso 544d5c7d9f netfilter: ctnetlink: allow to set expectfn for expectations
This patch allows you to set expectfn which is specifically used
by the NAT side of most of the existing conntrack helpers.

I have added a symbol map that uses a string as key to look up for
the function that is attached to the expectation object. This is
the best solution I came out with to solve this issue.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:46 +01:00
Pablo Neira Ayuso 076a0ca026 netfilter: ctnetlink: add NAT support for expectations
This patch adds the missing bits to create expectations that
are created in NAT setups.
2012-03-07 17:40:44 +01:00
Pablo Neira Ayuso b8c5e52c13 netfilter: ctnetlink: allow to set expectation class
This patch allows you to set the expectation class.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:42 +01:00
Pablo Neira Ayuso 660fdb2a0f netfilter: ctnetlink: allow to set helper for new expectations
This patch allow you to set the helper for newly created
expectations based of the CTA_EXPECT_HELP_NAME attribute.
Before this, the helper set was NULL.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:40 +01:00
Pablo Neira Ayuso 8be619d1e4 netfilter: ctnetlink: remove incorrect spin_[un]lock_bh on NAT module autoload
Since 7d367e0, ctnetlink_new_conntrack is called without holding
the nf_conntrack_lock spinlock. Thus, ctnetlink_parse_nat_setup
does not require to release that spinlock anymore in the NAT module
autoload case.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:49 -05:00
David S. Miller ff4783ce78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/sfc/rx.c

Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 21:55:51 -05:00
Pablo Neira Ayuso 0f298a285f netfilter: ctnetlink: support kernel-space dump filtering by ctmark
This patch adds CTA_MARK_MASK which, together with CTA_MARK, allows
you to selectively send conntrack entries to user-space by
returning those that match mark & mask.

With this, we can save cycles in the building and the parsing of
the entries that may be later on filtered out in user-space by using
the ctmark & mask.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 14:12:33 -05:00
Pablo Neira Ayuso 80d326fab5 netlink: add netlink_dump_control structure for netlink_dump_start()
Davem considers that the argument list of this interface is getting
out of control. This patch tries to address this issue following
his proposal:

struct netlink_dump_control c = { .dump = dump, .done = done, ... };

netlink_dump_start(..., &c);

Suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 14:10:06 -05:00
Jozsef Kadlecsik 7d367e0668 netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
Marcell Zambo and Janos Farago noticed and reported that when
new conntrack entries are added via netlink and the conntrack table
gets full, soft lockup happens. This is because the nf_conntrack_lock
is held while nf_conntrack_alloc is called, which is in turn wants
to lock nf_conntrack_lock while evicting entries from the full table.

The patch fixes the soft lockup with limiting the holding of the
nf_conntrack_lock to the minimum, where it's absolutely required.
It required to extend (and thus change) nf_conntrack_hash_insert
so that it makes sure conntrack and ctnetlink do not add the same entry
twice to the conntrack table.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-24 12:24:15 +01:00
Pablo Neira Ayuso 279072882d Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
This reverts commit af14cca162.

This patch contains a race condition between packets and ctnetlink
in the conntrack addition. A new patch to fix this issue follows up.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-24 12:19:57 +01:00
Jozsef Kadlecsik af14cca162 netfilter: ctnetlink: fix soft lockup when netlink adds new entries
Marcell Zambo and Janos Farago noticed and reported that when
new conntrack entries are added via netlink and the conntrack table
gets full, soft lockup happens. This is because the nf_conntrack_lock
is held while nf_conntrack_alloc is called, which is in turn wants
to lock nf_conntrack_lock while evicting entries from the full table.

The patch fixes the soft lockup with limiting the holding of the
nf_conntrack_lock to the minimum, where it's absolutely required.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-21 16:25:31 +01:00
Pablo Neira Ayuso 9bf04646b0 netfilter: revert user-space expectation helper support
This patch partially reverts:
3d058d7 netfilter: rework user-space expectation helper support
that was applied during the 3.2 development cycle.

After this patch, the tree remains just like before patch bc01bef,
that initially added the preliminary infrastructure.

I decided to partially revert this patch because the approach
that I proposed to resolve this problem is broken in NAT setups.
Moreover, a new infrastructure will be submitted for the 3.3.x
development cycle that resolve the existing issues while
providing a neat solution.

Since nobody has been seriously using this infrastructure in
user-space, the removal of this feature should affect any know
FOSS project (to my knowledge).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-01-16 14:01:23 +01:00
Eric Dumazet cf778b00e9 net: reintroduce missing rcu_assign_pointer() calls
commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
y).

We miss needed barriers, even on x86, when y is not NULL.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12 12:26:56 -08:00
David S. Miller 455ffa607f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-01-02 18:56:49 -05:00
Xi Wang c121638277 netfilter: ctnetlink: fix timeout calculation
The sanity check (timeout < 0) never works; the dividend is unsigned
and so is the division, which should have been a signed division.

	long timeout = (ct->timeout.expires - jiffies) / HZ;
	if (timeout < 0)
		timeout = 0;

This patch converts the time values to signed for the division.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-31 16:59:04 +01:00
David S. Miller 7f8e3234c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-30 13:04:14 -05:00
Pablo Neira Ayuso 1a31a4a838 netfilter: ctnetlink: fix scheduling while atomic if helper is autoloaded
This patch fixes one scheduling while atomic error:

[  385.565186] ctnetlink v0.93: registering with nfnetlink.
[  385.565349] BUG: scheduling while atomic: lt-expect_creat/16163/0x00000200

It can be triggered with utils/expect_create included in
libnetfilter_conntrack if the FTP helper is not loaded.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-24 19:49:04 +01:00
Pablo Neira Ayuso 81378f728f netfilter: ctnetlink: fix return value of ctnetlink_get_expect()
This fixes one bogus error that is returned to user-space:

libnetfilter_conntrack/utils# ./expect_get
TEST: get expectation (-1)(Unknown error 18446744073709551504)

This patch includes the correct handling for EAGAIN (nfnetlink
uses this error value to restart the operation after module
auto-loading).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-24 19:49:04 +01:00
Pablo Neira Ayuso 80e60e67bc netfilter: ctnetlink: get and zero operations must be atomic
The get and zero operations have to be done in an atomic context,
otherwise counters added between them will be lost.

This problem was spotted by Changli Gao while discussing the
nfacct infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-24 14:35:59 +01:00
Patrick McHardy cbc9f2f4fc netfilter: nf_nat: export NAT definitions to userspace
Export the NAT definitions to userspace. So far userspace (specifically,
iptables) has been copying the headers files from include/net. Also
rename some structures and definitions in preparation for IPv6 NAT.
Since these have never been officially exported, this doesn't affect
existing userspace code.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23 14:36:43 +01:00
Pablo Neira Ayuso 3d058d7bc2 netfilter: rework user-space expectation helper support
This partially reworks bc01befdcf
which added userspace expectation support.

This patch removes the nf_ct_userspace_expect_list since now we
force to use the new iptables CT target feature to add the helper
extension for conntracks that have attached expectations from
userspace.

A new version of the proof-of-concept code to implement userspace
helpers from userspace is available at:

http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-POC.tar.bz2

This patch also modifies the CT target to allow to set the
conntrack's userspace helper status flags. This flag is used
to tell the conntrack system to explicitly allocate the helper
extension.

This helper extension is useful to link the userspace expectations
with the master conntrack that is being tracked from one userspace
helper.

This feature fixes a problem in the current approach of the
userspace helper support. Basically, if the master conntrack that
has got a userspace expectation vanishes, the expectations point to
one invalid memory address. Thus, triggering an oops in the
expectation deletion event path.

I decided not to add a new revision of the CT target because
I only needed to add a new flag for it. I'll document in this
issue in the iptables manpage. I have also changed the return
value from EINVAL to EOPNOTSUPP if one flag not supported is
specified. Thus, in the future adding new features that only
require a new flag can be added without a new revision.

There is no official code using this in userspace (apart from
the proof-of-concept) that uses this infrastructure but there
will be some by beginning 2012.

Reported-by: Sam Roberts <vieuxtech@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23 14:36:39 +01:00
Pablo Neira Ayuso c4042a339f netfilter: ctnetlink: support individual atomic-get-and-reset of counters
This allows to use the get operation to atomically get-and-reset
counters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-18 01:31:49 +01:00
Pablo Neira Ayuso 35dba1d7f3 netfilter: ctnetlink: use expect instead of master tuple in get operation
Use the expect tuple (if possible) instead of the master tuple for
the get operation. If two or more expectations come from the same
master, the returned expectation may not be the one that user-space
is requesting.

This is how it works for the expect deletion operation.

Although I think that nobody has been seriously using this. We
accept both possibilities, using the expect tuple if possible.
I decided to do it like this to avoid breaking backward
compatibility.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-18 01:31:47 +01:00
Eric Dumazet b3e0bfa71b netfilter: nf_conntrack: use atomic64 for accounting counters
We can use atomic64_t infrastructure to avoid taking a spinlock in fast
path, and remove inaccuracies while reading values in
ctnetlink_dump_counters() and connbytes_mt() on 32bit arches.

Suggested by Pablo.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-18 01:19:19 +01:00
Pablo Neira Ayuso 70e9942f17 netfilter: nf_conntrack: make event callback registration per-netns
This patch fixes an oops that can be triggered following this recipe:

0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
1) container is started.
2) connect to it via lxc-console.
3) generate some traffic with the container to create some conntrack
   entries in its table.
4) stop the container: you hit one oops because the conntrack table
   cleanup tries to report the destroy event to user-space but the
   per-netns nfnetlink socket has already gone (as the nfnetlink
   socket is per-netns but event callback registration is global).

To fix this situation, we make the ctnl_notifier per-netns so the
callback is registered/unregistered if the container is
created/destroyed.

Alex Bligh and Alexey Dobriyan originally proposed one small patch to
check if the nfnetlink socket is gone in nfnetlink_has_listeners,
but this is a very visited path for events, thus, it may reduce
performance and it looks a bit hackish to check for the nfnetlink
socket only to workaround this situation. As a result, I decided
to follow the bigger path choice, which seems to look nicer to me.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-22 00:34:47 +01:00
Stephen Hemminger a9b3cd7f32 rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
When assigning a NULL value to an RCU protected pointer, no barrier
is needed. The rcu_assign_pointer, used to handle that but will soon
change to not handle the special case.

Convert all rcu_assign_pointer of NULL value.

//smpl
@@ expression P; @@

- rcu_assign_pointer(P, NULL)
+ RCU_INIT_POINTER(P, NULL)

// </smpl>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-02 04:29:23 -07:00
Greg Rose c7ac8679be rtnetlink: Compute and store minimum ifinfo dump size
The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-06-09 20:38:07 -07:00
Pablo Neira Ayuso 315c34dae0 netfilter: ctnetlink: fix timestamp support for new conntracks
This patch fixes the missing initialization of the start time if
the timestamp support is enabled.

libnetfilter_conntrack/utils# conntrack -E &
libnetfilter_conntrack/utils# ./conntrack_create
tcp      6 109 ESTABLISHED src=1.1.1.1 dst=2.2.2.2 sport=1025 dport=21 packets=0 bytes=0 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=21 dport=1025 packets=0 bytes=0 mark=0 delta-time=1303296401 use=2

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-05-10 09:48:06 +02:00
David S. Miller bd4a6974cc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-02-04 14:28:58 -08:00
David S. Miller 8fe73503fa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2011-02-02 15:24:48 -08:00
Patrick McHardy a00f1f3686 netfilter: ctnetlink: fix ctnetlink_parse_tuple() warning
net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_parse_tuple':
net/netfilter/nf_conntrack_netlink.c:832:11: warning: comparison between 'enum ctattr_tuple' and 'enum ctattr_type'

Use ctattr_type for the 'type' parameter since that's the type of all attributes
passed to this function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 17:26:37 +01:00
David S. Miller 5bdc22a565 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/sched/sch_hfsc.c
	net/sched/sch_htb.c
	net/sched/sch_tbf.c
2011-01-24 14:09:35 -08:00
Pablo Neira Ayuso c71caf4114 netfilter: ctnetlink: fix missing refcount increment during dumps
In 13ee6ac netfilter: fix race in conntrack between dump_table and
destroy, we recovered spinlocks to protect the dump of the conntrack
table according to reports from Stephen and acknowledgments on the
issue from Eric.

In that patch, the refcount bump that allows to keep a reference
to the current ct object was removed. However, we still decrement
the refcount for that object in the output path of
ctnetlink_dump_table():

        if (last)
                nf_ct_put(last)

Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-24 19:01:07 +01:00
Patrick McHardy 14f0290ba4 Merge branch 'master' of /repos/git/net-next-2.6 2011-01-19 23:51:37 +01:00
David S. Miller b8f3ab4290 Revert "netlink: test for all flags of the NLM_F_DUMP composite"
This reverts commit 0ab03c2b14.

It breaks several things including the avahi daemon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 13:34:20 -08:00
Pablo Neira Ayuso a992ca2a04 netfilter: nf_conntrack_tstamp: add flow-based timestamp extension
This patch adds flow-based timestamping for conntracks. This
conntrack extension is disabled by default. Basically, we use
two 64-bits variables to store the creation timestamp once the
conntrack has been confirmed and the other to store the deletion
time. This extension is disabled by default, to enable it, you
have to:

echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp

This patch allows to save memory for user-space flow-based
loogers such as ulogd2. In short, ulogd2 does not need to
keep a hashtable with the conntrack in user-space to know
when they were created and destroyed, instead we use the
kernel timestamp. If we want to have a sane IPFIX implementation
in user-space, this nanosecs resolution timestamps are also
useful. Other custom user-space applications can benefit from
this via libnetfilter_conntrack.

This patch modifies the /proc output to display the delta time
in seconds since the flow start. You can also obtain the
flow-start date by means of the conntrack-tools.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-19 16:00:07 +01:00
Pablo Neira Ayuso f31e8d4982 netfilter: ctnetlink: fix loop in ctnetlink_get_conntrack()
This patch fixes a loop in ctnetlink_get_conntrack() that can be
triggered if you use the same socket to receive events and to
perform a GET operation. Under heavy load, netlink_unicast()
may return -EAGAIN, this error code is reserved in nfnetlink for
the module load-on-demand. Instead, we return -ENOBUFS which is
the appropriate error code that has to be propagated to
user-space.

Reported-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 17:03:39 +01:00
Simon Horman fee1cc0895 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into HEAD 2011-01-13 10:29:21 +09:00
Stephen Hemminger 13ee6ac579 netfilter: fix race in conntrack between dump_table and destroy
The netlink interface to dump the connection tracking table has a race
when entries are deleted at the same time. A customer reported a crash
and the backtrace showed thatctnetlink_dump_table was running while a
conntrack entry was being destroyed.
(see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).

According to RCU documentation, when using hlist_nulls the reader
must handle the case of seeing a deleted entry and not proceed
further down the linked list.  The old code would continue
which caused the scan to walk into the free list.

This patch uses locking (rather than RCU) for this operation which
is guaranteed safe, and no longer requires getting reference while
doing dump operation.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-11 23:54:42 +01:00
Jan Engelhardt 0ab03c2b14 netlink: test for all flags of the NLM_F_DUMP composite
Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits
being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL,
non-dump requests with NLM_F_EXCL set are mistaken as dump requests.

Substitute the condition to test for _all_ bits being set.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-09 16:25:03 -08:00
Pablo Neira Ayuso cba85b532e netfilter: fix export secctx error handling
In 1ae4de0cdf, the secctx was exported
via the /proc/net/netfilter/nf_conntrack and ctnetlink interfaces
instead of the secmark.

That patch introduced the use of security_secid_to_secctx() which may
return a non-zero value on error.

In one of my setups, I have NF_CONNTRACK_SECMARK enabled but no
security modules. Thus, security_secid_to_secctx() returns a negative
value that results in the breakage of the /proc and `conntrack -L'
outputs. To fix this, we skip the inclusion of secctx if the
aforementioned function fails.

This patch also fixes the dynamic netlink message size calculation
if security_secid_to_secctx() returns an error, since its logic is
also wrong.

This problem exists in Linux kernel >= 2.6.37.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-06 11:25:00 -08:00
Changli Gao e5fc9e7a66 netfilter: nf_conntrack: don't always initialize ct->proto
ct->proto is big(60 bytes) due to structure ip_ct_tcp, and we don't need
to initialize the whole for all the other protocols. This patch moves
proto to the end of structure nf_conn, and pushes the initialization down
to the individual protocols.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-12 17:33:17 +01:00
Linus Torvalds 5f05647dd8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
  bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
  vlan: Calling vlan_hwaccel_do_receive() is always valid.
  tproxy: use the interface primary IP address as a default value for --on-ip
  tproxy: added IPv6 support to the socket match
  cxgb3: function namespace cleanup
  tproxy: added IPv6 support to the TPROXY target
  tproxy: added IPv6 socket lookup function to nf_tproxy_core
  be2net: Changes to use only priority codes allowed by f/w
  tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
  tproxy: added tproxy sockopt interface in the IPV6 layer
  tproxy: added udp6_lib_lookup function
  tproxy: added const specifiers to udp lookup functions
  tproxy: split off ipv6 defragmentation to a separate module
  l2tp: small cleanup
  nf_nat: restrict ICMP translation for embedded header
  can: mcp251x: fix generation of error frames
  can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
  can-raw: add msg_flags to distinguish local traffic
  9p: client code cleanup
  rds: make local functions/variables static
  ...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
2010-10-23 11:47:02 -07:00
Eric Paris ff660c80d0 secmark: fix config problem when CONFIG_NF_CONNTRACK_SECMARK is not set
When CONFIG_NF_CONNTRACK_SECMARK is not set we accidentally attempt to use
the secmark fielf of struct nf_conn.  Problem is when that config isn't set
the field doesn't exist.  whoops.  Wrap the incorrect usage in the config.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21 10:13:00 +11:00
Eric Paris 1cc63249ad conntrack: export lsm context rather than internal secid via netlink
The conntrack code can export the internal secid to userspace.  These are
dynamic, can change on lsm changes, and have no meaning in userspace.  We
should instead be sending lsm contexts to userspace instead.  This patch sends
the secctx (rather than secid) to userspace over the netlink socket.  We use a
new field CTA_SECCTX and stop using the the old CTA_SECMARK field since it did
not send particularly useful information.

Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21 10:12:51 +11:00
Pablo Neira Ayuso ebbf41df4a netfilter: ctnetlink: add expectation deletion events
This patch allows to listen to events that inform about
expectations destroyed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-19 10:19:06 +02:00
Pablo Neira Ayuso bc01befdcf netfilter: ctnetlink: add support for user-space expectation helpers
This patch adds the basic infrastructure to support user-space
expectation helpers via ctnetlink and the netfilter queuing
infrastructure NFQUEUE. Basically, this patch:

* adds NF_CT_EXPECT_USERSPACE flag to identify user-space
  created expectations. I have also added a sanity check in
  __nf_ct_expect_check() to avoid that kernel-space helpers
  may create an expectation if the master conntrack has no
  helper assigned.
* adds some branches to check if the master conntrack helper
  exists, otherwise we skip the code that refers to kernel-space
  helper such as the local expectation list and the expectation
  policy.
* allows to set the timeout for user-space expectations with
  no helper assigned.
* a list of expectations created from user-space that depends
  on ctnetlink (if this module is removed, they are deleted).
* includes USERSPACE in the /proc output for expectations
  that have been created by a user-space helper.

This patch also modifies ctnetlink to skip including the helper
name in the Netlink messages if no kernel-space helper is set
(since no user-space expectation has not kernel-space kernel
assigned).

You can access an example user-space FTP conntrack helper at:
http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-userspace-POC.tar.bz

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-09-28 21:06:34 +02:00
Pablo Neira Ayuso 8b008faf92 netfilter: ctnetlink: allow to specify the expectation flags
With this patch, you can specify the expectation flags for user-space
created expectations.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-09-22 08:36:59 +02:00
Pablo Neira Ayuso bcac0dfab1 netfilter: ctnetlink: missing validation of CTA_EXPECT_ZONE attribute
This patch adds the missing validation of the CTA_EXPECT_ZONE
attribute in the ctnetlink code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-09-22 08:35:36 +02:00
Eric Dumazet 5bfddbd46a netfilter: nf_conntrack: IPS_UNTRACKED bit
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
twice per packet. This is bad for performance.
__read_mostly annotation is also a bad choice.

This patch introduces IPS_UNTRACKED bit so that we can use later a
per_cpu untrack structure more easily.

A new helper, nf_ct_untracked_get() returns a pointer to
nf_conntrack_untracked.

Another one, nf_ct_untracked_status_or() is used by nf_nat_init() to add
IPS_NAT_DONE_MASK bits to untracked status.

nf_ct_is_untracked() prototype is changed to work on a nf_conn pointer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-08 16:09:52 +02:00
Joe Perches 736d58e3a2 netfilter: remove unnecessary returns from void function()s
This patch removes from net/ netfilter files
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
  xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
[Patrick: changed to keep return statements in otherwise empty function bodies]
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-13 15:16:27 +02:00
Stephen Hemminger 654d0fbdc8 netfilter: cleanup printk messages
Make sure all printk messages have a severity level.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-13 15:02:08 +02:00
Patrick McHardy 6291055465 Merge branch 'master' of /repos/git/net-next-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	net/ipv6/netfilter/ip6t_REJECT.c
	net/netfilter/xt_limit.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-20 16:02:01 +02:00
Jiri Pirko d26e6a0283 netfilter: ctnetlink: compute message size properly
Message size should be dependent on the presence of an accounting
extension, not on CONFIG_NF_CT_ACCT definition.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-01 12:39:19 +02:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Pablo Neira Ayuso 37b7ef7203 netfilter: ctnetlink: fix reliable event delivery if message building fails
This patch fixes a bug that allows to lose events when reliable
event delivery mode is used, ie. if NETLINK_BROADCAST_SEND_ERROR
and NETLINK_RECV_NO_ENOBUFS socket options are set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 14:29:03 -07:00
Pablo Neira Ayuso a88e22adf5 netfilter: ctnetlink: fix creation of conntrack with helpers
This patch fixes a bug that triggers an assertion if you create
a conntrack entry with a helper and netfilter debugging is enabled.
Basically, we hit the assertion because the confirmation flag is
set before the conntrack extensions are added. To fix this, we
move the extension addition before the aforementioned flag is
set.

This patch also removes the possibility of setting a helper for
existing conntracks. This operation would also trigger the
assertion since we are not allowed to add new extensions for
existing conntracks. We know noone that could benefit from
this operation sanely.

Thanks to Eric Dumazet for initial posting a preliminary patch
to address this issue.

Reported-by: David Ramblewski <David.Ramblewski@atosorigin.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-19 14:24:39 +01:00
Patrick McHardy ef00f89f1e netfilter: ctnetlink: add zone support
Parse and dump the conntrack zone in ctnetlink.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-15 18:14:57 +01:00
Patrick McHardy 5d0aa2ccd4 netfilter: nf_conntrack: add support for "conntrack zones"
Normally, each connection needs a unique identity. Conntrack zones allow
to specify a numerical zone using the CT target, connections in different
zones can use the same identity.

Example:

iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-15 18:13:33 +01:00
Patrick McHardy d1e7a03f4f netfilter: ctnetlink: dump expectation helper name
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-11 12:22:28 +01:00
Patrick McHardy d0b0268fdd netfilter: ctnetlink: add missing netlink attribute policies
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-10 15:38:33 +01:00
Patrick McHardy 9ab99d5a43 Merge branch 'master' of /repos/git/net-next-2.6
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-10 14:17:10 +01:00
Patrick McHardy d696c7bdaa netfilter: nf_conntrack: fix hash resizing with namespaces
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.

Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instanciating a
new namespace. Additionally restrict hash resizing to init_net for
now as other namespaces are not handled currently.

Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08 11:18:07 -08:00
Patrick McHardy b2a15a604d netfilter: nf_conntrack: support conntrack templates
Support initializing selected parameters of new conntrack entries from a
"conntrack template", which is a specially marked conntrack entry attached
to the skb.

Currently the helper and the event delivery masks can be initialized this
way.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-03 14:40:17 +01:00
Patrick McHardy 0cebe4b416 netfilter: ctnetlink: support selective event delivery
Add two masks for conntrack end expectation events to struct nf_conntrack_ecache
and use them to filter events. Their default value is "all events" when the
event sysctl is on and "no events" when it is off. A following patch will add
specific initializations. Expectation events depend on the ecache struct of
their master conntrack.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-03 13:51:51 +01:00
Patrick McHardy 858b313300 netfilter: nf_conntrack: split up IPCT_STATUS event
Split up the IPCT_STATUS event into an IPCT_REPLY event, which is generated
when the IPS_SEEN_REPLY bit is set, and an IPCT_ASSURED event, which is
generated when the IPS_ASSURED bit is set.

In combination with a following patch to support selective event delivery,
this can be used for "sparse" conntrack replication: start replicating the
conntrack entry after it reached the ASSURED state and that way it's SYN-flood
resistant.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-03 13:48:53 +01:00
Patrick McHardy 794e68716b netfilter: ctnetlink: only assign helpers for matching protocols
Make sure not to assign a helper for a different network or transport
layer protocol to a connection.

Additionally change expectation deletion by helper to compare the name
directly - there might be multiple helper registrations using the same
name, currently one of them is chosen in an unpredictable manner and
only those expectations are removed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-03 13:41:29 +01:00
Patrick McHardy e578756c35 netfilter: ctnetlink: fix expectation mask dump
The protocol number is not initialized, so userspace can't interpret
the layer 4 data properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-26 17:04:02 +01:00
Alexey Dobriyan 9592a5c01e netfilter: ctnetlink: netns support
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-13 16:04:18 +01:00
Alexey Dobriyan cd8c20b650 netfilter: nfnetlink: netns support
Make nfnl socket per-petns.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-13 16:02:14 +01:00
Patrick McHardy 3993832464 netfilter: nfnetlink: constify message attributes and headers
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25 16:07:58 +02:00
Pablo Neira Ayuso dd7669a92c netfilter: conntrack: optional reliable conntrack event delivery
This patch improves ctnetlink event reliability if one broadcast
listener has set the NETLINK_BROADCAST_ERROR socket option.

The logic is the following: if an event delivery fails, we keep
the undelivered events in the missed event cache. Once the next
packet arrives, we add the new events (if any) to the missed
events in the cache and we try a new delivery, and so on. Thus,
if ctnetlink fails to deliver an event, we try to deliver them
once we see a new packet. Therefore, we may lose state
transitions but the userspace process gets in sync at some point.

At worst case, if no events were delivered to userspace, we make
sure that destroy events are successfully delivered. Basically,
if ctnetlink fails to deliver the destroy event, we remove the
conntrack entry from the hashes and we insert them in the dying
list, which contains inactive entries. Then, the conntrack timer
is added with an extra grace timeout of random32() % 15 seconds
to trigger the event again (this grace timeout is tunable via
/proc). The use of a limited random timeout value allows
distributing the "destroy" resends, thus, avoiding accumulating
lots "destroy" events at the same time. Event delivery may
re-order but we can identify them by means of the tuple plus
the conntrack ID.

The maximum number of conntrack entries (active or inactive) is
still handled by nf_conntrack_max. Thus, we may start dropping
packets at some point if we accumulate a lot of inactive conntrack
entries that did not successfully report the destroy event to
userspace.

During my stress tests consisting of setting a very small buffer
of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
flag, and generating lots of very small connections, I noticed
very few destroy entries on the fly waiting to be resend.

A simple way to test this patch consist of creating a lot of
entries, set a very small Netlink buffer in conntrackd (+ a patch
which is not in the git tree to set the BROADCAST_ERROR flag)
and invoke `conntrack -F'.

For expectations, no changes are introduced in this patch.
Currently, event delivery is only done for new expectations (no
events from expectation expiration, removal and confirmation).
In that case, they need a per-expectation event cache to implement
the same idea that is exposed in this patch.

This patch can be useful to provide reliable flow-accouting. We
still have to add a new conntrack extension to store the creation
and destroy time.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13 12:30:52 +02:00
Pablo Neira Ayuso a0891aa6a6 netfilter: conntrack: move event caching to conntrack extension infrastructure
This patch reworks the per-cpu event caching to use the conntrack
extension infrastructure.

The main drawback is that we consume more memory per conntrack
if event delivery is enabled. This patch is required by the
reliable event delivery that follows to this patch.

BTW, this patch allows you to enable/disable event delivery via
/proc/sys/net/netfilter/nf_conntrack_events in runtime, although
you can still disable event caching as compilation option.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13 12:26:29 +02:00
Patrick McHardy 440f0d5885 netfilter: nf_conntrack: use per-conntrack locks for protocol data
Introduce per-conntrack locks and use them instead of the global protocol
locks to avoid contention. Especially tcp_lock shows up very high in
profiles on larger machines.

This will also allow to simplify the upcoming reliable event delivery patches.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-10 14:32:47 +02:00
Pablo Neira Ayuso e34d5c1a4f netfilter: conntrack: replace notify chain by function pointer
This patch removes the notify chain infrastructure and replace it
by a simple function pointer. This issue has been mentioned in the
mailing list several times: the use of the notify chain adds
too much overhead for something that is only used by ctnetlink.

This patch also changes nfnetlink_send(). It seems that gfp_any()
returns GFP_KERNEL for user-context request, like those via
ctnetlink, inside the RCU read-side section which is not valid.
Using GFP_KERNEL is also evil since netlink may schedule(),
this leads to "scheduling while atomic" bug reports.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-03 10:32:06 +02:00
Pablo Neira Ayuso 17e6e4eac0 netfilter: conntrack: simplify event caching system
This patch simplifies the conntrack event caching system by removing
several events:

 * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted
   since the have no clients.
 * IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter
   days.
 * IPCT_REFRESH which is not of any use since we always include the
   timeout in the messages.

After this patch, the existing events are:

 * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify
 addition and deletion of entries.
 * IPCT_STATUS, that notes that the status bits have changes,
 eg. IPS_SEEN_REPLY and IPS_ASSURED.
 * IPCT_PROTOINFO, that reports that internal protocol information has
 changed, eg. the TCP, DCCP and SCTP protocol state.
 * IPCT_HELPER, that a helper has been assigned or unassigned to this
 entry.
 * IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this
 covers the case when a mark is set to zero.
 * IPCT_NATSEQADJ, to report that there's updates in the NAT sequence
 adjustment.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:08:46 +02:00
Pablo Neira Ayuso 274d383b9c netfilter: conntrack: don't report events on module removal
During the module removal there are no possible event listeners
since ctnetlink must be removed before to allow removing
nf_conntrack. This patch removes the event reporting for the
module removal case which is not of any use in the existing code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:08:38 +02:00
Pablo Neira Ayuso 03b64f518a netfilter: ctnetlink: cleanup message-size calculation
This patch cleans up the message calculation to make it similar
to rtnetlink, moreover, it removes unneeded verbose information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:08:27 +02:00
Pablo Neira Ayuso 96bcf938dc netfilter: ctnetlink: use nlmsg_* helper function to build messages
Replaces the old macros to build Netlink messages with the
new nlmsg_*() helper functions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:07:39 +02:00
Pablo Neira Ayuso f2f3e38c63 netfilter: ctnetlink: rename tuple() by nf_ct_tuple() macro definition
This patch move the internal tuple() macro definition to the
header file as nf_ct_tuple().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:03:35 +02:00
Pablo Neira Ayuso 8b0a231d4d netfilter: ctnetlink: remove nowait parameter from *fill_info()
This patch is a cleanup, it removes the `nowait' parameter
from all *fill_info() function since it is always set to one.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:03:34 +02:00
David S. Miller 356d6c2d55 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-05-05 12:00:53 -07:00
Pablo Neira Ayuso fecc1133b6 netfilter: ctnetlink: fix wrong message type in user updates
This patch fixes the wrong message type that are triggered by
user updates, the following commands:

(term1)# conntrack -I -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state LISTEN
(term1)# conntrack -U -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state SYN_SENT
(term1)# conntrack -U -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state SYN_RECV

only trigger event message of type NEW, when only the first is NEW
while others should be UPDATE.

(term2)# conntrack -E
    [NEW] tcp      6 10 LISTEN src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0
    [NEW] tcp      6 10 SYN_SENT src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0
    [NEW] tcp      6 10 SYN_RECV src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0

This patch also removes IPCT_REFRESH from the bitmask since it is
not of any use.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-05 17:48:26 +02:00
Pablo Neira Ayuso 29fe1b4812 netfilter: ctnetlink: fix gcc warning during compilation
This patch fixes a (bogus?) gcc warning during compilation:

net/netfilter/nf_conntrack_netlink.c🔢 warning: 'helpname' may be used uninitialized in this function
net/netfilter/nf_conntrack_netlink.c:991: warning: 'helpname' may be used uninitialized in this function

In fact, helpname is initialized by ctnetlink_parse_help() so
I cannot see a way to use it without being initialized.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-22 02:26:37 -07:00
Pablo Neira Ayuso 150ace0db3 netfilter: ctnetlink: report error if event message allocation fails
This patch fixes an inconsistency that results in no error reports
to user-space listeners if we fail to allocate the event message.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-04-17 17:47:31 +02:00
David S. Miller 01e6de64d9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-03-26 22:45:23 -07:00
Holger Eitzenberger d271e8bd8c ctnetlink: compute generic part of event more acurately
On a box with most of the optional Netfilter switches turned off some
of the NLAs are never send, e. g. secmark, mark or the conntrack
byte/packet counters.  As a worst case scenario this may possibly
still lead to ctnetlink skbs being reallocated in netlink_trim()
later, loosing all the nice effects from the previous patches.

I try to solve that (at least partly) by correctly #ifdef'ing the
NLAs in the computation.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-26 13:37:14 +01:00
Holger Eitzenberger 2732c4e45b netfilter: ctnetlink: allocate right-sized ctnetlink skb
Try to allocate a Netlink skb roughly the size of the actual
message, with the help from the l3 and l4 protocol helpers.
This is all to prevent a reallocation in netlink_trim() later.

The overhead of allocating the right-sized skb is rather small, with
ctnetlink_alloc_skb() actually being inlined away on my x86_64 box.
The size of the per-proto space is determined at registration time of
the protocol helper.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-25 21:50:59 +01:00