Commit graph

2048 commits

Author SHA1 Message Date
Jesper Dangaard Brouer 590e3f79a2 ipvs: IPv6 MTU checking cleanup and bugfix
Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using
a common helper function __mtu_check_toobig_v6().

The MTU check for tunnel mode can also use this helper as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to
skb->len.  And the 'mtu' variable have been adjusted before
calling helper.

Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6()
were missing a check for skb_is_gso().

This bug e.g. caused issues for KVM IPVS setups, where different
Segmentation Offloading techniques are utilized, between guests,
via the virtio driver.  This resulted in very bad performance,
due to the ICMPv6 "too big" messages didn't affect the sender.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 02:55:39 +02:00
David S. Miller e6acb38480 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 18:54:37 -04:00
David S. Miller bf277b0cce Merge git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:

* Remove unnecessary RTNL locking now that we have support
  for namespace in nf_conntrack, from Patrick McHardy.

* Cleanup to eliminate unnecessary goto in the initialization
  path of several Netfilter tables, from Jean Sacren.

* Another cleanup from Wu Fengguang, this time to PTR_RET instead
  of if IS_ERR then return PTR_ERR.

* Use list_for_each_entry_continue_rcu in nf_iterate, from
  Michael Wang.

* Add pmtu_disc sysctl option to disable PMTU in their tunneling
  transmitter, from Julian Anastasov.

* Generalize application protocol registration in IPVS and modify
  IPVS FTP helper to use it, from Julian Anastasov.

* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
  helper for NAT support, from Julian Anastasov.

* Add logic to update PMTU for IPIP packets in IPVS, again
  from Julian Anastasov.

* A couple of sparse warning fixes for IPVS and Netfilter from
  Claudiu Ghioc and Patrick McHardy respectively.

Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 18:48:52 -07:00
Michael Wang 6705e86724 netfilter: replace list_for_each_continue_rcu with new interface
This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to allow removing
list_for_each_continue_rcu().

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-22 19:17:20 +02:00
Patrick McHardy 2834a6386b netfilter: nf_conntrack: remove unnecessary RTNL locking
Locking the rtnl was added to nf_conntrack_l{3,4}_proto_unregister()
for walking the network namespace list. This is not done anymore since
we have proper namespace support in the protocols now, so we don't
need to take the RTNL anymore.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:46:29 +02:00
Patrick McHardy fe31d1a860 netfilter: sparse endian fixes
Fix a couple of endian annotation in net/netfilter:

net/netfilter/nfnetlink_acct.c:82:30: warning: cast to restricted __be64
net/netfilter/nfnetlink_acct.c:86:30: warning: cast to restricted __be64
net/netfilter/nfnetlink_cthelper.c:77:28: warning: cast to restricted __be16
net/netfilter/xt_NFQUEUE.c:46:16: warning: restricted __be32 degrades to integer
net/netfilter/xt_NFQUEUE.c:60:34: warning: restricted __be32 degrades to integer
net/netfilter/xt_NFQUEUE.c:68:34: warning: restricted __be32 degrades to integer
net/netfilter/xt_osf.c:272:55: warning: cast to restricted __be16

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:45:57 +02:00
Patrick McHardy 2dba62c30e netfilter: nfnetlink_log: fix NLA_PUT macro removal bug
Commit 1db20a52 (nfnetlink_log: Stop using NLA_PUT*().) incorrectly
converted a NLA_PUT_BE16 macro to nla_put_be32() in nfnetlink_log:

-               NLA_PUT_BE16(inst->skb, NFULA_HWTYPE, htons(skb->dev->type));
+               if (nla_put_be32(inst->skb, NFULA_HWTYPE, htons(skb->dev->type))

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:40:23 +02:00
David S. Miller 6c71bec66a Merge git://1984.lsi.us.es/nf
Pable Neira Ayuso says:

====================
The following five patches contain fixes for 3.6-rc, they are:

* Two fixes for message parsing in the SIP conntrack helper, from
  Patrick McHardy.

* One fix for the SIP helper introduced in the user-space cthelper
  infrastructure, from Patrick McHardy.

* fix missing appropriate locking while modifying one conntrack entry
  from the nfqueue integration code, from myself.

* fix possible access to uninitiliazed timer in the nf_conntrack
  expectation infrastructure, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:44:29 -07:00
Pablo Neira Ayuso 2614f86490 netfilter: nf_ct_expect: fix possible access to uninitialized timer
In __nf_ct_expect_check, the function refresh_timer returns 1
if a matching expectation is found and its timer is successfully
refreshed. This results in nf_ct_expect_related returning 0.
Note that at this point:

- the passed expectation is not inserted in the expectation table
  and its timer was not initialized, since we have refreshed one
  matching/existing expectation.

- nf_ct_expect_alloc uses kmem_cache_alloc, so the expectation
  timer is in some undefined state just after the allocation,
  until it is appropriately initialized.

This can be a problem for the SIP helper during the expectation
addition:

 ...
 if (nf_ct_expect_related(rtp_exp) == 0) {
         if (nf_ct_expect_related(rtcp_exp) != 0)
                 nf_ct_unexpect_related(rtp_exp);
 ...

Note that nf_ct_expect_related(rtp_exp) may return 0 for the timer refresh
case that is detailed above. Then, if nf_ct_unexpect_related(rtcp_exp)
returns != 0, nf_ct_unexpect_related(rtp_exp) is called, which does:

 spin_lock_bh(&nf_conntrack_lock);
 if (del_timer(&exp->timeout)) {
         nf_ct_unlink_expect(exp);
         nf_ct_expect_put(exp);
 }
 spin_unlock_bh(&nf_conntrack_lock);

Note that del_timer always returns false if the timer has been
initialized.  However, the timer was not initialized since setup_timer
was not called, therefore, the expectation timer remains in some
undefined state. If I'm not missing anything, this may lead to the
removal an unexistent expectation.

To fix this, the optimization that allows refreshing an expectation
is removed. Now nf_conntrack_expect_related looks more consistent
to me since it always add the expectation in case that it returns
success.

Thanks to Patrick McHardy for participating in the discussion of
this patch.

I think this may be the source of the problem described by:
http://marc.info/?l=netfilter-devel&m=134073514719421&w=2

Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-16 11:49:53 +02:00
Mathias Krause 2d8a041b7b ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)
If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Eric W. Biederman 26711a791e userns: xt_owner: Add basic user namespace support.
- Only allow adding matches from the initial user namespace
- Add the appropriate conversion functions to handle matches
  against sockets in other user namespaces.

Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:30 -07:00
Eric W. Biederman da7428080a userns xt_recent: Specify the owner/group of ip_list_perms in the initial user namespace
xt_recent creates a bunch of proc files and initializes their uid
and gids to the values of ip_list_uid and ip_list_gid.  When
initialize those proc files convert those values to kuids so they
can continue to reside on the /proc inode.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jan Engelhardt <jengelh@medozas.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:29 -07:00
Eric W. Biederman 8c6e2a941a userns: Convert xt_LOG to print socket kuids and kgids as uids and gids
xt_LOG always writes messages via sb_add via printk.  Therefore when
xt_LOG logs the uid and gid of a socket a packet came from the
values should be converted to be in the initial user namespace.

Thus making xt_LOG as user namespace safe as possible.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:29 -07:00
Eric W. Biederman 9eea9515cb userns: nfnetlink_log: Report socket uids in the log sockets user namespace
At logging instance creation capture the peer netlink socket's user
namespace. Use the captured peer user namespace when reporting socket
uids to the peer.

The peer socket's user namespace is guaranateed to be valid until the user
closes the netlink socket.  nfnetlink_log removes instances during the final
close of a socket.  __build_packet_message does not get called after an
instance is destroyed.   Therefore it is safe to let the peer netlink socket
take care of the user namespace reference counting for us.

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:27 -07:00
Pablo Neira Ayuso 68e035c950 netfilter: ctnetlink: fix missing locking while changing conntrack from nfqueue
Since 9cb017665 netfilter: add glue code to integrate nfnetlink_queue and
ctnetlink, we can modify the conntrack entry via nfnl_queue. However, the
change of the conntrack entry via nfnetlink_queue requires appropriate
locking to avoid concurrent updates.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-14 12:54:45 +02:00
Patrick McHardy 02b69cbdc2 netfilter: nf_ct_sip: fix IPv6 address parsing
Within SIP messages IPv6 addresses are enclosed in square brackets in most
cases, with the exception of the "received=" header parameter. Currently
the helper fails to parse enclosed addresses.

This patch:

- changes the SIP address parsing function to enforce square brackets
  when required, and accept them when not required but present, as
  recommended by RFC 5118.

- adds a new SDP address parsing function that never accepts square
  brackets since SDP doesn't use them.

With these changes, the SIP helper correctly parses all test messages
from RFC 5118 (Session Initiation Protocol (SIP) Torture Test Messages
for Internet Protocol Version 6 (IPv6)).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-10 11:53:11 +02:00
Patrick McHardy e9324b2ce6 netfilter: nf_ct_sip: fix helper name
Commit 3a8fc53a (netfilter: nf_ct_helper: allocate 16 bytes for the helper
and policy names) introduced a bug in the SIP helper, the helper name is
sprinted to the sip_names array instead of instead of into the helper
structure. This breaks the helper match and the /proc/net/nf_conntrack_expect
output.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-10 11:53:03 +02:00
Julian Anastasov 3654e61137 ipvs: add pmtu_disc option to disable IP DF for TUN packets
Disabling PMTU discovery can increase the output packet
rate but some users have enough resources and prefer to fragment
than to drop traffic. By default, we copy the DF bit but if
pmtu_disc is disabled we do not send FRAG_NEEDED messages anymore.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10 10:35:07 +09:00
Julian Anastasov f2edb9f770 ipvs: implement passive PMTUD for IPIP packets
IPVS is missing the logic to update PMTU in routing
for its IPIP packets. We monitor the dst_mtu and can return
FRAG_NEEDED messages but if the tunneled packets get ICMP
error we can not rely on other traffic to save the lowest
MTU.

	The following patch adds ICMP handling for IPIP
packets in incoming direction, from some remote host to
our local IP used as saddr in the outer header. By this
way we can forward any related ICMP traffic if it is for IPVS
TUN connection. For the special case of PMTUD we update the
routing and if client requested DF we can forward the
error.

	To properly update the routing we have to bind
the cached route (dest->dst_cache) to the selected saddr
because ipv4_update_pmtu uses saddr for dst lookup.
Add IP_VS_RT_MODE_CONNECT flag to force such binding with
second route.

	Update ip_vs_tunnel_xmit to provide IP_VS_RT_MODE_CONNECT
and change the code to copy DF. For now we prefer not to
force PMTU discovery (outer DF=1) because we don't have
configuration option to enable or disable PMTUD. As we
do not keep any packets to resend, we prefer not to
play games with packets without DF bit because the sender
is not informed when they are rejected.

	Also, change ops->update_pmtu to be called only
for local clients because there is no point to update
MTU for input routes, in our case skb->dst->dev is lo.
It seems the code is copied from ipip.c where the skb
dst points to tunnel device.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10 10:35:03 +09:00
Claudiu Ghioc 2b2d280817 ipvs: fixed sparse warning
Removed the following sparse warnings, wether CONFIG_SYSCTL
is defined or not:
*       warning: symbol 'ip_vs_control_net_init_sysctl' was not
	declared. Should it be static?
*       warning: symbol 'ip_vs_control_net_cleanup_sysctl' was
	not declared. Should it be static?

Signed-off-by: Claudiu Ghioc <claudiu.ghioc@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10 10:34:51 +09:00
Julian Anastasov be97fdb5fb ipvs: generalize app registration in netns
Get rid of the ftp_app pointer and allow applications
to be registered without adding fields in the netns_ipvs structure.

v2: fix coding style as suggested by Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10 10:34:51 +09:00
Julian Anastasov aaea4ed74d ipvs: ip_vs_ftp depends on nf_conntrack_ftp helper
The FTP application indirectly depends on the
nf_conntrack_ftp helper for proper NAT support. If the
module is not loaded, IPVS can resize the packets for the
command connection, eg. PASV response but the SEQ adjustment
logic in ipv4_confirm is not called without helper.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10 10:34:51 +09:00
David S. Miller abaa72d7fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
2012-07-19 11:17:30 -07:00
David S. Miller 6700c2709c net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()
This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more.  In the
future the routes will be without destination address, source address,
etc. keying.  One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information.  This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here.  Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17 03:29:28 -07:00
Julian Anastasov 283283c4da ipvs: fix oops in ip_vs_dst_event on rmmod
After commit 39f618b4fd (3.4)
"ipvs: reset ipvs pointer in netns" we can oops in
ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup
is called after the ipvs_core_ops subsys is unregistered and
net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event
if ipvs is NULL. It is safe because all services and dests
for the net are already freed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-07-17 12:00:58 +02:00
David S. Miller 04c9f416e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/batman-adv/bridge_loop_avoidance.c
	net/batman-adv/bridge_loop_avoidance.h
	net/batman-adv/soft-interface.c
	net/mac80211/mlme.c

With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).

The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:56:33 -07:00
Ben Hutchings 2c53040f01 net: Fix (nearly-)kernel-doc comments for various functions
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:13:45 -07:00
Jozsef Kadlecsik a73f89a61f netfilter: ipset: timeout fixing bug broke SET target special timeout value
The patch "127f559 netfilter: ipset: fix timeout value overflow bug"
broke the SET target when no timeout was specified.

Reported-by: Jean-Philippe Menil <jean-philippe.menil@univ-nantes.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-07-09 10:53:04 +02:00
David S. Miller d3a5ea6e21 Merge branch 'master' of git://1984.lsi.us.es/nf-next 2012-07-07 16:18:50 -07:00
David S. Miller c90a9bb907 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-05 03:44:25 -07:00
Krishna Kumar 46ba5a25f5 netfilter: nfnetlink_queue: do not allow to set unsupported flag bits
Allow setting of only supported flag bits in queue->flags.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-07-04 19:51:50 +02:00
Tomasz Bursztyka 59560a38a3 netfilter: nfnetlink: check callbacks before using those in nfnetlink_rcv_msg
nfnetlink_rcv_msg() might call a NULL callback which will cause NULL pointer
dereference.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-07-04 19:47:53 +02:00
Pablo Neira Ayuso be0593c678 netfilter: nf_ct_tcp: missing per-net support for cttimeout
This patch adds missing per-net support for the cttimeout
infrastructure to TCP.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-07-04 19:37:42 +02:00
Pablo Neira Ayuso 08911475d1 netfilter: nf_conntrack: generalize nf_ct_l4proto_net
This patch generalizes nf_ct_l4proto_net by splitting it into chunks and
moving the corresponding protocol part to where it really belongs to.

To clarify, note that we follow two different approaches to support per-net
depending if it's built-in or run-time loadable protocol tracker.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-07-04 19:37:22 +02:00
Pablo Neira Ayuso 03292745b0 netlink: add nlk->netlink_bind hook for module auto-loading
This patch adds a hook in the binding path of netlink.

This is used by ctnetlink to allow module autoloading for the case
in which one user executes:

 conntrack -E

So far, this resulted in nfnetlink loaded, but not
nf_conntrack_netlink.

I have received in the past many complains on this behaviour.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-29 16:46:06 -07:00
Pablo Neira Ayuso a31f2d17b3 netlink: add netlink_kernel_cfg parameter to netlink_kernel_create
This patch adds the following structure:

struct netlink_kernel_cfg {
        unsigned int    groups;
        void            (*input)(struct sk_buff *skb);
        struct mutex    *cb_mutex;
};

That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.

I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.

That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.

This patch also adapts all callers to use this new interface.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-29 16:46:02 -07:00
Tomasz Bursztyka 4009e18851 netfilter: nfnetlink: fix missing rcu_read_unlock in nfnetlink_rcv_msg
Bug added in commit 6b75e3e8d6 (netfilter: nfnetlink: add RCU in
nfnetlink_rcv_msg())

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-29 13:04:16 +02:00
Tomasz Bursztyka d31f4d448f netfilter: ipset: fix crash if IPSET_CMD_NONE command is sent
This patch fixes a crash if that ipset command is sent over nfnetlink.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-29 13:04:04 +02:00
Gao feng 54b8873f7c netfilter: nf_ct_dccp: add dccp_kmemdup_sysctl_table function
This patch is a cleanup. It adds dccp_kmemdup_sysctl_table to
split code into smaller chunks. Yet it prepares introduction
of nf_conntrack_proto_*_sysctl.c.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 19:14:31 +02:00
Gao feng 22ac03772f netfilter: nf_ct_generic: add generic_kmemdup_sysctl_table function
This patch is a cleanup. It adds generic_kmemdup_sysctl_table to
split code into smaller chunks. Yet it prepares introduction
of nf_conntrack_proto_*_sysctl.c.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 19:13:31 +02:00
Gao feng f42c4183c7 netfilter: nf_ct_sctp: merge sctpv[4,6]_net_init into sctp_net_init
Merge sctpv4_net_init and sctpv6_net_init into sctp_net_init to
remove redundant code now that we have the u_int16_t proto
parameter.

And use nf_proto_net.users to identify if it's the first time
we use the nf_proto_net, in that case, we initialize i

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 19:13:31 +02:00
Gao feng 51b4c824fc netfilter: nf_ct_udplite: add udplite_kmemdup_sysctl_table function
This cleans up nf_conntrack_l4proto_udplite[4,6] and it prepares
the moving of the sysctl code to nf_conntrack_proto_*_sysctl.c
to reduce the ifdef pollution.

And use nf_proto_net.users to identify if it's the first time
we use the nf_proto_net, in that case, we initialize it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 19:12:52 +02:00
Gao feng dee7364e0e netfilter: nf_ct_udp: merge udpv[4,6]_net_init into udp_net_init
Merge udpv4_net_init and udpv6_net_init into udp_net_init to
remove redundant code now that we have the u_int16_t proto
parameter.

And use nf_proto_net.users to identify if it's the first time
we use the nf_proto_net, in that case, we initialize it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 19:05:33 +02:00
Gao feng efa758fe2c netfilter: nf_ct_tcp: merge tcpv[4,6]_net_init into tcp_net_init
Merge tcpv4_net_init and tcpv6_net_init into tcp_net_init to
remove redundant code now that we have the u_int16_t proto
parameter.

And use nf_proto_net.users to identify if it's the first time
we use the nf_proto_net, in that case, we initialize it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 19:05:05 +02:00
Gao feng 12c26df35e netfilter: nf_conntrack: fix memory leak if sysctl registration fails
In nf_ct_l4proto_register_sysctl, if l4proto sysctl registration
fails, we have to make sure that we release the compat sysctl
table.

This can happen if TCP has been registered compat for IPv4, and
IPv6 compat registration fails.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 18:55:22 +02:00
Gao feng fa34fff5e6 netfilter: nf_conntrack: use l4proto->users as refcount for per-net data
Currently, nf_proto_net's l4proto->users meaning is quite confusing
since it depends on the compilation tweaks.

To resolve this, we cleanup this code to regard it as the refcount
for l4proto's per-net data, since there may be two l4protos use the
same per-net data.

Thus, we increment pn->users when nf_conntrack_l4proto_register
successfully, and decrement it for nf_conntrack_l4_unregister case.

The users refcnt is not required form layer 3 protocol trackers.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 18:46:00 +02:00
Gao feng f28997e27a netfilter: nf_conntrack: add nf_ct_kfree_compat_sysctl_table
This patch is a cleanup.

It adds nf_ct_kfree_compat_sysctl_table to release l4proto's
compat sysctl table and set the compat sysctl table point to NULL.

This new function will be used by follow-up patches.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 18:36:25 +02:00
Gao feng f1caad2745 netfilter: nf_conntrack: prepare l4proto->init_net cleanup
l4proto->init contain quite redundant code. We can simplify this
by adding a new parameter l3proto.

This patch prepares that code simplification.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 18:31:14 +02:00
Gao feng fa0f61f05e netfilter: nf_conntrack: fix nf_conntrack_l3proto_register
Before commit 2c352f444c
(netfilter: nf_conntrack: prepare namespace support for
l4 protocol trackers), we register sysctl before register
protocol tracker. Thus, if sysctl is registration fails,
the protocol tracker will not be registered.

After that commit, if sysctl registration fails, protocol
registration still remains, so we leave things in intermediate
state.

To fix this, this patch registers sysctl before protocols.
And if protocol registration fail, sysctl is unregistered.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 18:11:15 +02:00
Pablo Neira Ayuso 392025f87a netfilter: ctnetlink: add new messages to obtain statistics
This patch adds the following messages to ctnetlink:

IPCTNL_MSG_CT_GET_STATS_CPU
IPCTNL_MSG_CT_GET_STATS
IPCTNL_MSG_EXP_GET_STATS_CPU

To display connection tracking system per-cpu and global statistics.

This provides a replacement for the following /proc interfaces:

/proc/net/stat/nf_conntrack
/proc/sys/net/netfilter/nf_conntrack_count

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-27 17:28:03 +02:00
David S. Miller 3da07c0c2b netfilter: nfnetlink_queue_core: Move away from NLMSG_PUT().
And use nlmsg_data() while we're here too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-26 21:35:27 -07:00
David S. Miller d550d09589 netfilter: nfnetlink_log: Move away from NLMSG_PUT().
And use nlmsg_data() while we're here too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-26 21:34:03 -07:00
Eric Dumazet c24584c028 netfilter: ipvs: fix dst leak in __ip_vs_addr_is_local_v6
After call to ip6_route_output() we must release dst or we leak it.

Also should test dst->error, as ip6_route_output() never returns NULL.

Use boolean while we are at it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-25 12:07:09 +02:00
Florian Westphal ef5b6e1277 netfilter: ipset: fix interface comparision in hash-netiface sets
ifname_compare() assumes that skb->dev is zero-padded,
e.g 'eth1\0\0\0\0\0...'. This isn't always the case. e1000 driver does

strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);

in e1000_probe(), so once device is registered dev->name memory contains
'eth1\0:0:3\0\0\0' (or something like that), which makes eth1 compare
fail.

Use plain strcmp() instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-25 12:03:21 +02:00
Pablo Neira Ayuso 8e36c4b5b6 netfilter: ctnetlink: fix compilation with NF_CONNTRACK_EVENTS=n
This patch fixes compilation with NF_CONNTRACK_EVENTS=n and
NETFILTER_NETLINK_QUEUE_CT=y.

I'm leaving all those static inline functions that calculate the size
of the event message out of the ifdef area of NF_CONNTRACK_EVENTS since
they will not be included by gcc in case they are unused.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-23 02:13:46 +02:00
Pablo Neira Ayuso ab5e8b77d5 netfilter: nfnetlink_queue: fix sparse warning due to missing include
This patch fixes a sparse warning due to missing include header file.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-23 02:13:38 +02:00
Pablo Neira Ayuso d584a61a93 netfilter: nfnetlink_queue: fix compilation with CONFIG_NF_NAT=m and CONFIG_NF_CT_NETLINK=y
LD      init/built-in.o
net/built-in.o:(.data+0x4408): undefined reference to `nf_nat_tcp_seq_adjust'
make: *** [vmlinux] Error 1

This patch adds a new pointer hook (nfq_ct_nat_hook) similar to other existing
in Netfilter to solve our complicated configuration dependencies.

Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-22 02:49:52 +02:00
Pablo Neira Ayuso 5a05fae5ca netfilter: nfq_ct_hook needs __rcu and __read_mostly
This removes some sparse warnings.

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-20 20:50:31 +02:00
Pablo Neira Ayuso 7c62234547 netfilter: nfnetlink_queue: fix compilation with NF_CONNTRACK disabled
In "9cb0176 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink"
the compilation with NF_CONNTRACK disabled is broken. This patch fixes this
issue.

I have moved the conntrack part into nfnetlink_queue_ct.c to avoid
peppering the entire nfnetlink_queue.c code with ifdefs.

I also needed to rename nfnetlink_queue.c to nfnetlink_queue_pkt.c
to update the net/netfilter/Makefile to support conditional compilation
of the conntrack integration.

This patch also adds CONFIG_NETFILTER_QUEUE_CT in case you want to explicitly
disable the integration between nf_conntrack and nfnetlink_queue.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-19 04:44:57 +02:00
Pablo Neira Ayuso 6e9c2db3aa netfilter: fix compilation of the nfnl_cthelper if NF_CONNTRACK is unset
This patch fixes the compilation of net/netfilter/nfnetlink_cthelper.c
if CONFIG_NF_CONNTRACK is not set.

This patch also moves the definition of the cthelper infrastructure to
the scope of NF_CONNTRACK things.

I have also renamed NETFILTER_NETLINK_CTHELPER by NF_CT_NETLINK_HELPER,
to use similar names to other nf_conntrack_netlink extensions. Better now
that this has been only for two days in David's tree.

Two new dependencies have been added:

* NF_CT_NETLINK
* NETFILTER_NETLINK_QUEUE

Since these infrastructure requires both ctnetlink and nfqueue.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-19 01:25:08 +02:00
Pablo Neira Ayuso 32f5376003 netfilter: nf_ct_helper: disable automatic helper re-assignment of different type
This patch modifies __nf_ct_try_assign_helper in a way that invalidates support
for the following scenario:

1) attach the helper A for first time when the conntrack is created
2) attach new (different) helper B due to changes the reply tuple caused by NAT

eg. port redirection from TCP/21 to TCP/5060 with both FTP and SIP helpers
loaded, which seems to be a quite unorthodox scenario.

I can provide a more elaborated patch to support this scenario but explicit
helper attachment provides a better solution for this since now the use can
attach the helpers consistently, without relying on the automatic helper
lookup magic.

This patch fixes a possible out of bound zeroing of the conntrack helper
extension if the helper B uses more memory for its private data than
helper A.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-19 01:24:52 +02:00
Pablo Neira Ayuso fd7462de46 netfilter: ctnetlink: fix NULL dereference while trying to change helper
The patch 1afc56794e: "netfilter: nf_ct_helper: implement variable
length helper private data" from Jun 7, 2012, leads to the following
Smatch complaint:

net/netfilter/nf_conntrack_netlink.c:1231 ctnetlink_change_helper()
         error: we previously assumed 'help->helper' could be null (see line 1228)

This NULL dereference can be triggered with the following sequence:

1) attach the helper for first time when the conntrack is created.
2) remove the helper module or detach the helper from the conntrack
   via ctnetlink.
3) attach helper again (the same or different one, no matter) to the
   that existing conntrack again via ctnetlink.

This patch fixes the problem by removing the use case that allows you
to re-assign again a helper for one conntrack entry via ctnetlink since
I cannot find any practical use for it.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-19 00:18:38 +02:00
David S. Miller 82f437b950 Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo says:

====================
This is the second batch of Netfilter updates for net-next. It contains the
kernel changes for the new user-space connection tracking helper
infrastructure.

More details on this infrastructure are provides here:
http://lwn.net/Articles/500196/

Still, I plan to provide some official documentation through the
conntrack-tools user manual on how to setup user-space utilities for this.
So far, it provides two helper in user-space, one for NFSv3 and another for
Oracle/SQLnet/TNS. Yet in my TODO list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16 15:23:35 -07:00
Pablo Neira Ayuso 12f7a50533 netfilter: add user-space connection tracking helper infrastructure
There are good reasons to supports helpers in user-space instead:

* Rapid connection tracking helper development, as developing code
  in user-space is usually faster.

* Reliability: A buggy helper does not crash the kernel. Moreover,
  we can monitor the helper process and restart it in case of problems.

* Security: Avoid complex string matching and mangling in kernel-space
  running in privileged mode. Going further, we can even think about
  running user-space helpers as a non-root process.

* Extensibility: It allows the development of very specific helpers (most
  likely non-standard proprietary protocols) that are very likely not to be
  accepted for mainline inclusion in the form of kernel-space connection
  tracking helpers.

This patch adds the infrastructure to allow the implementation of
user-space conntrack helpers by means of the new nfnetlink subsystem
`nfnetlink_cthelper' and the existing queueing infrastructure
(nfnetlink_queue).

I had to add the new hook NF_IP6_PRI_CONNTRACK_HELPER to register
ipv[4|6]_helper which results from splitting ipv[4|6]_confirm into
two pieces. This change is required not to break NAT sequence
adjustment and conntrack confirmation for traffic that is enqueued
to our user-space conntrack helpers.

Basic operation, in a few steps:

1) Register user-space helper by means of `nfct':

 nfct helper add ftp inet tcp

 [ It must be a valid existing helper supported by conntrack-tools ]

2) Add rules to enable the FTP user-space helper which is
   used to track traffic going to TCP port 21.

For locally generated packets:

 iptables -I OUTPUT -t raw -p tcp --dport 21 -j CT --helper ftp

For non-locally generated packets:

 iptables -I PREROUTING -t raw -p tcp --dport 21 -j CT --helper ftp

3) Run the test conntrackd in helper mode (see example files under
   doc/helper/conntrackd.conf

 conntrackd

4) Generate FTP traffic going, if everything is OK, then conntrackd
   should create expectations (you can check that with `conntrack':

 conntrack -E expect

    [NEW] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
[DESTROY] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp

This confirms that our test helper is receiving packets including the
conntrack information, and adding expectations in kernel-space.

The user-space helper can also store its private tracking information
in the conntrack structure in the kernel via the CTA_HELP_INFO. The
kernel will consider this a binary blob whose layout is unknown. This
information will be included in the information that is transfered
to user-space via glue code that integrates nfnetlink_queue and
ctnetlink.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:40:02 +02:00
Pablo Neira Ayuso ae243bee39 netfilter: ctnetlink: add CTA_HELP_INFO attribute
This attribute can be used to modify and to dump the internal
protocol information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:09:15 +02:00
Pablo Neira Ayuso 8c88f87cb2 netfilter: nfnetlink_queue: add NAT TCP sequence adjustment if packet mangled
User-space programs that receive traffic via NFQUEUE may mangle packets.
If NAT is enabled, this usually puzzles sequence tracking, leading to
traffic disruptions.

With this patch, nfnl_queue will make the corresponding NAT TCP sequence
adjustment if:

1) The packet has been mangled,
2) the NFQA_CFG_F_CONNTRACK flag has been set, and
3) NAT is detected.

There are some records on the Internet complaning about this issue:
http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables

By now, we only support TCP since we have no helpers for DCCP or SCTP.
Better to add this if we ever have some helper over those layer 4 protocols.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:09:08 +02:00
Pablo Neira Ayuso 9cb0176654 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink
This patch allows you to include the conntrack information together
with the packet that is sent to user-space via NFQUEUE.

Previously, there was no integration between ctnetlink and
nfnetlink_queue. If you wanted to access conntrack information
from your libnetfilter_queue program, you required to query
ctnetlink from user-space to obtain it. Thus, delaying the packet
processing even more.

Including the conntrack information is optional, you can set it
via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:09:02 +02:00
Pablo Neira Ayuso 1afc56794e netfilter: nf_ct_helper: implement variable length helper private data
This patch uses the new variable length conntrack extensions.

Instead of using union nf_conntrack_help that contain all the
helper private data information, we allocate variable length
area to store the private helper data.

This patch includes the modification of all existing helpers.
It also includes a couple of include header to avoid compilation
warnings.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:08:55 +02:00
Pablo Neira Ayuso 3cf4c7e381 netfilter: nf_ct_ext: support variable length extensions
We can now define conntrack extensions of variable size. This
patch is useful to get rid of these unions:

union nf_conntrack_help
union nf_conntrack_proto
union nf_conntrack_nat_help

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:08:49 +02:00
Pablo Neira Ayuso 3a8fc53a45 netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names
This patch modifies the struct nf_conntrack_helper to allocate
the room for the helper name. The maximum length is 16 bytes
(this was already introduced in 2.6.24).

For the maximum length for expectation policy names, I have
also selected 16 bytes.

This patch is required by the follow-up patch to support
user-space connection tracking helpers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16 15:08:39 +02:00
David S. Miller 43b03f1f6d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	MAINTAINERS
	drivers/net/wireless/iwlwifi/pcie/trans.c

The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.

The MAINTAINERS conflict was merely overlapping changes, one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12 21:59:18 -07:00
Pablo Neira Ayuso 352e04b911 netfilter: nf_ct_tcp, udp: fix compilation with sysctl disabled
This patch fixes the compilation of the TCP and UDP trackers with sysctl
compilation disabled:

net/netfilter/nf_conntrack_proto_udp.c: In function ‘udp_init_net_data’:
net/netfilter/nf_conntrack_proto_udp.c:279:13: error: ‘struct nf_proto_net’ has no member named
 ‘user’
net/netfilter/nf_conntrack_proto_tcp.c:1606:9: error: ‘struct nf_proto_net’ has no member named
 ‘user’
net/netfilter/nf_conntrack_proto_tcp.c:1643:9: error: ‘struct nf_proto_net’ has no member named
 ‘user’

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 15:22:46 -07:00
David S. Miller 67da255210 Merge branch 'master' of git://1984.lsi.us.es/net-next 2012-06-11 12:56:14 -07:00
Alban Crequy 4c809d630c netfilter: ipvs: switch hook PFs to nfproto
This patch is a cleanup. Use NFPROTO_* for consistency with other
netfilter code.

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Reviewed-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:43 +02:00
Denys Fedoryshchenko efdedd5426 netfilter: xt_recent: add address masking option
The mask option allows you put all address belonging that mask into
the same recent slot. This can be useful in case that recent is used
to detect attacks from the same network segment.

Tested for backward compatibility.

Signed-off-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:42 +02:00
Florian Westphal 1da6dd0798 netfilter: NFQUEUE: don't xor src/dst ip address for load distribution
because reply packets need to go to the same nfqueue, src/dst ip
address were xor'd prior to jhash().

However, this causes bad distribution for some workloads, e.g.
flows a.b.1.{1,n} -> a.b.2.{1,n} all share the same hash value.

Avoid this by hashing both. To get same hash for replies,
first argument is the smaller address.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:42 +02:00
Gao feng 8264deb818 netfilter: nf_conntrack: add namespace support for cttimeout
This patch adds namespace support for cttimeout.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:41 +02:00
Pablo Neira Ayuso e76d0af5e4 netfilter: nf_conntrack: remove now unused sysctl for nf_conntrack_l[3|4]proto
Since the sysctl data for l[3|4]proto now resides in pernet nf_proto_net.
We can now remove this unused fields from struct nf_contrack_l[3,4]proto.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:41 +02:00
Gao feng 4f71d80fc0 netfilter: nf_ct_gre: use new namespace support
This patch modifies the GRE protocol tracker, which partially
supported namespace before this patch, to use the new namespace
infrastructure for nf_conntrack.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:41 +02:00
Gao feng 84c394511f netfilter: nf_ct_dccp: use new namespace support
This patch modifies the DCCP protocol tracker to use the new
namespace infrastructure for nf_conntrack.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:41 +02:00
Gao feng a8021fedda netfilter: nf_ct_udplite: add namespace support
This patch adds namespace support for UDPlite protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:41 +02:00
Gao feng 49d485a30f netfilter: nf_ct_sctp: add namespace support
This patch adds namespace support for SCTP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng 7080ba0955 netfilter: nf_ct_icmp: add namespace support
This patch adds namespace support for ICMPv6 protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng 4b626b9c5d netfilter: nf_ct_icmp: add namespace support
This patch adds namespace support for ICMP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng 0ce490ad43 netfilter: nf_ct_udp: add namespace support
This patch adds namespace support for UDP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng d2ba1fde42 netfilter: nf_ct_tcp: add namespace support
This patch adds namespace support for TCP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Gao feng 15f585bd76 netfilter: nf_ct_generic: add namespace support
This patch adds namespace support for the generic layer 4 protocol
tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Gao feng 524a53e5ad netfilter: nf_conntrack: prepare namespace support for l3 protocol trackers
This patch prepares the namespace support for layer 3 protocol trackers.
Basically, this modifies the following interfaces:

* nf_ct_l3proto_[un]register_sysctl.
* nf_conntrack_l3proto_[un]register.

We add a new nf_ct_l3proto_net is used to get the pernet data of l3proto.

This adds rhe new struct nf_ip_net that is used to store the sysctl header
and l3proto_ipv4,l4proto_tcp(6),l4proto_udp(6),l4proto_icmp(v6) because the
protos such tcp and tcp6 use the same data,so making nf_ip_net as a field
of netns_ct is the easiest way to manager it.

This patch also adds init_net to struct nf_conntrack_l3proto to initial
the layer 3 protocol pernet data.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Gao feng 2c352f444c netfilter: nf_conntrack: prepare namespace support for l4 protocol trackers
This patch prepares the namespace support for layer 4 protocol trackers.
Basically, this modifies the following interfaces:

* nf_ct_[un]register_sysctl
* nf_conntrack_l4proto_[un]register

to include the namespace parameter. We still use init_net in this patch
to prepare the ground for follow-up patches for each layer 4 protocol
tracker.

We add a new net_id field to struct nf_conntrack_l4proto that is used
to store the pernet_operations id for each layer 4 protocol tracker.

Note that AF_INET6's protocols do not need to do sysctl compat. Thus,
we only register compat sysctl when l4proto.l3proto != AF_INET6.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Krishna Kumar fdb694a01f netfilter: Add fail-open support
Implement a new "fail-open" mode where packets are not dropped
upon queue-full condition. This mode can be enabled/disabled per
queue using netlink NFQA_CFG_FLAGS & NFQA_CFG_MASK attributes.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Vivek Kashyap <vivk@us.ibm.com>
Signed-off-by: Sridhar Samudrala <samudrala@us.ibm.com>
2012-06-07 14:58:39 +02:00
Cong Wang 68c07cb6d8 netfilter: xt_connlimit: remove revision 0
It was scheduled to be removed.

Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Pablo Neira Ayuso d109e9af61 netfilter: nf_ct_h323: fix bug in rtcp natting
The nat_rtp_rtcp hook takes two separate parameters port and rtp_port.

port is expected to be the real h245 address (found inside the packet).
rtp_port is the even number closest to port (RTP ports are even and
RTCP ports are odd).

However currently, both port and rtp_port are having same value (both are
rounded to nearest even numbers).

This works well in case of openlogicalchannel with media (RTP/even) port.

But in case of openlogicalchannel for media control (RTCP/odd) port,
h245 address in the packet is wrongly modified to have an even port.

I am attaching a pcap demonstrating the problem, for any further analysis.

This behavior was introduced around v2.6.19 while rewriting the helper.

Signed-off-by: Jagdish Motwani <jagdish.motwani@elitecore.com>
Signed-off-by: Sanket Shah <sanket.shah@elitecore.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:53:17 +02:00
Hans Schillstrom d1992b169d netfilter: xt_HMARK: fix endianness and provide consistent hashing
This patch addresses two issues:

a) Fix usage of u32 and __be32 that causes endianess warnings via sparse.
b) Ensure consistent hashing in a cluster that is composed of big and
   little endian systems. Thus, we obtain the same hash mark in an
   heterogeneous cluster.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:53:01 +02:00
Joe Perches e3192690a3 net: Remove casts to same type
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.

For example, this cast:

	int y;
	int *p = (int *)&y;

I used the coccinelle script below to find and remove these
unnecessary casts.  I manually removed the conversions this
script produces of casts with __force and __user.

@@
type T;
T *p;
@@

-	(T *)p
+	p

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04 11:45:11 -04:00
Eric Dumazet 5d0ba55b64 net: use consume_skb() in place of kfree_skb()
Remove some dropwatch/drop_monitor false positives.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04 11:27:40 -04:00
Linus Torvalds f5c101892f Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:
 "Contains Alex Shi's three patches to remove percpu_xxx() which overlap
  with this_cpu_xxx().  There shouldn't be any functional change."

* 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: remove percpu_xxx() functions
  x86: replace percpu_xxx funcs with this_cpu_xxx
  net: replace percpu_xxx funcs with this_cpu_xxx or __this_cpu_xxx
2012-05-22 17:37:47 -07:00
David S. Miller 028940342a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-05-16 22:17:37 -04:00
Pablo Neira Ayuso be3eed2e96 netfilter: nf_ct_h323: fix usage of MODULE_ALIAS_NFCT_HELPER
ctnetlink uses the aliases that are created by MODULE_ALIAS_NFCT_HELPER
to auto-load the module based on the helper name. Thus, we have to use
RAS, Q.931 and H.245, not H.323.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17 01:00:05 +02:00
Eldad Zack 1a52099640 netfilter: xt_CT: remove redundant header include
nf_conntrack_l4proto.h is included twice.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17 01:00:02 +02:00
Jozsef Kadlecsik 127f559127 netfilter: ipset: fix timeout value overflow bug
Large timeout parameters could result wrong timeout values due to
an overflow at msec to jiffies conversion (reported by Andreas Herz)

[ This patch was mangled by Pablo Neira Ayuso since David Laight and
  Eric Dumazet noticed that we were using hardcoded 1000 instead of
  MSEC_PER_SEC to calculate the timeout ]

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17 00:56:41 +02:00
Pablo Neira Ayuso 1a4ac9870f netfilter: nf_ct_tcp: extend log message for invalid ignored packets
Extend log message if packets are ignored to include the TCP state, ie.
replace:

[ 3968.070196] nf_ct_tcp: invalid packet ignored IN= OUT= SRC=...

by:

[ 3968.070196] nf_ct_tcp: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=...

This information is useful to know in what state we were while ignoring the
packet.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-05-17 00:56:38 +02:00
Pablo Neira Ayuso c44f5faa8e netfilter: xt_HMARK: modulus is expensive for hash calculation
Use:

((u64)(HASH_VAL * HASH_SIZE)) >> 32

as suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17 00:56:35 +02:00
Dan Carpenter 5861811549 netfilter: xt_HMARK: potential NULL dereference in get_inner_hdr()
There is a typo in the error checking and "&&" was used instead of "||".
If skb_header_pointer() returns NULL then it leads to a NULL
dereference.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17 00:56:33 +02:00
Florian Westphal 1f27e2516c netfilter: xt_hashlimit: use _ALL macro to reject unknown flag bits
David Miller says:
     The canonical way to validate if the set bits are in a valid
     range is to have a "_ALL" macro, and test:
     if (val & ~XT_HASHLIMIT_ALL)
         goto err;"

make it so.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-17 00:56:31 +02:00
Jozsef Kadlecsik 26a5d3cc0b netfilter: ipset: fix hash size checking in kernel
The hash size must fit both into u32 (jhash) and the max value of
size_t. The missing checking could lead to kernel crash, bug reported
by Seblu.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 15:38:49 -04:00
Joe Perches e87cc4728f net: Convert net_ratelimit uses to net_<level>_ratelimited
Standardize the net core ratelimited logging functions.

Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15 13:45:03 -04:00
Alex Shi 19e8d69c54 net: replace percpu_xxx funcs with this_cpu_xxx or __this_cpu_xxx
percpu_xxx funcs are duplicated with this_cpu_xxx funcs, so replace
them for further code clean up.

And in preempt safe scenario, __this_cpu_xxx funcs may has a bit
better performance since __this_cpu_xxx has no redundant
preempt_enable/preempt_disable on some architectures.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-05-14 14:15:31 -07:00
Joe Perches 8561cf9978 netfilter: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.

Done via cocci script:

$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
-	!compare_ether_addr(a, b)
+	ether_addr_equal(a, b)

@@
expression a,b;
@@
-	compare_ether_addr(a, b)
+	!ether_addr_equal(a, b)

@@
expression a,b;
@@
-	!ether_addr_equal(a, b) == 0
+	ether_addr_equal(a, b)

@@
expression a,b;
@@
-	!ether_addr_equal(a, b) != 0
+	!ether_addr_equal(a, b)

@@
expression a,b;
@@
-	ether_addr_equal(a, b) == 0
+	!ether_addr_equal(a, b)

@@
expression a,b;
@@
-	ether_addr_equal(a, b) != 0
+	ether_addr_equal(a, b)

@@
expression a,b;
@@
-	!!ether_addr_equal(a, b)
+	ether_addr_equal(a, b)

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-09 20:49:18 -04:00
Florian Westphal 0197dee7d3 netfilter: hashlimit: byte-based limit mode
can be used e.g. for ingress traffic policing or
to detect when a host/port consumes more bandwidth than expected.

This is done by optionally making cost to mean
"cost per 16-byte-chunk-of-data" instead of "cost per packet".

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09 13:04:57 +02:00
Florian Westphal 817e076f61 netfilter: hashlimit: move rateinfo initialization to helper
followup patch would bloat main match function too much.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09 12:54:06 +02:00
Florian Westphal 7a909ac70f netfilter: limit, hashlimit: avoid duplicated inline
credit_cap can be set to credit, which avoids inlining user2credits
twice. Also, remove inline keyword and let compiler decide.

old:
    684     192       0     876     36c net/netfilter/xt_limit.o
   4927     344      32    5303    14b7 net/netfilter/xt_hashlimit.o
now:
    668     192       0     860     35c net/netfilter/xt_limit.o
   4793     344      32    5169    1431 net/netfilter/xt_hashlimit.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09 12:54:06 +02:00
Hans Schillstrom cf308a1fae netfilter: add xt_hmark target for hash-based skb marking
The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

[ Part of this patch has been refactorized and modified by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09 12:54:05 +02:00
Hans Schillstrom 84018f55ab netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()
This patch adds the flags parameter to ipv6_find_hdr. This flags
allows us to:

* know if this is a fragment.
* stop at the AH header, so the information contained in that header
  can be used for some specific packet handling.

This patch also adds the offset parameter for inspection of one
inner IPv6 header that is contained in error messages.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09 12:53:47 +02:00
Pablo Neira Ayuso 6714cf5465 netfilter: nf_conntrack: fix explicit helper attachment and NAT
Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:

9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment

Basically, nf_conntrack_alter_reply asks for looking up the helper
up if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.

Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment, thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).

Interestingly, this bug was hidden by the automatic helper look-up
code. But it can be easily trigger if you attach the helper in
a non-standard port, eg.

iptables -I PREROUTING -t raw -p tcp --dport 8888 \
	-j CT --helper ftp

And you disabled the automatic helper assignment.

I added the IPS_HELPER_BIT that allows us to differenciate between
a helper that has been explicitly attached and those that have been
automatically assigned. I didn't come up with a better solution
(having backward compatibility in mind).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:44:42 +02:00
Kelvie Wong 9768e1ace4 netfilter: nf_ct_expect: partially implement ctnetlink_change_expect
This refreshes the "timeout" attribute in existing expectations if one is
given.

The use case for this would be for userspace helpers to extend the lifetime
of the expectation when requested, as this is not possible right now
without deleting/recreating the expectation.

I use this specifically for forwarding DCERPC traffic through:

DCERPC has a port mapper daemon that chooses a (seemingly) random port for
future traffic to go to. We expect this traffic (with a reasonable
timeout), but sometimes the port mapper will tell the client to continue
using the same port. This allows us to extend the expectation accordingly.

Signed-off-by: Kelvie Wong <kelvie@ieee.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:40:59 +02:00
H Hartley Sweeten 068d522067 ipvs: ip_vs_proto: local functions should not be exposed globally
Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.

This quiets the sparse warnings:

warning: symbol '__ipvs_proto_data_get' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:54 +02:00
H Hartley Sweeten d5cce20874 ipvs: ip_vs_ftp: local functions should not be exposed globally
Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.

This quiets the sparse warnings:

warning: symbol 'ip_vs_ftp_init' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:52 +02:00
Pablo Neira Ayuso 6b324dbfc3 ipvs: optimize the use of flags in ip_vs_bind_dest
cp->flags is marked volatile but ip_vs_bind_dest
can safely modify the flags, so save some CPU cycles by
using temp variable.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:49 +02:00
Pablo Neira Ayuso f73181c828 ipvs: add support for sync threads
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.

	The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.

	Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.

	Make sure the backup socks do not block on reading.

Special thanks to Aleksey Chudov for helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:33 +02:00
Julian Anastasov 749c42b620 ipvs: reduce sync rate with time thresholds
Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
	timer that triggers new sync message. It can be used to
	avoid sync messages for the specified period (or half of
	the connection timeout if it is lower) if connection state
	is not changed from last sync.

sync_retries: integer, 0..3, defines sync retries with period of
	sync_refresh_period/8. Useful to protect against loss of
	sync messages.

	Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.

	Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.

	As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.

	Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:10 +02:00
Pablo Neira Ayuso 1c003b1580 ipvs: wakeup master thread
High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,

	Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.

	Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.

	As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).

	Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.

	Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.

Thanks to Pablo Neira Ayuso for his valuable feedback!

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:39:53 +02:00
Julian Anastasov cdcc5e905d ipvs: always update some of the flags bits in backup
As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure the cp->flags are
updated even if cp is still not bound to dest. If cp->flags
are not updated ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks for protocol state rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:38:31 +02:00
Julian Anastasov 882a844bd5 ipvs: fix ip_vs_try_bind_dest to rebind app and transmitter
Initially, when the synced connection is created we
use the forwarding method provided by master but once we
bind to destination it can be changed. As result, we must
update the application and the transmitter.

	As ip_vs_try_bind_dest is called always for connections
that require dest binding, there is no need to validate the
cp and dest pointers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:38:28 +02:00
Julian Anastasov 06611f82cc ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest
As the IP_VS_CONN_F_INACTIVE bit is properly set
in cp->flags for all kind of connections we do not need to
add special checks for synced connections when updating
the activeconns/inactconns counters for first time. Now
logic will look just like in ip_vs_unbind_dest.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:38:26 +02:00
Julian Anastasov 82cfc06278 ipvs: ignore IP_VS_CONN_F_NOOUTPUT in backup server
As IP_VS_CONN_F_NOOUTPUT is derived from the
forwarding method we should get it from conn_flags just
like we do it for IP_VS_CONN_F_FWD_MASK bits when binding
to real server.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:38:24 +02:00
Sasha Levin 9615e61e6f ipvs: use GFP_KERNEL allocation where possible
Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol.

This is safe since it will always run from a process context.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:38:22 +02:00
Julian Anastasov d6318f08e8 ipvs: SH scheduler does not need GFP_ATOMIC allocation
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:28 +02:00
Julian Anastasov 45d4e71a39 ipvs: LBLCR scheduler does not need GFP_ATOMIC allocation on init
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:26 +02:00
Julian Anastasov 4f2a94dcb6 ipvs: WRR scheduler does not need GFP_ATOMIC allocation
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:22 +02:00
Julian Anastasov 4beddbe38c ipvs: DH scheduler does not need GFP_ATOMIC allocation
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:20 +02:00
Julian Anastasov 748d845ca9 ipvs: LBLC scheduler does not need GFP_ATOMIC allocation on init
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:17 +02:00
Julian Anastasov 41cff6d5c9 ipvs: timeout tables do not need GFP_ATOMIC allocation
They are called only on initialization.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:09 +02:00
Eric Leblond a900689264 netfilter: nf_ct_helper: allow to disable automatic helper assignment
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
  if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.

There are good reasons to stop supporting automatic helper
assignment, for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:35:18 +02:00
Tony Zelenoff 031d7709f2 netfilter: nf_ct_ecache: refactor notifier registration
* ret variable initialization removed as useless
* similar code strings concatenated and functions code
  flow became more plain

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:17:23 +02:00
David S. Miller 0d6c4a2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/param.c
	drivers/net/wireless/iwlwifi/iwl-agn-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans.h

Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell.  In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.

In e1000e a conflict arose in the validation code for settings of
adapter->itr.  'net-next' had more sophisticated logic so that
logic was used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 23:35:40 -04:00
Pablo Neira Ayuso 6cf5185248 netfilter: xt_CT: fix wrong checking in the timeout assignment path
The current checking always succeeded. We have to check the first
character of the string to check that it's empty, thus, skipping
the timeout path.

This fixes the use of the CT target without the timeout option.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-30 10:40:36 +02:00
Hans Schillstrom 8537de8a7a ipvs: kernel oops - do_ip_vs_get_ctl
Change order of init so netns init is ready
when register ioctl and netlink.

Ver2
	Whitespace fixes and __init added.

Reported-by: "Ryan O'Hara" <rohara@redhat.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Hans Schillstrom 582b8e3ead ipvs: take care of return value from protocol init_netns
ip_vs_create_timeout_table() can return NULL
All functions protocol init_netns is affected of this patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Hans Schillstrom 4b984cd50b ipvs: null check of net->ipvs in lblc(r) shedulers
Avoid crash when registering shedulers after
the IPVS core initialization for netns fails. Do this by
checking for present core (net->ipvs).

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:14 +02:00
Julian Anastasov 39f618b4fd ipvs: reset ipvs pointer in netns
Make sure net->ipvs is reset on netns cleanup or failed
initialization. It is needed for IPVS applications to know that
IPVS core is not loaded in netns.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-26 15:26:35 +09:00
Julian Anastasov 8d08d71ce5 ipvs: add check in ftp for initialized core
Avoid crash when registering ip_vs_ftp after
the IPVS core initialization for netns fails. Do this by
checking for present core (net->ipvs).

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-26 15:26:35 +09:00
Julian Anastasov 8f9b9a2fad ipvs: fix crash in ip_vs_control_net_cleanup on unload
commit 14e405461e (2.6.39)
("Add __ip_vs_control_{init,cleanup}_sysctl()")
introduced regression due to wrong __net_init for
__ip_vs_control_cleanup_sysctl. This leads to crash when
the ip_vs module is unloaded.

	Fix it by changing __net_init to __net_exit for
the function that is already renamed to ip_vs_control_net_cleanup_sysctl.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-25 11:16:30 +02:00
Sasha Levin 7118c07a84 ipvs: Verify that IP_VS protocol has been registered
The registration of a protocol might fail, there were no checks
and all registrations were assumed to be correct. This lead to
NULL ptr dereferences when apps tried registering.

For example:

[ 1293.226051] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 1293.227038] IP: [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
[ 1293.227038] PGD 391de067 PUD 6c20b067 PMD 0
[ 1293.227038] Oops: 0000 [#1] PREEMPT SMP
[ 1293.227038] CPU 1
[ 1293.227038] Pid: 19609, comm: trinity Tainted: G        W    3.4.0-rc1-next-20120405-sasha-dirty #57
[ 1293.227038] RIP: 0010:[<ffffffff822aacb0>]  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
[ 1293.227038] RSP: 0018:ffff880038c1dd18  EFLAGS: 00010286
[ 1293.227038] RAX: ffffffffffffffc0 RBX: 0000000000001500 RCX: 0000000000010000
[ 1293.227038] RDX: 0000000000000000 RSI: ffff88003a2d5888 RDI: 0000000000000282
[ 1293.227038] RBP: ffff880038c1dd48 R08: 0000000000000000 R09: 0000000000000000
[ 1293.227038] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a2d5668
[ 1293.227038] R13: ffff88003a2d5988 R14: ffff8800696a8ff8 R15: 0000000000000000
[ 1293.227038] FS:  00007f01930d9700(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
[ 1293.227038] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1293.227038] CR2: 0000000000000018 CR3: 0000000065dfc000 CR4: 00000000000406e0
[ 1293.227038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1293.227038] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1293.227038] Process trinity (pid: 19609, threadinfo ffff880038c1c000, task ffff88002dc73000)
[ 1293.227038] Stack:
[ 1293.227038]  ffff880038c1dd48 00000000fffffff4 ffff8800696aada0 ffff8800694f5580
[ 1293.227038]  ffffffff8369f1e0 0000000000001500 ffff880038c1dd98 ffffffff822a716b
[ 1293.227038]  0000000000000000 ffff8800696a8ff8 0000000000000015 ffff8800694f5580
[ 1293.227038] Call Trace:
[ 1293.227038]  [<ffffffff822a716b>] ip_vs_app_inc_new+0xdb/0x180
[ 1293.227038]  [<ffffffff822a7258>] register_ip_vs_app_inc+0x48/0x70
[ 1293.227038]  [<ffffffff822b2fea>] __ip_vs_ftp_init+0xba/0x140
[ 1293.227038]  [<ffffffff821c9060>] ops_init+0x80/0x90
[ 1293.227038]  [<ffffffff821c90cb>] setup_net+0x5b/0xe0
[ 1293.227038]  [<ffffffff821c9416>] copy_net_ns+0x76/0x100
[ 1293.227038]  [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
[ 1293.227038]  [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
[ 1293.227038]  [<ffffffff810afd1f>] sys_unshare+0xff/0x290
[ 1293.227038]  [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1293.227038]  [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
[ 1293.227038] Code: 89 c7 e8 34 91 3b 00 89 de 66 c1 ee 04 31 de 83 e6 0f 48 83 c6 22 48 c1 e6 04 4a 8b 14 26 49 8d 34 34 48 8d 42 c0 48 39 d6 74 13 <66> 39 58 58 74 22 48 8b 48 40 48 8d 41 c0 48 39 ce 75 ed 49 8d
[ 1293.227038] RIP  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
[ 1293.227038]  RSP <ffff880038c1dd18>
[ 1293.227038] CR2: 0000000000000018
[ 1293.379284] ---[ end trace 364ab40c7011a009 ]---
[ 1293.381182] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-25 11:16:12 +02:00
Pavel Emelyanov 4a17fd5229 sock: Introduce named constants for sk_reuse
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Eric W. Biederman a5347fe36b net: Delete all remaining instances of ctl_path
We don't use struct ctl_path anymore so delete the exported constants.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman ec8f23ce0f net: Convert all sysctl registrations to register_net_sysctl
This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman f99e8f715a net: Convert nf_conntrack_proto to use register_net_sysctl
There isn't much advantage here except that strings paths are a bit
easier to read, and converting everything to them allows me to kill off
ctl_path.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman 5dd3df105b net: Move all of the network sysctls without a namespace into init_net.
This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:17 -04:00
Eric Dumazet 95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
David S. Miller 011e3c6325 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-12 19:41:23 -04:00