Commit graph

3056 commits

Author SHA1 Message Date
Patrick McHardy 761da2935d netfilter: nf_tables: add set timeout API support
Add set timeout support to the netlink API. Sets with timeout support
enabled can have a default timeout value and garbage collection interval
specified.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:28 +02:00
David S. Miller 4ef295e047 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree.
Basically, nf_tables updates to add the set extension infrastructure and finish
the transaction for sets from Patrick McHardy. More specifically, they are:

1) Move netns to basechain and use recently added possible_net_t, from
   Patrick McHardy.

2) Use LOGLEVEL_<FOO> from nf_log infrastructure, from Joe Perches.

3) Restore nf_log_trace that was accidentally removed during conflict
   resolution.

4) nft_queue does not depend on NETFILTER_XTABLES, starting from here
   all patches from Patrick McHardy.

5) Use raw_smp_processor_id() in nft_meta.

Then, several patches to prepare ground for the new set extension
infrastructure:

6) Pass object length to the hash callback in rhashtable as needed by
   the new set extension infrastructure.

7) Cleanup patch to restore struct nft_hash as wrapper for struct
   rhashtable

8) Another small source code readability cleanup for nft_hash.

9) Convert nft_hash to rhashtable callbacks.

And finally...

10) Add the new set extension infrastructure.

11) Convert the nft_hash and nft_rbtree sets to use it.

12) Batch set element release to avoid several RCU grace periods in a row
    and add new function nft_set_elem_destroy() to consolidate set element
    release.

13) Return the set extension data area from nft_lookup.

14) Refactor existing transaction code to add some helper functions
    and document it.

15) Complete the set transaction support, using similar approach to what we
    already use, to activate/deactivate elements in an atomic fashion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 12:43:43 -07:00
Patrick McHardy cc02e457bb netfilter: nf_tables: implement set transaction support
Set elements are the last object type not supporting transactions.
Implement this similarly to the existing rule transactions:

The global transaction counter keeps track of two generations, current
and next. Each element contains a bitmask specifying in which generations
it is inactive.

New elements start out as inactive in the current generation and active
in the next. On commit, the previous next generation becomes the current
generation and the element becomes active. The bitmask is then cleared
to indicate that the element is active in all future generations. If the
transaction is aborted, the element is removed from the set before it
becomes active.

When removing an element, it gets marked as inactive in the next generation.
On commit the next generation becomes active and therefore the element
inactive. It is then taken out of the set and released. On abort, the
element is marked as active for the next generation again.

Lookups ignore elements not active in the current generation.

The current set types (hash/rbtree) both use a field in the extension area
to store the generation mask. This (currently) does not require any
additional memory since we have some free space in there.
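
The two-generation scheme described above can be sketched in plain C as
follows. This is only an illustration under stated assumptions: the struct
and function names (txn_state, elem, elem_add and so on) are hypothetical
and are not the names used in nf_tables; only the behaviour of the bitmask
follows the description in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the generation logic follows the text. */
struct txn_state {
	unsigned int gencursor;	/* index of the current generation: 0 or 1 */
};

struct elem {
	uint8_t genmask;	/* bit g set => inactive in generation g */
};

static unsigned int gen_next(const struct txn_state *t)
{
	assert(t->gencursor <= 1u);
	return t->gencursor ^ 1u;
}

/* Lookups only consider elements active in the generation they run in. */
static bool elem_active(const struct elem *e, unsigned int gen)
{
	return !(e->genmask & (1u << gen));
}

/* A new element is inactive in the current generation, active in the next. */
static void elem_add(struct elem *e, const struct txn_state *t)
{
	e->genmask = (uint8_t)(1u << t->gencursor);
}

/* Removal marks the element inactive in the next generation. */
static void elem_deactivate(struct elem *e, const struct txn_state *t)
{
	e->genmask |= (uint8_t)(1u << gen_next(t));
}

/* Commit: the previous next generation becomes the current one. */
static void txn_commit(struct txn_state *t)
{
	t->gencursor = gen_next(t);
}

/*
 * After commit, clear the bit of the old current generation (which is
 * now the next one) so the element is active in all future generations.
 */
static void elem_finalize_add(struct elem *e, const struct txn_state *t)
{
	e->genmask &= (uint8_t)~(1u << gen_next(t));
}
```

An element added in generation 0 is invisible to lookups until
txn_commit() flips the cursor; a deactivated element stays visible in the
current generation and disappears after the next commit.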

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-26 11:09:35 +01:00
Patrick McHardy ea4bd995b0 netfilter: nf_tables: add transaction helper functions
Add some helper functions for building the genmask as preparation for
set transactions.

Also add a little documentation on how this stuff actually works.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-26 11:09:35 +01:00
Patrick McHardy b2832dd662 netfilter: nf_tables: return set extensions from ->lookup()
Return the extension area from the ->lookup() function to allow to
consolidate common actions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-26 11:09:34 +01:00
Patrick McHardy 61edafbb47 netfilter: nf_tables: consolide set element destruction
With the conversion to set extensions, it is now possible to consolidate
the different set element destruction functions.

The set implementations' ->remove() functions are changed to only take
the element out of their internal data structures. Elements will be freed
in a batched fashion after the global transaction's completion RCU grace
period.

This reduces the number of grace periods required for nft_hash from N
to zero additional ones. Additionally, this guarantees that the set
elements' extensions of all implementations can be used under RCU
protection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-26 11:09:34 +01:00
Hannes Frederic Sowa b6a7719aed ipv4: hash net ptr into fragmentation bucket selection
As namespaces are sometimes used with overlapping ip address ranges,
we should also use the namespace as input to the hash to select the ip
fragmentation counter bucket.
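
As a rough illustration of the idea, the sketch below mixes a
per-namespace value into the bucket hash. Everything here is an
assumption made for the example: the mixing function (a murmur3-style
finalizer), the table size FRAG_ID_BUCKETS and the ns_mix parameter are
hypothetical and are not the helpers the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

#define FRAG_ID_BUCKETS 2048u	/* hypothetical table size, power of two */

/* 32-bit finalizer in the style of murmur3; a bijection on uint32_t. */
static uint32_t fmix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* ns_mix is a hypothetical per-namespace value folded into the hash. */
static uint32_t frag_hash(uint32_t daddr, uint32_t saddr, uint8_t proto,
			  uint32_t ns_mix)
{
	return fmix32(daddr ^ fmix32(saddr ^ fmix32((uint32_t)proto ^ ns_mix)));
}

static uint32_t frag_bucket(uint32_t daddr, uint32_t saddr, uint8_t proto,
			    uint32_t ns_mix)
{
	/* Masking only works because the table size is a power of two. */
	assert((FRAG_ID_BUCKETS & (FRAG_ID_BUCKETS - 1u)) == 0);
	return frag_hash(daddr, saddr, proto, ns_mix) & (FRAG_ID_BUCKETS - 1u);
}
```

With the namespace value as an input, two namespaces using the same
overlapping address pair no longer hash identically, although they can
still collide on a bucket after masking.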

Cc: Eric Dumazet <edumazet@google.com>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:07:04 -04:00
Patrick McHardy fe2811ebeb netfilter: nf_tables: convert hash and rbtree to set extensions
The set implementations' private struct will only contain the elements
needed to maintain the search structure, all other elements are moved
to the set extensions.

Element allocation and initialization is performed centrally by
nf_tables_api instead of by the different set implementations'
->insert() functions. A new "elemsize" member in the set ops specifies
the amount of memory to reserve for internal usage. Destruction
will also be moved out of the set implementations by a following patch.

Except for element allocation, the patch is a simple conversion to
using data from the extension area.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 17:18:35 +01:00
Patrick McHardy 3ac4c07a24 netfilter: nf_tables: add set extensions
Add simple set extension infrastructure for maintaining variable sized
and optional per element data.
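
One way to picture "variable sized and optional per element data" is an
offset table in front of a data area, as in the sketch below. This is a
simplified illustration, not the nf_tables layout: the names (ext_tmpl,
ext, EXT_KEY and so on) are hypothetical, and alignment padding is
ignored, so callers should copy values in and out with memcpy().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical extension types for the example. */
enum ext_id { EXT_KEY, EXT_DATA, EXT_FLAGS, EXT_NUM };

/* Layout template: computed once, shared by all elements of a set. */
struct ext_tmpl {
	uint8_t len;			/* total size of header plus data */
	uint8_t offset[EXT_NUM];	/* 0 means "extension not present" */
};

/* Per-element extension area: offset table followed by the data. */
struct ext {
	uint8_t offset[EXT_NUM];
	unsigned char data[];
};

static void tmpl_init(struct ext_tmpl *t)
{
	memset(t, 0, sizeof(*t));
	t->len = sizeof(struct ext);	/* data starts after the header */
}

/* Reserve size bytes for extension id at the current end of the area. */
static void tmpl_add(struct ext_tmpl *t, enum ext_id id, uint8_t size)
{
	assert(id < EXT_NUM);
	assert(t->len + size <= UINT8_MAX);
	t->offset[id] = t->len;
	t->len = (uint8_t)(t->len + size);
}

/* Allocate a zeroed extension area laid out according to the template. */
static struct ext *ext_alloc(const struct ext_tmpl *t)
{
	struct ext *e = calloc(1, t->len);

	if (e)
		memcpy(e->offset, t->offset, sizeof(e->offset));
	return e;
}

static int ext_exists(const struct ext *e, enum ext_id id)
{
	assert(id < EXT_NUM);
	return e->offset[id] != 0;
}

/* Unaligned pointer into the area; only valid if ext_exists() is true. */
static void *ext_ptr(struct ext *e, enum ext_id id)
{
	assert(ext_exists(e, id));
	return (unsigned char *)e + e->offset[id];
}
```

Elements that do not need a given extension simply leave its offset at
zero and pay no memory for it.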

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 17:18:34 +01:00
Patrick McHardy bfd6e327e1 netfilter: nft_hash: convert to use rhashtable callbacks
A following patch will convert sets to use so called set extensions,
where the key is not located in a fixed position anymore. This will
require rhashtable hashing and comparison callbacks to be used.

As preparation, convert nft_hash to use these callbacks without any
functional changes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 17:18:34 +01:00
Patrick McHardy 45d84751fb netfilter: nft_hash: indent rhashtable parameters
Improve readability by indenting the parameter initialization.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 17:18:34 +01:00
Patrick McHardy 745f5450d5 netfilter: nft_hash: restore struct nft_hash
Following patches will add new private members, restore struct nft_hash
as preparation.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 17:18:33 +01:00
Patrick McHardy 14d14a5d29 netfilter: nft_meta: use raw_smp_processor_id()
Using smp_processor_id() triggers warnings with PREEMPT_RCU. There is no
point in disabling preemption since we only collect the numeric value,
so use raw_smp_processor_id() instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 12:09:40 +01:00
Patrick McHardy d95797252a netfilter: nf_tables: nft_queue does not depend on x_tables
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 12:09:39 +01:00
Pablo Neira Ayuso fce1528ef6 netfilter: nf_tables: restore nf_log_trace() in nf_tables_core.c
As described by 4017a7e ("netfilter: restore rule tracing via
nfnetlink_log"), this accidentally slipped through during conflict
resolution in d5c1d8c.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 12:09:39 +01:00
Joe Perches a81b2ce850 netfilter: Use LOGLEVEL_<FOO> defines
Use the #defines where appropriate.

Miscellanea:

Add explicit #include <linux/kernel.h> where it was not
previously used so that these #defines are a bit more
explicitly defined instead of indirectly included via:
	module.h->moduleparam.h->kernel.h

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 12:09:39 +01:00
Patrick McHardy 5ebb335dcb netfilter: nf_tables: move struct net pointer to base chain
The network namespace is only needed for base chains to get at the
gencursor. Also convert to possible_net_t.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 12:09:38 +01:00
Thomas Graf 6b6f302ced rhashtable: Add rhashtable_free_and_destroy()
rhashtable_destroy() variant which stops rehashes, iterates over
the table and calls a callback to release resources.

Avoids need for nft_hash to embed rhashtable internals and allows to
get rid of the being_destroyed flag. It also saves a 2nd mutex
lock upon destruction.

Also fixes an RCU lockdep splash on nft set destruction due to
calling rht_for_each_entry_safe() without holding bucket locks.
Open code this loop as we need to know that no mutations may occur in
parallel.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:40 -04:00
Thomas Graf b5e2c150ac rhashtable: Disable automatic shrinking by default
Introduce a new bool automatic_shrinking to require the
user to explicitly opt-in to automatic shrinking of tables.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:40 -04:00
David S. Miller d5c1d8c567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/netfilter/nf_tables_core.c

The nf_tables_core.c conflict was resolved using a conflict resolution
from Stephen Rothwell as a guide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:22:43 -04:00
David S. Miller 40451fd013 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next.
Basically, more incremental updates for br_netfilter from Florian
Westphal, small nf_tables updates (including one fix for rb-tree
locking) and small two-liner to add extra validation for the REJECT6
target.

More specifically, they are:

1) Use the conntrack status flags from br_netfilter to know that DNAT is
   happening. Patch from Florian Westphal.

2) nf_bridge->physoutdev == NULL already indicates that the traffic is
   bridged, so let's get rid of the BRNF_BRIDGED flag. Also from Florian.

3) Another patch to prepare voidization of seq_printf/seq_puts/seq_putc,
   from Joe Perches.

4) Consolidation of nf_tables_newtable() error path.

5) Kill nf_bridge_pad used by br_netfilter from ip_fragment(),
   from Florian Westphal.

6) Access rb-tree root node inside the lock and remove unnecessary
   locking from the get path (we already hold nfnl_lock there), from
   Patrick McHardy.

7) You cannot use a NFT_SET_ELEM_INTERVAL_END when the set doesn't
   support interval, also from Patrick.

8) Enforce IP6T_F_PROTO from ip6t_REJECT to make sure the core is
   actually restricting matches to TCP.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:02:46 -04:00
Patrick McHardy 55df35d22f netfilter: nf_tables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-22 19:50:35 +01:00
Patrick McHardy 16c45eda96 netfilter: nft_rbtree: fix locking
Fix a race condition and unnecessary locking:

* the root rb_node must only be accessed under the lock in nft_rbtree_lookup()
* the lock is not needed in lookup functions in netlink context

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-22 19:49:09 +01:00
Pablo Neira Ayuso 749177ccc7 netfilter: nft_compat: set IP6T_F_PROTO flag if protocol is set
ip6tables extensions check for this flag to restrict match/target to a
given protocol. Without this flag set, SYNPROXY6 returns an error.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
2015-03-22 19:32:05 +01:00
Herbert Xu fa3773211e netfilter: Convert nft_hash to inlined rhashtable
This patch converts nft_hash to the inlined rhashtable interface.

This patch also replaces the call to rhashtable_lookup_compare with
a straight rhashtable_lookup_fast because it's simply doing a memcmp
(in fact nft_hash_lookup already uses memcmp instead of nft_data_cmp).

Furthermore, the compare function is only meant to compare, it is not
supposed to have side-effects.  The current side-effect code can
simply be moved into the nft_hash_get.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 16:16:24 -04:00
Pablo Neira Ayuso 3d8c6dce53 netfilter: xt_TPROXY: fix invflags check in tproxy_tg6_check()
We have to check for IP6T_INV_PROTO in invflags, instead of flags.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Balazs Scheidler <bazsi@balabit.hu>
2015-03-20 14:35:33 +01:00
Pablo Neira Ayuso 4017a7ee69 netfilter: restore rule tracing via nfnetlink_log
Since fab4085 ("netfilter: log: nf_log_packet() as real unified
interface"), the loginfo structure that is passed to nf_log_packet() is
used to explicitly indicate the logger type you want to use.

This is a problem for people tracing rules through nfnetlink_log since
packets are always routed to the NF_LOG_TYPE logger after the
aforementioned patch.

We can fix this by removing the trace loginfo structures, but that still
changes the log level from 4 to 5 for tracing messages and there may be
someone relying on this out there. So let's just introduce a new
nf_log_trace() function that restores the former behaviour.

Reported-by: Markus Kötter <koetter@rrzn.uni-hannover.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-19 11:14:48 +01:00
Marcelo Ricardo Leitner 54ff9ef36b ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}
in favor of their inner __ ones, which don't grab rtnl.

As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.

So this patch:
- move rtnl handling to callers instead while already fixing some
  reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
  __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
  __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 22:05:09 -04:00
Pablo Neira Ayuso ffdb210eb4 netfilter: nf_tables: consolidate error path of nf_tables_newtable()
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-18 11:57:31 +01:00
Joe Perches 1ca9e41770 netfilter: Remove uses of seq_<foo> return values
The seq_printf/seq_puts/seq_putc return values, because they
are frequently misused, will eventually be converted to void.

See: commit 1f33c41c03 ("seq_file: Rename seq_overflow() to
     seq_has_overflowed() and make public")

Miscellanea:

o realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-18 10:51:35 +01:00
Eric Dumazet a940700003 netfilter: xt_socket: prepare for TCP_NEW_SYN_RECV support
TCP request socks soon will be visible in ehash table.

xt_socket will be able to match them, but first we need
to make sure to not consider them as full sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:17:59 -04:00
Eric Dumazet 8b58014779 netfilter: tproxy: prepare TCP_NEW_SYN_RECV support
TCP request socks soon will be visible in ehash table.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:17:59 -04:00
Eric Dumazet a8399231f0 netfilter: use sk_fullsock() helper
Upcoming request sockets have TCP_NEW_SYN_RECV state and should
be special cased a bit like TCP_TIME_WAIT sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:17:59 -04:00
Pablo Neira Ayuso d6b6cb1d3e netfilter: nf_tables: allow to change chain policy without hook if it exists
If there's an existing base chain, we have to allow to change the
default policy without indicating the hook information.

However, if the chain doesn't exist, we have to enforce the presence of
the hook attribute.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-17 13:48:04 +01:00
Florian Westphal e4bb9bcbfb netfilter: bridge: remove BRNF_STATE_BRIDGED flag
It's not needed anymore since 2bf540b73e
([NETFILTER]: bridge-netfilter: remove deferred hooks).
Before this it was possible to have physoutdev set for locally generated
packets -- this isn't the case anymore:

BRNF_STATE_BRIDGED flag is set when we assign nf_bridge->physoutdev,
so physoutdev != NULL means BRNF_STATE_BRIDGED is set.
If physoutdev is NULL, then we are looking at locally-delivered and
routed packet.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-16 14:35:02 +01:00
Herbert Xu d8bdff59ce netfilter: Fix potential crash in nft_hash walker
When we get back an EAGAIN from rhashtable_walk_next we were
treating it as a valid object which obviously doesn't work too
well.

Luckily this is hard to trigger so it seems nobody has run into
it yet.

This patch fixes it by redoing the next call when we get an EAGAIN.
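
The retry pattern can be sketched in userspace C as follows. The walker
type and walk_next() are hypothetical stand-ins for the rhashtable walk
interface, which is not reproduced here; the only point is that an EAGAIN
result is not an object and the call has to be repeated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical walker over an int array that can be interrupted once. */
struct walker {
	const int *items;
	size_t n;
	size_t pos;
	int resize_pending;	/* non-zero: next call reports -EAGAIN */
};

/* Returns 0 and sets *out (NULL at the end), or -EAGAIN without an object. */
static int walk_next(struct walker *w, const int **out)
{
	if (w->resize_pending) {
		w->resize_pending = 0;
		return -EAGAIN;
	}
	*out = (w->pos < w->n) ? &w->items[w->pos++] : NULL;
	return 0;
}

static int sum_all(struct walker *w)
{
	const int *obj = NULL;
	int sum = 0;

	for (;;) {
		int err = walk_next(w, &obj);

		if (err == -EAGAIN)
			continue;	/* not an object: redo the next call */
		assert(err == 0);
		if (!obj)
			break;		/* end of the walk */
		sum += *obj;
	}
	return sum;
}
```

Treating the -EAGAIN result as an object, as the old code effectively
did, would dereference something that is not an element at all.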

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-13 12:03:00 +01:00
Ian Wilson 78146572b9 netfilter: Zero the tuple in nfnl_cthelper_parse_tuple()
nfnl_cthelper_parse_tuple() is called from nfnl_cthelper_new(),
nfnl_cthelper_get() and nfnl_cthelper_del().  In each case they pass
a pointer to an nf_conntrack_tuple data structure local variable:

    struct nf_conntrack_tuple tuple;
    ...
    ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]);

The problem is that this local variable is not initialized, and
nfnl_cthelper_parse_tuple() only initializes two fields: src.l3num and
dst.protonum.  This leaves all other fields with undefined values
based on whatever is on the stack:

    tuple->src.l3num = ntohs(nla_get_be16(tb[NFCTH_TUPLE_L3PROTONUM]));
    tuple->dst.protonum = nla_get_u8(tb[NFCTH_TUPLE_L4PROTONUM]);

The symptom observed was that when the rpc and tns helpers were added
then traffic to port 1536 was being sent to user-space.
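
A minimal userspace sketch of the fix named in the subject line follows.
The struct is a simplified, hypothetical stand-in for struct
nf_conntrack_tuple, and parse_tuple_sketch() takes plain integers instead
of netlink attributes; only the zero-then-fill order is the point.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in; the real tuple has many more fields. */
struct tuple_sketch {
	struct {
		uint16_t l3num;
		uint16_t port;
	} src;
	struct {
		uint8_t protonum;
		uint16_t port;
	} dst;
};

/*
 * Zero the whole tuple first, then fill in the two fields the parser
 * knows about, so no stack garbage is left in the other fields.
 */
static void parse_tuple_sketch(struct tuple_sketch *t, uint16_t l3num,
			       uint8_t protonum)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
	t->src.l3num = l3num;
	t->dst.protonum = protonum;
}
```

With the memset() in the parser, every caller that passes an
uninitialized local variable gets a fully defined tuple back.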

Signed-off-by: Ian Wilson <iwilson@brocade.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-12 13:07:36 +01:00
David S. Miller 3cef5c5b0b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cadence/macb.c

Overlapping changes in macb driver, mostly fixes and cleanups
in 'net' overlapping with the integration of at91_ether into
macb in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:38:02 -04:00
David S. Miller 5428aef811 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. Basically, improvements for the packet rejection infrastructure,
deprecation of CLUSTERIP, cleanups for nf_tables and some untangling for
br_netfilter. More specifically they are:

1) Send packet to reset flow if checksum is valid, from Florian Westphal.

2) Fix nf_tables reject bridge from the input chain, also from Florian.

3) Deprecate the CLUSTERIP target, the cluster match supersedes it in
   functionality and it's known to have problems.

4) A couple of cleanups for nf_tables rule tracing infrastructure, from
   Patrick McHardy.

5) Another cleanup to place transaction declarations at the bottom of
   nf_tables.h, also from Patrick.

6) Consolidate Kconfig dependencies wrt. NF_TABLES.

7) Limit table names to 32 bytes in nf_tables.

8) mac header copying in bridge netfilter is already required when
   calling ip_fragment(), from Florian Westphal.

9) move nf_bridge_update_protocol() to br_netfilter.c, also from
   Florian.

10) Small refactor in br_netfilter in the transmission path, again from
    Florian.

11) Move br_nf_pre_routing_finish_bridge_slow() to br_netfilter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:58:21 -04:00
David S. Miller 9d73b42bbf Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Don't truncate ethernet protocol type to u8 in nft_compat, from
   Arturo Borrero.

2) Fix several problems in the addition/deletion of elements in nf_tables.

3) Fix module refcount leak in ip_vs_sync, from Julian Anastasov.

4) Fix a race condition in the abort path in the nf_tables transaction
   infrastructure. Basically aborted rules can show up as active rules
   until changes are unrolled, oneliner from Patrick McHardy.

5) Check for overflows in the data area of the rule, also from Patrick.

6) Fix off-by-one in the per-rule user data size field. This introduces
   a new nft_userdata structure that is placed at the beginning of the
   user data area that contains the length to save some bits from the
   rule and we only need one bit to indicate its presence, from Patrick.

7) Fix rule replacement error path, the replaced rule is deleted on
   error instead of leaving it in place. This has been fixed by relying
   on the abort path to undo the incomplete replacement.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 21:51:07 -05:00
Pablo Neira Ayuso 1cae565e8b netfilter: nf_tables: limit maximum table name length to 32 bytes
Set the same as we use for chain names, it should be enough.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:21 +01:00
Pablo Neira Ayuso f04e599e20 netfilter: nf_tables: consolidate Kconfig options
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:15 +01:00
Patrick McHardy 354bf5a0d7 netfilter: nf_tables: consolidate tracing invocations
* JUMP and GOTO are equivalent except for JUMP pushing the current
  context to the stack

* RETURN and implicit RETURN (CONTINUE) are equivalent except that
  the logged rule number differs

Result:

  nft_do_chain              | -112
 1 function changed, 112 bytes removed, diff: -112

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:12 +01:00
Patrick McHardy 01ef16c2dd netfilter: nf_tables: minor tracing cleanups
The tracing code is squeezed between multiple related parts of the
evaluation code, move it out. Also add an inline wrapper for the
recurring test for skb->nf_trace.

Small code savings in nft_do_chain():

  nft_trace_packet          | -137
  nft_do_chain              |   -8
 2 functions changed, 145 bytes removed, diff: -145

net/netfilter/nf_tables_core.c:
  __nft_trace_packet | +137
 1 function changed, 137 bytes added, diff: +137

net/netfilter/nf_tables_core.o:
 3 functions changed, 137 bytes added, 145 bytes removed, diff: -8

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:11 +01:00
Pablo Neira Ayuso 59900e0a01 netfilter: nf_tables: fix error handling of rule replacement
In general, if a transaction object is added to the list successfully,
we can rely on the abort path to undo what we've done. This allows us to
simplify the error handling of the rule replacement path in
nf_tables_newrule().

This implicitly fixes an unnecessary removal of the old rule, which
needs to be left in place if we fail to replace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:08 +01:00
Patrick McHardy 86f1ec3231 netfilter: nf_tables: fix userdata length overflow
The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8
to store its size. Introduce a struct nft_userdata which contains a
length field and indicate its presence using a single bit in the rule.

The length field of struct nft_userdata is also a u8, however we don't
store zero sized data, so the actual length is udata->len + 1.
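
The length encoding can be illustrated with the sketch below. The struct
and helper names are hypothetical and the data area is a fixed array for
simplicity; the only part taken from the text is that a u8 stores the
actual length minus one, so lengths 1..256 fit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define USERDATA_MAXLEN 256

/* Hypothetical stand-in for the userdata structure. */
struct userdata_sketch {
	uint8_t len;				/* actual length - 1 */
	unsigned char data[USERDATA_MAXLEN];
};

/* Returns 0 on success, -1 if n is zero or larger than the maximum. */
static int userdata_set(struct userdata_sketch *u, const void *src, size_t n)
{
	assert(u != NULL);
	if (n == 0 || n > USERDATA_MAXLEN)
		return -1;	/* zero sized data is never stored */
	u->len = (uint8_t)(n - 1);
	memcpy(u->data, src, n);
	return 0;
}

/* Recover the real length from the stored value. */
static size_t userdata_len(const struct userdata_sketch *u)
{
	return (size_t)u->len + 1;
}
```

Storing 256 bytes yields len == 255, which still fits in the u8, whereas
storing the plain length would have wrapped to zero.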

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:06 +01:00
Patrick McHardy 9889840f59 netfilter: nf_tables: check for overflow of rule dlen field
Check that the space required for the expressions doesn't exceed the
size of the dlen field, which would lead to the iterators crashing.
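
A sketch of such a check, assuming for illustration that dlen is a 12-bit
bitfield (the width is an assumption of this example and is not stated in
this message); rule_sketch and rule_set_dlen() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define DLEN_BITS 12	/* assumed width of the length bitfield */

struct rule_sketch {
	unsigned int dlen : DLEN_BITS;	/* size of the expression area */
};

/* Returns 0 on success, -1 if size does not fit into the bitfield. */
static int rule_set_dlen(struct rule_sketch *r, size_t size)
{
	assert(r != NULL);
	if (size >= ((size_t)1 << DLEN_BITS))
		return -1;	/* would be silently truncated otherwise */
	r->dlen = (unsigned int)size;
	return 0;
}
```

Without the check, an oversized value would be truncated on assignment
and iterators trusting dlen would stop at the wrong place.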

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:05 +01:00
Patrick McHardy 8670c3a55e netfilter: nf_tables: fix transaction race condition
A race condition exists in the rule transaction code for rules that
get added and removed within the same transaction.

The new rule starts out as inactive in the current and active in the
next generation and is inserted into the ruleset. When it is deleted,
it is additionally set to inactive in the next generation as well.

On commit the next generation is begun, then the actions are finalized.
For the new rule this would mean clearing out the inactive bit for
the previously current, now next generation.

However nft_rule_clear() clears out the bits for *both* generations,
activating the rule in the current generation, where it should be
deactivated due to being deleted. The rule will thus be active until
the deletion is finalized, removing the rule from the ruleset.

Similarly, when aborting a transaction for the same case, the undo
of insertion will remove it from the RCU protected rule list, the
deletion will clear out all bits. However until the next RCU
synchronization after all operations have been undone, the rule is
active on CPUs which can still see the rule on the list.

Generally, there may never be any modifications of the current
generation's inactive bit since this defeats the entire purpose of
atomicity. Change nft_rule_clear() to only touch the next generation's
bit to fix this.
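
The difference between the two behaviours can be shown with a small
sketch. The names are hypothetical (this is not the nft_rule_clear()
source); bit g of genmask set means "inactive in generation g", as
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct rule_sketch {
	uint8_t genmask;	/* bit g set => inactive in generation g */
};

static bool rule_is_active(const struct rule_sketch *r, unsigned int gen)
{
	return !(r->genmask & (1u << gen));
}

/* Old behaviour: clears the bits of both generations. */
static void rule_clear_both(struct rule_sketch *r)
{
	r->genmask = 0;
}

/* Fixed behaviour: only touch the next generation's bit. */
static void rule_clear_next(struct rule_sketch *r, unsigned int cur)
{
	assert(cur <= 1u);
	r->genmask &= (uint8_t)~(1u << (cur ^ 1u));
}
```

For a rule added and deleted in the same transaction, both bits are set
at commit time. Clearing both makes the rule active in the current
generation until the deletion is finalized, while clearing only the next
generation's bit keeps it inactive there.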

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:04 +01:00
David S. Miller 71a83a6db6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 21:16:48 -05:00
Florian Westphal ee586bbc28 netfilter: reject: don't send icmp error if csum is invalid
tcp resets are never emitted if the packet that triggers the
reject/reset has an invalid checksum.

For icmp error responses there was no such check.
This allows distinguishing icmp responses generated via

iptables -I INPUT -p udp --dport 42 -j REJECT

and those emitted by network stack (won't respond if csum is invalid,
REJECT does).

Arguably it's possible to avoid this by using conntrack and only
using REJECT with -m conntrack NEW/RELATED.

However, this doesn't work when connection tracking is not in use
or when using nf_conntrack_checksum=0.

Furthermore, sending errors in response to invalid csums doesn't make
much sense so just add similar test as in nf_send_reset.

Validate csum if needed and only send the response if it is ok.
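
As a generic illustration of "only respond if the checksum is ok", the
sketch below verifies a 16-bit one's-complement Internet checksum over a
buffer. It is not the kernel code: inet_csum() and should_send_reject()
are hypothetical helpers, and protocol details such as pseudo-headers or
hardware-validated checksums are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over big-endian 16-bit words, then inverted. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	assert(data != NULL || len == 0);
	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)data[0] << 8;	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);	/* fold the carries */
	return (uint16_t)~sum;
}

/* A buffer that includes a correct checksum field sums to zero. */
static bool csum_ok(const uint8_t *data, size_t len)
{
	return inet_csum(data, len) == 0;
}

/* Hypothetical policy hook: stay silent for corrupted packets. */
static bool should_send_reject(const uint8_t *pkt, size_t len)
{
	return csum_ok(pkt, len);
}
```

A packet whose checksum does not verify is dropped without an error
response, matching what the network stack itself would do.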

Reference: http://bugzilla.redhat.com/show_bug.cgi?id=1169829
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-03 02:10:35 +01:00
David S. Miller 77f0379fa8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

A small batch with accumulated updates in nf-next, mostly IPVS updates,
they are:

1) Add 64-bits stats counters to IPVS, from Julian Anastasov.

2) Move NETFILTER_XT_MATCH_ADDRTYPE out of NETFILTER_ADVANCED as docker
seems to require this, from Anton Blanchard.

3) Use boolean instead of numeric value in set_match_v*(), from
coccinelle via Fengguang Wu.

4) Allows rescheduling of new connections in IPVS when port reuse is
detected, from Marcelo Ricardo Leitner.

5) Add missing bits to support arptables extensions from nft_compat,
from Arturo Borrero.

Patrick is preparing a large batch to enhance the set infrastructure,
named expressions among other things, that should follow up soon after
this batch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 14:55:05 -05:00
Arturo Borrero 5f15893943 netfilter: nft_compat: add support for arptables extensions
This patch adds support to arptables extensions from nft_compat.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-02 12:28:13 +01:00
Daniel Borkmann 4c4b52d9b2 rhashtable: remove indirection for grow/shrink decision functions
Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.

It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we'll save us this
additional indirection layer and can improve insertion/deletion time
as well.
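
For reference, the two default decisions can be sketched as plain
functions. The names mirror rht_grow_above_75() and rht_shrink_below_30()
from the text, but the signatures, the integer arithmetic and the
min_size parameter are assumptions of this example rather than the
rhashtable source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Grow once the table holds more than 75% of its bucket count. */
static bool grow_above_75(size_t nelems, size_t size)
{
	assert(size > 0);
	return nelems > size / 4 * 3;
}

/* Shrink once below 30% full, but never under the minimum size. */
static bool shrink_below_30(size_t nelems, size_t size, size_t min_size)
{
	assert(size > 0);
	return nelems < size * 3 / 10 && size > min_size;
}
```

Calling such functions directly instead of through per-table function
pointers is what removes the extra indirection on insert and delete.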

Reference: http://patchwork.ozlabs.org/patch/443040/
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:06:02 -05:00
Marcelo Ricardo Leitner d752c36457 ipvs: allow rescheduling of new connections when port reuse is detected
Currently, when TCP/SCTP port reusing happens, IPVS will find the old
entry and use it for the new one, behaving like a forced persistence.
But if you consider a cluster with a heavy load of small connections,
such reuse will happen often and may lead to suboptimal load
balancing and might prevent a new node from getting a fair load.

This patch introduces a new sysctl, conn_reuse_mode, that allows
controlling how to proceed when port reuse is detected. The default
value will allow rescheduling of new connections only if the old entry
was in TIME_WAIT state for TCP or CLOSED for SCTP.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-25 13:46:35 +09:00
Pablo Neira Ayuso 8f711a601d Merge https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs
Simon Horman says:

====================
Second Round of IPVS Fixes for v3.20

This patch resolves some memory leaks in connection
synchronisation code that date back to v2.6.39.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-02-24 19:35:27 +01:00
Julian Anastasov 528c943f3b ipvs: add missing ip_vs_pe_put in sync code
ip_vs_conn_fill_param_sync() gets in param.pe a module
reference for persistence engine from __ip_vs_pe_getbyname()
but forgets to put it. Problem occurs in backup for
sync protocol v1 (2.6.39).

Also, pe_data usually comes in sync messages for
connection templates and ip_vs_conn_new() copies
the pointer only in this case. Make sure pe_data
is not leaked if it comes unexpectedly for normal
connections. Leak can happen only if bogus messages
are sent to backup server.

Fixes: fe5e7a1efb ("IPVS: Backup, Adding Version 1 receive capability")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-22 16:16:36 -05:00
Bojan Prtvar 059a2440fd net: Remove state argument from skb_find_text()
Although it is clear that textsearch state is intentionally passed to
skb_find_text() as uninitialized argument, it was never used by the
callers. Therefore, we can simplify skb_find_text() by making it
a local variable.

Signed-off-by: Bojan Prtvar <prtvar.b@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-22 15:59:54 -05:00
Pablo Neira Ayuso 02263db00b netfilter: nf_tables: fix addition/deletion of elements from commit/abort
We have several problems in this path:

1) There is a use-after-free when removing individual elements from
   the commit path.

2) We have to uninit() the data part of the element from the abort
   path to avoid a chain refcount leak.

3) We have to check for set->flags to see if there's a mapping, instead
   of the element flags.

4) We have to check for !(flags & NFT_SET_ELEM_INTERVAL_END) to skip
   elements that are part of the interval that have no data part, so
   they don't need to be uninit().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-02-22 21:05:08 +01:00
Arturo Borrero 2156d321b8 netfilter: nft_compat: don't truncate ethernet protocol type to u8
Use u16 for protocol and then cast it to __be16

>> net/netfilter/nft_compat.c:140:37: sparse: incorrect type in assignment (different base types)
   net/netfilter/nft_compat.c:140:37:    expected restricted __be16 [usertype] ethproto
   net/netfilter/nft_compat.c:140:37:    got unsigned char [unsigned] [usertype] proto
>> net/netfilter/nft_compat.c:351:37: sparse: incorrect type in assignment (different base types)
   net/netfilter/nft_compat.c:351:37:    expected restricted __be16 [usertype] ethproto
   net/netfilter/nft_compat.c:351:37:    got unsigned char [unsigned] [usertype] proto

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-02-22 21:04:06 +01:00
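
A minimal sketch of the truncation the nft_compat commit above fixes: an 8-bit variable cannot hold a 16-bit EtherType, so the high byte is lost. The helper names are hypothetical, and the byte-order conversion to __be16 mentioned in the commit is not modelled here.

```c
#include <stdint.h>

/* Hypothetical helper: store the protocol in 8 bits, as the old code did. */
static uint16_t ethproto_via_u8(uint16_t proto)
{
	uint8_t narrow = (uint8_t)proto;	/* high byte is dropped here */

	return narrow;
}

/* Hypothetical helper: store the protocol in 16 bits, as the fix does. */
static uint16_t ethproto_via_u16(uint16_t proto)
{
	uint16_t wide = proto;

	return wide;
}
```

With the ARP EtherType 0x0806, the 8-bit variant yields 0x06, while the 16-bit variant preserves the value.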
David S. Miller ee92259849 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains updates for your net tree, they are:

1) Fix removal of destination in IPVS when the new mixed family support
   is used, from Alexey Andriyanov via Simon Horman.

2) Fix module refcount underflow in nft_compat when reusing a match /
   target.

3) Fix iptables-restore when the recent match is used with a new hitcount
   that exceeds threshold, from Florian Westphal.

4) Fix stack corruption in xt_socket due to using stack storage to save
   the inner IPv6 header, from Eric Dumazet.

I'll follow up soon with another batch with more fixes that are still
cooking.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 17:36:20 -05:00
Eric Dumazet 78296c97ca netfilter: xt_socket: fix a stack corruption bug
As soon as extract_icmp6_fields() returns, its local storage (automatic
variables) is deallocated and can be overwritten.

Let's add an additional parameter to make sure storage is valid long
enough.

While we are at it, add some const qualifiers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: b64c9256a9 ("tproxy: added IPv6 support to the socket match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-02-16 17:00:48 +01:00
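
The pattern behind the xt_socket fix above, passing in storage that outlives the callee, can be sketched as follows. This is an illustration only: struct inner_hdr_sketch and extract_fields_sketch() are hypothetical and far simpler than extract_icmp6_fields().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an inner IPv6 header: just the two addresses. */
struct inner_hdr_sketch {
	uint8_t saddr[16];
	uint8_t daddr[16];
};

/*
 * Copy the inner header into storage owned by the caller and hand back
 * pointers into that storage. Because the caller owns the buffer, the
 * returned pointers stay valid after this function returns.
 * Returns 0 on success, -1 if the packet is too short.
 */
static int extract_fields_sketch(const uint8_t *pkt, size_t len,
				 struct inner_hdr_sketch *storage,
				 const uint8_t **saddr,
				 const uint8_t **daddr)
{
	if (len < sizeof(*storage))
		return -1;

	memcpy(storage, pkt, sizeof(*storage));
	*saddr = storage->saddr;
	*daddr = storage->daddr;
	return 0;
}
```

Had the function copied into a local variable and returned pointers to it, those pointers would dangle as soon as it returned, which is the class of bug the commit describes.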
Florian Westphal cef9ed86ed netfilter: xt_recent: don't reject rule if new hitcount exceeds table max
given:
-A INPUT -m recent --update --seconds 30 --hitcount 4
and
iptables-save > foo

then
iptables-restore < foo

will fail with:
kernel: xt_recent: hitcount (4) is larger than packets to be remembered (4) for table DEFAULT

Even when the check is fixed, the restore won't work if the hitcount is
increased to e.g. 6, since by the time checkentry runs it will find the
'old' incarnation of the table.

We can avoid this by increasing the maximum threshold silently; we only
have to rm all the current entries of the table (these entries would
not have enough room to handle the increased hitcount).

This even makes (not-very-useful)
-A INPUT -m recent --update --seconds 30 --hitcount 4
-A INPUT -m recent --update --seconds 30 --hitcount 42
work.

Fixes: abc86d0f99 (netfilter: xt_recent: relax ip_pkt_list_tot restrictions)
Tracked-down-by: Chris Vine <chris@cvine.freeserve.co.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-02-16 17:00:47 +01:00
Pablo Neira Ayuso 520aa7414b netfilter: nft_compat: fix module refcount underflow
Feb 12 18:20:42 nfdev kernel: ------------[ cut here ]------------
Feb 12 18:20:42 nfdev kernel: WARNING: CPU: 4 PID: 4359 at kernel/module.c:963 module_put+0x9b/0xba()
Feb 12 18:20:42 nfdev kernel: CPU: 4 PID: 4359 Comm: ebtables-compat Tainted: G        W      3.19.0-rc6+ #43
[...]
Feb 12 18:20:42 nfdev kernel: Call Trace:
Feb 12 18:20:42 nfdev kernel: [<ffffffff815fd911>] dump_stack+0x4c/0x65
Feb 12 18:20:42 nfdev kernel: [<ffffffff8103e6f7>] warn_slowpath_common+0x9c/0xb6
Feb 12 18:20:42 nfdev kernel: [<ffffffff8109919f>] ? module_put+0x9b/0xba
Feb 12 18:20:42 nfdev kernel: [<ffffffff8103e726>] warn_slowpath_null+0x15/0x17
Feb 12 18:20:42 nfdev kernel: [<ffffffff8109919f>] module_put+0x9b/0xba
Feb 12 18:20:42 nfdev kernel: [<ffffffff813ecf7c>] nft_match_destroy+0x45/0x4c
Feb 12 18:20:42 nfdev kernel: [<ffffffff813e683f>] nf_tables_rule_destroy+0x28/0x70

Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
2015-02-16 17:00:36 +01:00
David S. Miller 4a3046d68a Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains two small Netfilter updates for your
net-next tree, they are:

1) Add ebtables support to nft_compat, from Arturo Borrero.

2) Fix missing validation of the SET_ID attribute in the lookup
   expressions, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 14:27:24 -08:00
Wu Fengguang 7f73b9f1ca netfilter: ipset: fix boolreturn.cocci warnings
net/netfilter/xt_set.c:196:9-10: WARNING: return of 0/1 in function 'set_match_v3' with return type bool
net/netfilter/xt_set.c:242:9-10: WARNING: return of 0/1 in function 'set_match_v4' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-02-11 16:13:30 +01:00
Anton Blanchard 5cca4ace0f netfilter: Don't hide NETFILTER_XT_MATCH_ADDRTYPE behind NETFILTER_ADVANCED
Docker needs NETFILTER_XT_MATCH_ADDRTYPE, so move it out from behind
NETFILTER_ADVANCED and make it default to a module.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-02-11 16:09:29 +01:00
Julian Anastasov cd67cd5eb2 ipvs: use 64-bit rates in stats
IPVS stats are limited to 2^(32-10) conns/s and packets/s,
2^(32-5) bytes/s. It is time to use 64 bits:

* Change all conn/packet kernel counters to 64-bit and update
them in u64_stats_update_{begin,end} section

* In kernel use struct ip_vs_kstats instead of the user-space
struct ip_vs_stats_user and use new func ip_vs_export_stats_user
to export it to sockopt users to preserve compatibility with
32-bit values

* Rename cpu counters "ustats" to "cnt"

* To netlink users provide additionally 64-bit stats:
IPVS_SVC_ATTR_STATS64 and IPVS_DEST_ATTR_STATS64. Old stats
remain for old binaries.

* We can use ip_vs_copy_stats in ip_vs_stats_percpu_show

Thanks to Chris Caputo for providing initial patch for ip_vs_est.c

Signed-off-by: Chris Caputo <ccaputo@alt.net>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-09 16:59:03 +09:00
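
The limits quoted in the ipvs stats commit above follow from simple arithmetic. The sketch below assumes the 10 and 5 are the number of bits a 32-bit rate value gives up to scaling; the commit states the limits but not the reason, and max_rate_32bit() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Largest rate representable when scale_bits of a 32-bit value are
 * reserved for scaling. scale_bits must not exceed 32.
 */
static uint64_t max_rate_32bit(unsigned int scale_bits)
{
	return UINT64_C(1) << (32 - scale_bits);
}
```

max_rate_32bit(10) gives 2^22 = 4194304 conns/s or packets/s, and max_rate_32bit(5) gives 2^27 = 134217728 bytes/s, which is the motivation for the move to 64-bit rates.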
Alexey Andriyanov dd3733b3e7 ipvs: fix inability to remove a mixed-family RS
The current code prevents any operation with a mixed-family dest
unless IP_VS_CONN_F_TUNNEL flag is set. The problem is that it's impossible
for the client to follow this rule, because ip_vs_genl_parse_dest does
not even read the destination conn_flags when cmd = IPVS_CMD_DEL_DEST
(need_full_dest = 0).

Also, not every client can pass this flag when removing a dest. ipvsadm,
for example, does not support the "-i" command line option together with
the "-d" option.

This change disables any checks for mixed-family on IPVS_CMD_DEL_DEST command.

Signed-off-by: Alexey Andriyanov <alan@al-an.info>
Fixes: bc18d37f67 ("ipvs: Allow heterogeneous pools now that we support them")
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-09 14:13:30 +09:00
David S. Miller 6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endianness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
Herbert Xu 9a77662882 netfilter: Use rhashtable walk iterator
This patch gets rid of the manual rhashtable walk in nft_hash
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.

Note that I'm leaving nft_hash_destroy alone since it's only
invoked on shutdown and it shouldn't be affected by changes
to rhashtable internals (or at least not what I'm planning to
change).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:34:53 -08:00
Patrick McHardy 4c1017aa80 netfilter: nft_lookup: add missing attribute validation for NFTA_LOOKUP_SET_ID
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-30 19:08:20 +01:00
Arturo Borrero 5191f4d82d netfilter: nft_compat: add ebtables support
This patch extends nft_compat to support ebtables extensions.

ebtables verdict codes are translated to the ones used by the nf_tables engine,
so we can properly use ebtables target extensions from nft_compat.

This patch extends previous work by Giuseppe Longo <giuseppelng@gmail.com>.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-30 19:07:59 +01:00
Pablo Neira Ayuso f5553c19ff netfilter: nf_tables: fix leaks in error path of nf_tables_newchain()
Release statistics and module refcount on memory allocation problems.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-30 18:42:08 +01:00
Julian Anastasov 579eb62ac3 ipvs: rerouting to local clients is not needed anymore
commit f5a41847ac ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000 ("ipv4: Pass explicit daddr arg to
ip_send_reply().") from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.

Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"

Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-01-30 10:05:55 +09:00
Pablo Neira Ayuso e8781f70a5 netfilter: nf_tables: disable preemption when restoring chain counters
With CONFIG_DEBUG_PREEMPT=y

[22144.496057] BUG: using smp_processor_id() in preemptible [00000000] code: iptables-compat/10406
[22144.496061] caller is debug_smp_processor_id+0x17/0x1b
[22144.496065] CPU: 2 PID: 10406 Comm: iptables-compat Not tainted 3.19.0-rc4+ #
[...]
[22144.496092] Call Trace:
[22144.496098]  [<ffffffff8145b9fa>] dump_stack+0x4f/0x7b
[22144.496104]  [<ffffffff81244f52>] check_preemption_disabled+0xd6/0xe8
[22144.496110]  [<ffffffff81244f90>] debug_smp_processor_id+0x17/0x1b
[22144.496120]  [<ffffffffa07c557e>] nft_stats_alloc+0x94/0xc7 [nf_tables]
[22144.496130]  [<ffffffffa07c73d2>] nf_tables_newchain+0x471/0x6d8 [nf_tables]
[22144.496140]  [<ffffffffa07c5ef6>] ? nft_trans_alloc+0x18/0x34 [nf_tables]
[22144.496154]  [<ffffffffa063c8da>] nfnetlink_rcv_batch+0x2b4/0x457 [nfnetlink]

Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-26 11:50:02 +01:00
Pablo Neira Ayuso 75e8d06d43 netfilter: nf_tables: validate hooks in NAT expressions
The user can crash the kernel if it uses any of the existing NAT
expressions from the wrong hook, so add some code to validate this
when loading the rule.

This patch introduces nft_chain_validate_hooks() which is based on
an existing function in the bridge version of the reject expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-19 14:52:39 +01:00
Johannes Berg 053c095a82 netlink: make nlmsg_end() and genlmsg_end() void
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

  if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 01:03:45 -05:00
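
The pitfall described in the nlmsg_end() commit above can be reproduced in miniature. Every name below is a hypothetical stand-in; none of this is the netlink API itself.

```c
#include <stddef.h>

/* Hypothetical message: only the length matters for this sketch. */
struct msg_sketch {
	size_t len;
};

/* Old style: "end" returns the message length, which is always positive. */
static int old_msg_end(const struct msg_sketch *m)
{
	return (int)m->len;
}

/* A fill function that forwards the length as its own return value. */
static int fill_old(struct msg_sketch *m)
{
	m->len = 16;
	return old_msg_end(m);
}

/* New style: "end" returns nothing, so the caller reports success itself. */
static void new_msg_end(const struct msg_sketch *m)
{
	(void)m;
}

static int fill_new(struct msg_sketch *m)
{
	m->len = 16;
	new_msg_end(m);
	return 0;
}
```

A caller written as "if (fill_old(&m)) /* error */" takes the error branch on every successful fill, because the positive length is nonzero. With fill_new() the same test behaves as intended.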
David S. Miller 4e7a84b1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter updates for net-next

The following patchset contains netfilter updates for net-next, just a
bunch of cleanups and small enhancement to selectively flush conntracks
in ctnetlink, more specifically the patches are:

1) Raise default number of buckets in conntrack from 16384 to 65536 in
   systems with >= 4GBytes, patch from Marcelo Leitner.

2) Small refactor to save one level on indentation in xt_osf, from
   Joe Perches.

3) Remove unnecessary sizeof(char) in nf_log, from Fabian Frederick.

4) Another small cleanup to remove redundant variable in nfnetlink,
   from Duan Jiong.

5) Fix compilation warning in nfnetlink_cthelper on parisc, from
   Chen Gang.

6) Fix wrong format in debugging for ctseqadj, from Gao feng.

7) Selective conntrack flushing through the mark for ctnetlink, patch
   from Kristian Evensen.

8) Remove nf_ct_conntrack_flush_report() exported symbol now that it is
   not required anymore after the selective flushing patch, again from
   Kristian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:50:25 -05:00
David S. Miller 3f3558bb51 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/xen-netfront.c

Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 00:53:17 -05:00
David S. Miller 2bd8221804 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains netfilter/ipvs fixes, they are:

1) Small fix for the FTP helper in IPVS, a diff variable may be left
   unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter.

2) Fix nf_tables port NAT in little endian archs, patch from leroy
   christophe.

3) Fix race condition between conntrack confirmation and flush from
   userspace. This is the second reincarnation to resolve this problem.

4) Make sure inner messages in the batch come with the nfnetlink header.

5) Relax strict check from nfnetlink_bind() that may break old userspace
   applications using all 1s group mask.

6) Schedule removal of chains once no sets and rules refer to them in
   the new nf_tables ruleset flush command. Reported by Asbjoern Sloth
   Toennesen.

Note that this batch comes later than usual because of the short
winter holidays.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 00:14:49 -05:00
Kristian Evensen ae406bd057 netfilter: conntrack: Remove nf_ct_conntrack_flush_report
The only user of nf_ct_conntrack_flush_report() was ctnetlink_del_conntrack().
After adding support for flushing connections with a given mark, this function
is no longer called.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-08 12:16:58 +01:00
Kristian Evensen 866476f323 netfilter: conntrack: Flush connections with a given mark
This patch adds support for selective flushing of conntrack mappings.
By adding CTA_MARK and CTA_MARK_MASK to a delete-message, the mark (and
mask) is checked before a connection is deleted while flushing.

Configuring the flush is moved out of ctnetlink_del_conntrack(), and
instead of calling nf_conntrack_flush_report(), we always call
nf_ct_iterate_cleanup().  This enables us to only make one call from the
new ctnetlink_flush_conntrack() and makes it easy to add more filter
parameters.

Filtering is done in the ctnetlink_filter_match()-function, which is
also called from ctnetlink_dump_table(). ctnetlink_dump_filter has been
renamed ctnetlink_filter, to indicate that it is no longer only used
when dumping conntrack entries.

Moreover, reject mark filters with -EOPNOTSUPP if no ct mark support is
available.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-08 12:14:20 +01:00
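
A mark-and-mask check of the kind the conntrack flush commit above describes could look like the sketch below. The struct and function names are hypothetical, and the comparison shown is an assumption about how a mark filter is typically evaluated, not a copy of ctnetlink_filter_match().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter: value to match after applying the mask. */
struct mark_filter_sketch {
	uint32_t val;
	uint32_t mask;
};

/* An entry matches when its masked mark equals the filter value. */
static bool mark_matches(uint32_t ct_mark, const struct mark_filter_sketch *f)
{
	return (ct_mark & f->mask) == f->val;
}
```

With val 0x10 and mask 0xf0, a connection marked 0x1a matches and would be flushed, while one marked 0x2a is left alone.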
Pablo Neira Ayuso a2f18db0c6 netfilter: nf_tables: fix flush ruleset chain dependencies
Jumping between chains doesn't mix well with flush ruleset. Rules
from a different chain and set elements may still refer to us.

[  353.373791] ------------[ cut here ]------------
[  353.373845] kernel BUG at net/netfilter/nf_tables_api.c:1159!
[  353.373896] invalid opcode: 0000 [#1] SMP
[  353.373942] Modules linked in: intel_powerclamp uas iwldvm iwlwifi
[  353.374017] CPU: 0 PID: 6445 Comm: 31c3.nft Not tainted 3.18.0 #98
[  353.374069] Hardware name: LENOVO 5129CTO/5129CTO, BIOS 6QET47WW (1.17 ) 07/14/2010
[...]
[  353.375018] Call Trace:
[  353.375046]  [<ffffffff81964c31>] ? nf_tables_commit+0x381/0x540
[  353.375101]  [<ffffffff81949118>] nfnetlink_rcv+0x3d8/0x4b0
[  353.375150]  [<ffffffff81943fc5>] netlink_unicast+0x105/0x1a0
[  353.375200]  [<ffffffff8194438e>] netlink_sendmsg+0x32e/0x790
[  353.375253]  [<ffffffff818f398e>] sock_sendmsg+0x8e/0xc0
[  353.375300]  [<ffffffff818f36b9>] ? move_addr_to_kernel.part.20+0x19/0x70
[  353.375357]  [<ffffffff818f44f9>] ? move_addr_to_kernel+0x19/0x30
[  353.375410]  [<ffffffff819016d2>] ? verify_iovec+0x42/0xd0
[  353.375459]  [<ffffffff818f3e10>] ___sys_sendmsg+0x3f0/0x400
[  353.375510]  [<ffffffff810615fa>] ? native_sched_clock+0x2a/0x90
[  353.375563]  [<ffffffff81176697>] ? acct_account_cputime+0x17/0x20
[  353.375616]  [<ffffffff8110dc78>] ? account_user_time+0x88/0xa0
[  353.375667]  [<ffffffff818f4bbd>] __sys_sendmsg+0x3d/0x80
[  353.375719]  [<ffffffff81b184f4>] ? int_check_syscall_exit_work+0x34/0x3d
[  353.375776]  [<ffffffff818f4c0d>] SyS_sendmsg+0xd/0x20
[  353.375823]  [<ffffffff81b1826d>] system_call_fastpath+0x16/0x1b

Release objects in this order: rules -> sets -> chains -> tables, to
make sure no references to chains are held anymore.

Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.biz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:48 +01:00
Pablo Neira Ayuso 62924af247 netfilter: nfnetlink: relax strict multicast group check from netlink_bind
Relax the checking that was introduced in 97840cb ("netfilter:
nfnetlink: fix insufficient validation in nfnetlink_bind") when the
subscription bitmask is used. Existing userspace code may request
to listen to all of the existing netlink groups by setting an all to one
subscription group bitmask. Netlink already validates subscription via
setsockopt() for us.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:47 +01:00
Pablo Neira Ayuso 9ea2aa8b7d netfilter: nfnetlink: validate nfnetlink header from batch
Make sure there is enough room for the nfnetlink header in the
netlink messages that are part of the batch. There is a similar
check in netlink_rcv_skb().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:46 +01:00
Pablo Neira Ayuso 8ca3f5e974 netfilter: conntrack: fix race between confirmation and flush
Commit 5195c14c8b ("netfilter: conntrack: fix race in
__nf_conntrack_confirm against get_next_corpse") aimed to resolve the
race condition between the confirmation (packet path) and the flush
command (from control plane). However, it introduced a crash when
several packets race to add a new conntrack, which seems easier to
reproduce when nf_queue is in place.

Fix this race, in __nf_conntrack_confirm(), by removing the CT
from the unconfirmed list before checking the DYING bit. In case the
race occurred, re-add the CT to the dying list.

This patch also changes the verdict from NF_ACCEPT to NF_DROP when
we lose race. Basically, the confirmation happens for the first packet
that we see in a flow. If you just invoked conntrack -F once (which
should be the common case), then this is likely to be the first packet
of the flow (unless you already called flush anytime soon in the past).
This should be hard to trigger, but better drop this packet, otherwise
we leave things in inconsistent state since the destination will likely
reply to this packet, but it will find no conntrack, unless the origin
retransmits.

The change of the verdict has been discussed in:
https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:45 +01:00
Gao feng b44b565cf5 netfilter: nf_ct_seqadj: print ack seq in the right host byte order
new_start_seq and new_end_seq are network byte order,
print the host byte order in debug message and print
seq number as the type of unsigned int.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
2015-01-05 13:52:20 +01:00
Chen Gang b18c5d15e8 netfilter: nfnetlink_cthelper: Remove 'const' and '&' to avoid warnings
The related code can be simplified, and also can avoid related warnings
(with allmodconfig under parisc):

    CC [M]  net/netfilter/nfnetlink_cthelper.o
  net/netfilter/nfnetlink_cthelper.c: In function ‘nfnl_cthelper_from_nlattr’:
  net/netfilter/nfnetlink_cthelper.c:97:9: warning: passing argument 1 of ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers]
    memcpy(&help->data, nla_data(attr), help->helper->data_len);
           ^
  In file included from include/linux/string.h:17:0,
                   from include/uapi/linux/uuid.h:25,
                   from include/linux/uuid.h:23,
                   from include/linux/mod_devicetable.h:12,
                   from ./arch/parisc/include/asm/hardware.h:4,
                   from ./arch/parisc/include/asm/processor.h:15,
                   from ./arch/parisc/include/asm/spinlock.h:6,
                   from ./arch/parisc/include/asm/atomic.h:21,
                   from include/linux/atomic.h:4,
                   from ./arch/parisc/include/asm/bitops.h:12,
                   from include/linux/bitops.h:36,
                   from include/linux/kernel.h:10,
                   from include/linux/list.h:8,
                   from include/linux/module.h:9,
                   from net/netfilter/nfnetlink_cthelper.c:11:
  ./arch/parisc/include/asm/string.h:8:8: note: expected ‘void *’ but argument is of type ‘const char (*)[]’
   void * memcpy(void * dest,const void *src,size_t count);
          ^

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
2015-01-05 12:42:54 +01:00
Duan Jiong 0f8162326f netfilter: nfnetlink: remove redundant variable nskb
After netlink_skb_clone() is called, nskb and skb actually point to
the same thing, but they are used as if they were different, which
is sometimes confusing. I think it is no longer necessary to keep
nskb.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
2015-01-05 12:10:49 +01:00
Thomas Graf 97defe1ecf rhashtable: Per bucket locks & deferred expansion/shrinking
Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.

The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.

Mutations of the bucket table pointer are protected by a new mutex; read
access is RCU protected.

In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so called future table as soon as the resize process
starts.  Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table after an RCU grace period has
elapsed and the initial linking of the old to the new table has been
performed. Optimization of the chains to make use of the new number of
buckets follows only once the new table is in use.

The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.

Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
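
The per-bucket locking commit above says a lock is selected based on the hash of the bucket. One common way to do that is sketched below, assuming a power-of-two lock array indexed by masking the hash. The function name and the masking scheme are assumptions for illustration, not taken from the patch.

```c
#include <stddef.h>

/*
 * Map a bucket hash to a slot in the lock array.
 * nlocks is assumed to be a nonzero power of two.
 */
static size_t bucket_lock_index(size_t bucket_hash, size_t nlocks)
{
	return bucket_hash & (nlocks - 1);
}
```

Two buckets whose hashes map to different slots can be modified in parallel; buckets that share a slot, such as hashes 5 and 9 with four locks, serialize on the same lock.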
Thomas Graf 897362e446 nft_hash: Remove rhashtable_remove_pprev()
The removal function of nft_hash currently stores a reference to the
previous element during lookup which is used to optimize removal later
on. This was possible because a lock is held throughout calling
rhashtable_lookup() and rhashtable_remove().

With the introduction of deferred table resizing in parallel to lookups
and insertions, the nftables lock will no longer synchronize all
table mutations and the stored pprev may become invalid.

Removing this optimization makes removal slightly more expensive on
average but allows taking the resize cost out of the insert and
remove path.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
Thomas Graf 88d6ed15ac rhashtable: Convert bucket iterators to take table and index
This patch is in preparation to introduce per bucket spinlocks. It
extends all iterator macros to take the bucket table and bucket
index. It also introduces a new rht_dereference_bucket() to
handle protected accesses to buckets.

It introduces a barrier() to the RCU iterators to prevent
the compiler from caching the first element.

The lockdep verifier is introduced as a stub which always succeeds
and is properly implemented in the next patch when the locks are
introduced.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:56 -05:00
Thomas Graf 8d24c0b431 rhashtable: Do hashing inside of rhashtable_lookup_compare()
Hash the key inside of rhashtable_lookup_compare() like
rhashtable_lookup() does. This allows to simplify the hashing
functions and keep them private.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:56 -05:00
Johannes Berg 023e2cfa36 netlink/genetlink: pass network namespace to bind/unbind
Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.

To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.

Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected. (It's also
possible to silently accept it, without telling the family, but it
would then also have to be ignored on unbind so that families that
take any kind of action on bind/unbind don't do unnecessary work for
invalid clients like that.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27 03:07:50 -05:00
leroy christophe 7b5bca4676 netfilter: nf_tables: fix port natting in little endian archs
Make sure this fetches 16-bits port data from the register.
Remove casting to make sparse happy, not needed anymore.

Signed-off-by: leroy christophe <christophe.leroy@c-s.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-23 15:34:28 +01:00
Fabian Frederick 8aefc4d1c6 netfilter: log: remove unnecessary sizeof(char)
sizeof(char) is always 1.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-23 14:33:58 +01:00
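As a small illustration of the sizeof(char) cleanup above: the C standard defines sizeof(char) as 1, so multiplying a byte count by it adds nothing. The helper name `dup_string()` below is hypothetical and the code is a user-space sketch, not the netfilter code that was changed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The C standard defines sizeof(char) as 1, so this can never fail. */
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* Duplicate a string; 'len + 1' is already a byte count. */
char *dup_string(const char *src)
{
	size_t len = strlen(src);
	char *dst = malloc(len + 1);	/* not (len + 1) * sizeof(char) */

	if (dst != NULL)
		memcpy(dst, src, len + 1);
	return dst;
}
```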
Joe Perches 372e28661a netfilter: xt_osf: Use continue to reduce indentation
Invert logic in test to use continue.

This routine already uses continue, use it a bit more to
minimize > 80 column long lines and unnecessary indentation.

No change in compiled object file.

Other miscellanea:

o Remove trailing whitespace
o Realign arguments to multiline statement

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-23 14:20:10 +01:00
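The "invert the test and use continue" pattern from the xt_osf commit above might look like the following sketch. `struct finger` and `count_matches()` are hypothetical stand-ins, not the xt_osf data structures; the example only shows how early `continue` statements keep the loop body at one indentation level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fingerprint record, loosely modelled on a match loop. */
struct finger {
	int ttl;
	int window;
};

/* Count the entries that match both criteria. */
size_t count_matches(const struct finger *f, size_t n, int ttl, int window)
{
	size_t i, hits = 0;

	assert(f != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Inverted tests: skip early instead of nesting deeper. */
		if (f[i].ttl != ttl)
			continue;
		if (f[i].window != window)
			continue;

		hits++;
	}
	return hits;
}
```

The nested form, `if (ttl matches) { if (window matches) { ... } }`, behaves identically, which is consistent with the "no change in compiled object file" remark.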
Marcelo Leitner 88eab472ec netfilter: conntrack: adjust nf_conntrack_buckets default value
Manually bumping either nf_conntrack_buckets or nf_conntrack_max has
become a common task as our Linux servers tend to serve more and more
clients/applications, so let's adjust nf_conntrack_buckets to a
more up-to-date value.

Now for systems with more than 4GB of memory, nf_conntrack_buckets
becomes 65536 instead of 16384, resulting in nf_conntrack_max=256k
entries.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-23 14:20:10 +01:00
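The numbers in the nf_conntrack_buckets commit above can be checked with a short worked example. The ratio of four entries per bucket is an assumption inferred from the figures in the message (65536 buckets giving 256k entries); `conntrack_max_for()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Assumption: the default entry limit is four entries per bucket,
 * the ratio implied by "65536 buckets -> 256k entries".
 */
#define ENTRIES_PER_BUCKET 4u

static_assert(65536u * ENTRIES_PER_BUCKET == 256u * 1024u,
	      "matches the figure quoted in the commit message");

/* Derive the default entry limit from the bucket count. */
unsigned int conntrack_max_for(unsigned int buckets)
{
	return buckets * ENTRIES_PER_BUCKET;
}
```

Under the same assumption, the previous default of 16384 buckets corresponds to 65536 entries.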
Pablo Neira Ayuso 70314fc684 Merge tag 'ipvs2-for-v3.19' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next into ipvs-next
Simon Horman says:

====================
Second round of IPVS Updates for v3.19

please consider these IPVS updates for v3.19 or alternatively v3.20.

The single patch in this series fixes a long standing bug that
has not caused any trouble and thus is not being prioritised as a fix.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-18 20:54:26 +01:00
Linus Torvalds 70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variant of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU on which a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Linus Torvalds cbfe0de303 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS changes from Al Viro:
 "First pile out of several (there _definitely_ will be more).  Stuff in
  this one:

   - unification of d_splice_alias()/d_materialize_unique()

   - iov_iter rewrite

   - killing a bunch of ->f_path.dentry users (and f_dentry macro).

     Getting that completed will make life much simpler for
     unionmount/overlayfs, since then we'll be able to limit the places
     sensitive to file _dentry_ to reasonably few.  Which allows to have
     file_inode(file) pointing to inode in a covered layer, with dentry
     pointing to (negative) dentry in union one.

     Still not complete, but much closer now.

   - crapectomy in lustre (dead code removal, mostly)

   - "let's make seq_printf return nothing" preparations

   - assorted cleanups and fixes

  There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  copy_from_iter_nocache()
  new helper: iov_iter_kvec()
  csum_and_copy_..._iter()
  iov_iter.c: handle ITER_KVEC directly
  iov_iter.c: convert copy_to_iter() to iterate_and_advance
  iov_iter.c: convert copy_from_iter() to iterate_and_advance
  iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
  iov_iter.c: convert iov_iter_zero() to iterate_and_advance
  iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
  iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
  iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
  iov_iter.c: iterate_and_advance
  iov_iter.c: macros for iterating over iov_iter
  kill f_dentry macro
  dcache: fix kmemcheck warning in switch_names
  new helper: audit_file()
  nfsd_vfs_write(): use file_inode()
  ncpfs: use file_inode()
  kill f_dentry uses
  lockd: get rid of ->f_path.dentry->d_sb
  ...
2014-12-10 16:10:49 -08:00
Dan Carpenter 3b05ac3824 ipvs: uninitialized data with IP_VS_IPV6
The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.

The same issue is there in app_tcp_pkt_in().  Thanks to Julian Anastasov
for noticing that.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-12-10 17:36:47 +09:00
Hannes Frederic Sowa dbfc4fb7d5 dst: no need to take reference on DST_NOCACHE dsts
Since commit f886497212 ("ipv4: fix dst race in sk_dst_get()")
DST_NOCACHE dst_entries get freed by RCU. So there is no need to get a
reference on them when we are in rcu protected sections.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:08:17 -05:00
Al Viro ba00410b81 Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
David S. Miller 244ebd9f8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains netfilter updates for net-next. Basically,
enhancements for xt_recent, skip zeroing of timer in conntrack, fix
linking problem with recent redirect support for nf_tables, ipset
updates and a couple of cleanups. More specifically, they are:

1) Raise maximum number per IP address to be remembered in xt_recent
   while retaining backward compatibility, from Florian Westphal.

2) Skip zeroing timer area in nf_conn objects, also from Florian.

3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering
   using meta l4proto and transport layer header, from Alvaro Neira.

4) Fix linking problems in the new redirect support when CONFIG_IPV6=n
   and IP6_NF_IPTABLES=n.

And ipset updates from Jozsef Kadlecsik:

5) Support updating element extensions when the set is full (fixes
   netfilter bugzilla id 880).

6) Fix set match with 32-bits userspace / 64-bits kernel.

7) Indicate explicitly when /0 networks are supported in ipset.

8) Simplify cidr handling for hash:*net* types.

9) Allocate the proper size of memory when /0 networks are supported.

10) Explicitly add padding elements to hash:net,net and hash:net,port,
    because the elements must be u32 sized for the used hash function.

Jozsef is also cooking ipset RCU conversion which should land soon if
they reach the merge window in time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 20:56:46 -08:00
Jozsef Kadlecsik cac3763967 netfilter: ipset: Explicitly add padding elements to hash:net, net and hash:net, port, net
The elements must be u32 sized for the used hash function.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
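To illustrate the padding commit above: a hash function that consumes its input in 32-bit words needs the element size to be a multiple of four bytes, and explicit padding keeps those bytes under the program's control. The sketch below uses a hypothetical `struct elem` and a toy word-wise hash; neither is taken from ipset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical set element; the real ipset layouts differ. */
struct elem {
	uint32_t ip;
	uint16_t port;
	uint8_t proto;
	uint8_t cidr;
	uint8_t nomatch;
	uint8_t padding[3];	/* explicit, so the size is a whole number of words */
};

static_assert(sizeof(struct elem) % sizeof(uint32_t) == 0,
	      "element must be u32 sized for a word-wise hash");

/* Toy hash that consumes the element one 32-bit word at a time. */
uint32_t hash_elem(const struct elem *e)
{
	uint32_t words[sizeof(*e) / sizeof(uint32_t)];
	uint32_t h = 0;
	size_t i;

	memcpy(words, e, sizeof(words));
	for (i = 0; i < sizeof(words) / sizeof(words[0]); i++)
		h = h * 31u + words[i];
	return h;
}
```

Because the padding is a named member, zero-initialising the element also zeroes it, so two equal elements hash to the same value.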
Jozsef Kadlecsik 77b4311d20 netfilter: ipset: Allocate the proper size of memory when /0 networks are supported
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik 25a76f3463 netfilter: ipset: Simplify cidr handling for hash:*net* types
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik 59de79cf57 netfilter: ipset: Indicate when /0 networks are supported
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik a51b9199b1 netfilter: ipset: Alignment problem between 64bit kernel 32bit userspace
Sven-Haegar Koch reported the issue:

sims:~# iptables -A OUTPUT -m set --match-set testset src -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.

In syslog:
x_tables: ip_tables: set.3 match: invalid size 48 (kernel) != (user) 32

which was introduced by the counter extension in ipset.

The patch fixes the alignment issue by introducing a new set match
revision with the fixed underlying 'struct ip_set_counter_match'
structure.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:35 +01:00
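The kind of size mismatch reported in the commit above can be reproduced in a sketch. The structures here are hypothetical and much smaller than the real 'struct ip_set_counter_match', so the sizes differ from the 48 and 32 bytes in the syslog line. The typedefs rely on the GCC/Clang `aligned` attribute to model a 32-bit ABI that aligns 64-bit integers to 4 bytes; that modelling is an assumption of this example, not a description of how the patch was written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a typedef may lower as well as raise alignment. */
typedef uint64_t __attribute__((aligned(4))) u64_abi32;	/* 32-bit x86 style */
typedef uint64_t __attribute__((aligned(8))) u64_abi64;	/* 64-bit style */

/* Hypothetical match data as a 32-bit userspace would lay it out. */
struct match_user32 {
	uint32_t flags;
	u64_abi32 packets;	/* starts at offset 4 */
};

/* The same fields as a 64-bit kernel would lay them out. */
struct match_kernel64 {
	uint32_t flags;
	u64_abi64 packets;	/* starts at offset 8, after 4 bytes of padding */
};

/* Fixing the alignment explicitly gives one layout on both sides. */
struct match_fixed {
	uint32_t flags;
	uint32_t pad;
	u64_abi64 packets;
};

static_assert(sizeof(struct match_user32) != sizeof(struct match_kernel64),
	      "the two ABIs disagree on the structure size");
```

A size check like the one in x_tables would reject the 12-byte userspace structure against the 16-byte kernel one, while `struct match_fixed` is 16 bytes under both alignment rules.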
Jozsef Kadlecsik 86ac79c7be netfilter: ipset: Support updating extensions when the set is full
When the set was full (hash type and maxelem reached), it was not
possible to update the extension part of already existing elements.
The patch removes this limitation.

Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=880
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:34 +01:00
David S. Miller 60b7379dc5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
Pablo Neira Ayuso b59eaf9e28 netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module
This resolves linking problems with CONFIG_IPV6=n:

net/built-in.o: In function `redirect_tg6':
xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6'

Reported-by: Andreas Ruprecht <rupran@einserver.de>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 13:08:42 +01:00
Florian Westphal c41884ce05 netfilter: conntrack: avoid zeroing timer
add a __nfct_init_offset annotation member to struct nf_conn to make
it clear which members are covered by the memset when the conntrack
is allocated.

This avoids zeroing timer_list and ct_net; both are already inited
explicitly.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 12:41:06 +01:00
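The partial zeroing described in the conntrack commit above can be sketched as follows. `struct conn` and `conn_init_tail()` are hypothetical: the real patch uses a `__nfct_init_offset` marker member inside struct nf_conn, whereas this sketch simply takes the offset of the first member that should be cleared and zeroes from there to the end of the structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object; the leading members are initialised elsewhere. */
struct conn {
	long timer;		/* stands in for the timer, set up explicitly */
	void *net;		/* stands in for the netns pointer, set up explicitly */
	/* everything from 'status' onwards is zeroed on allocation */
	unsigned int status;
	unsigned int mark;
	unsigned char proto[16];
};

#define CONN_ZERO_START offsetof(struct conn, status)

/* Zero only the tail of the object, leaving timer and net untouched. */
void conn_init_tail(struct conn *ct)
{
	assert(ct != NULL);
	memset((char *)ct + CONN_ZERO_START, 0, sizeof(*ct) - CONN_ZERO_START);
}
```

Naming the start of the zeroed region makes the coverage of the memset visible in the structure definition rather than only at the call site.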
Florian Westphal abc86d0f99 netfilter: xt_recent: relax ip_pkt_list_tot restrictions
The maximum value for the hitcount parameter is given by
"ip_pkt_list_tot" parameter (default: 20).

Exceeding this value on the command line will cause the rule to be
rejected.  The parameter is also readonly, i.e. it cannot be changed
without module unload or reboot.

Store size per table, then base nstamps[] size on the hitcount instead.

The module parameter is retained for backwards compatibility.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 12:40:31 +01:00
Pablo Neira 43612d7c04 Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse"
This reverts commit 5195c14c8b.

If the conntrack clashes with an existing one, it is left out of
the unconfirmed list, thus crashing when dropping the packet and
releasing the conntrack, since the golden rule is that conntracks are
always placed in one of the existing lists for traceability reasons.

Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-25 14:14:51 -05:00
David S. Miller 958d03b016 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/ipvs updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, this includes the NAT redirection support for nf_tables, the
cgroup support for nft meta and conntrack zone support for the connlimit
match. Coming after those, a bunch of sparse warning fixes, missing
netns bits and cleanups. More specifically, they are:

1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
   patches from Arturo Borrero.

2) Introduce the nf_tables redir expression, from Arturo Borrero.

3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
   Patch from Alex Gartrell.

4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
   from Marcelo Leitner.

5) Add some extra validation when registering logger families, also
   from Marcelo.

6) Some spelling cleanups from stephen hemminger.

7) Fix sparse warning in nf_logger_find_get().

8) Add cgroup support to nf_tables meta, patch from Ana Rey.

9) A Kconfig fix for the new redir expression and fix sparse warnings in
   the new redir expression.

10) Fix several sparse warnings in the netfilter tree, from
    Florian Westphal.

11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
    nothing when this situation occurs.

12) Add conntrack zone support to xt_connlimit, again from Florian.

13) Add netnamespace support to the h323 conntrack helper, contributed
    by Vasily Averin.

14) Remove unnecessary null-pointer checks before free_percpu() and
    module_put(), from Markus Elfring.

15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:00:58 -05:00
David S. Miller 1459143386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ieee802154/fakehard.c

A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.

Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 22:28:24 -05:00
Marcelo Leitner beacd3e8ef netfilter: nfnetlink_log: Make use of pr_fmt where applicable
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-20 14:09:01 +01:00
Markus Elfring 982f405136 netfilter: Deletion of unnecessary checks before two function calls
The functions free_percpu() and module_put() test whether their argument
is NULL and then return immediately. Thus the test around the call is
not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-20 13:08:43 +01:00
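A user-space analogue of the cleanup above: the standard free() is also defined to do nothing when passed NULL, so a guard around the call is redundant in the same way. `struct stats` and `stats_release()` are hypothetical names; free_percpu() and module_put() themselves are kernel functions and are not used here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object owning an optional allocation. */
struct stats {
	int *counters;
};

/* Release the counters, whether or not they were ever allocated. */
void stats_release(struct stats *s)
{
	assert(s != NULL);

	/* No 'if (s->counters)' needed: free(NULL) is a defined no-op. */
	free(s->counters);
	s->counters = NULL;
}
```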
Vasily Averin 2c7b5d5dac netfilter: nf_conntrack_h323: lookup route from proper net namespace
Signed-off-by: Vasily Averin <vvs@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:47:14 +01:00
Florian Westphal e59ea3df3f netfilter: xt_connlimit: honor conntrack zone if available
Currently all the conntrack lookups are done using default zone.
In case the skb has a ct attached (e.g. template) we should use this zone
for lookups instead.  This makes connlimit work with connections assigned
to other zones.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:44:20 +01:00
Pablo Neira Ayuso 97840cb67f netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind
Make sure the netlink group exists, otherwise you can trigger an
out-of-bounds array access from the netlink_bind() path. This splat
can only be triggered by the superuser.

[  180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
[  180.204249] index 9 is out of range for type 'int [9]'
[  180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
[  180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[  180.206498]  0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
[  180.207220]  ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
[  180.207887]  ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
[  180.208639] Call Trace:
[  180.208857] dump_stack (lib/dump_stack.c:52)
[  180.209370] ubsan_epilogue (lib/ubsan.c:174)
[  180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
[  180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
[  180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
[  180.211495] SYSC_bind (net/socket.c:1541)

Moreover, define the missing nf_tables and nf_acct multicast groups too.

Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:01:13 +01:00
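The class of bug in the nfnetlink_bind commit above ("index 9 is out of range for type 'int [9]'") is avoided by validating the index before it addresses the array. The sketch below is a hypothetical user-space version: the table contents, `GROUP_COUNT` and `group_subsys()` are illustrative and do not reproduce the nfnetlink group numbering.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define GROUP_COUNT 9	/* hypothetical table size; valid indices are 0..8 */

/* Hypothetical mapping from multicast group index to subsystem id. */
static const int group_to_subsys[GROUP_COUNT] = {
	0, 1, 1, 1, 1, 2, 2, 3, 4,
};

/* Look up the subsystem for a group, rejecting unknown groups. */
int group_subsys(int group, int *subsys)
{
	assert(subsys != NULL);

	/* Reject the index before it is used to address the array. */
	if (group < 0 || group >= GROUP_COUNT)
		return -EINVAL;

	*subsys = group_to_subsys[group];
	return 0;
}
```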
bill bonaparte 5195c14c8b netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse
After removal of the central spinlock nf_conntrack_lock, in
commit 93bb0ceb75 ("netfilter: conntrack: remove central
spinlock nf_conntrack_lock"), it is possible to race against
get_next_corpse().

The race is against the get_next_corpse() cleanup on
the "unconfirmed" list (a per-cpu list with separate locking),
which sets the DYING bit.

Fix this race, in __nf_conntrack_confirm(), by removing the CT
from the unconfirmed list before checking the DYING bit.  In case
the race occurred, re-add the CT to the dying list.

While at this, fix coding style of the comment that has been
updated.

Fixes: 93bb0ceb75 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Reported-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-14 17:43:05 +01:00
Thomas Graf 6eba82248e rhashtable: Drop gfp_flags arg in insert/remove functions
Reallocation is only required for shrinking and expanding and both rely
on a mutex for synchronization and callers of rhashtable_init() are in
non atomic context. Therefore, no reason to continue passing allocation
hints through the API.

Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow
for silent fall back to vzalloc() without the OOM killer jumping in as
pointed out by Eric Dumazet and Eric W. Biederman.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:18:40 -05:00
Herbert Xu 7b4ce23534 rhashtable: Add parent argument to mutex_is_held
Currently mutex_is_held can only test locks that are global
since it takes no arguments.  This prevents rhashtable from being
used in places where locks are local, e.g., per-namespace locks.

This patch adds a parent field to mutex_is_held and rhashtable_params
so that local locks can be used (and tested).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:13:05 -05:00
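The idea behind the parent argument in the commit above is the usual "callback plus context pointer" pattern. The sketch below is a simplified user-space model: `struct table_params`, `struct net_ns` and the helper names are hypothetical, and a plain boolean stands in for real lock state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parameter block: a callback and the object it inspects. */
struct table_params {
	bool (*mutex_is_held)(void *parent);
	void *parent;
};

/* Hypothetical per-namespace state; a flag stands in for a real mutex. */
struct net_ns {
	bool lock_held;
};

/* Callback that checks the lock belonging to one namespace. */
bool ns_mutex_is_held(void *parent)
{
	const struct net_ns *ns = parent;

	return ns->lock_held;
}

/* The table asks its owner whether the protecting lock is held. */
bool table_mutex_is_held(const struct table_params *p)
{
	assert(p != NULL && p->mutex_is_held != NULL);
	return p->mutex_is_held(p->parent);
}
```

With no argument, the callback could only consult a global lock; passing `parent` lets two tables in different namespaces share one callback while checking different locks.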
Herbert Xu 1f501d6252 netfilter: Move mutex_is_held under PROVE_LOCKING
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled.  This patch modifies netfilter so that rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:13:05 -05:00
Pablo Neira Ayuso 8225161545 netfilter: nfnetlink_log: remove unnecessary error messages
In case of OOM, there's nothing userspace can do.

If there's no room to put the payload in __build_packet_message(),
jump to nla_put_failure which already performs the corresponding
error reporting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-13 13:13:00 +01:00
Florian Westphal 5676864431 netfilter: fix various sparse warnings
net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static?
  no; add include
net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static?
  yes
net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static?
  no; add include
net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static?
  yes
net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static?
  no; add include
net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3)
net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3)
  add __force, 3 is what we want.
net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static?
  yes
net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static?
  no; add include

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-13 12:14:42 +01:00
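The xt_DSCP warning quoted above ("ffffff03 becomes 3") comes from integer promotion: the complement is computed as an int and the cast then keeps only the low byte. A small sketch, assuming a mask value of 0xfc (inferred from the ffffff03 in the warning) and a 32-bit int; `DSCP_MASK` and `non_dscp_bits()` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define DSCP_MASK 0xfc	/* assumption: the upper six bits hold the DSCP value */

/* The truncated value is the intended one, so the cast is deliberate. */
static_assert((uint8_t)~DSCP_MASK == 0x03, "low byte of ~0xfc is 0x03");

/* Return the bits of the byte that are not part of the DSCP field. */
uint8_t non_dscp_bits(void)
{
	/* ~0xfc is evaluated as int (0xffffff03); the cast keeps 0x03. */
	return (uint8_t)~DSCP_MASK;
}
```

In the kernel the cast is annotated with __force to tell sparse the truncation is intentional; plain C has no equivalent annotation, so the sketch relies on the explicit cast alone.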
Pablo Neira Ayuso b326dd37b9 netfilter: nf_tables: restore synchronous object release from commit/abort
The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, ie. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.

Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.

So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slower, but it's simple and it resolves the
aforementioned problems.

This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso afefb6f928 netfilter: nft_compat: use the match->table to validate dependencies
Instead of the match->name, which is of course not relevant.

Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso c918687f5e netfilter: nft_compat: relax chain type validation
Check for nat chain dependency only, which is the one that can
actually crash the kernel. Don't care if mangle, filter and security
specific match and targets are used out of their scope, they are
harmless.

This restores iptables-compat with mangle specific match/target when
used out of the OUTPUT chain, that are actually emulated through filter
chains, which broke when performing strict validation.

Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso 2daf1b4d18 netfilter: nft_compat: use current net namespace
Instead of init_net when using xtables over nftables compat.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso baf4750d92 netfilter: nft_redir: fix sparse warnings
>> net/netfilter/nft_redir.c:39:26: sparse: incorrect type in assignment (different base types)
   net/netfilter/nft_redir.c:39:26:    expected unsigned int [unsigned] [usertype] nla_be32
   net/netfilter/nft_redir.c:39:26:    got restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:46:34: sparse: incorrect type in assignment (different base types)
   net/netfilter/nft_redir.c:46:34:    expected unsigned int [unsigned] [usertype] nla_be32
   net/netfilter/nft_redir.c:46:34:    got restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32

Fixes: e9105f1 ("netfilter: nf_tables: add new expression nft_redir")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:00:04 +01:00
Pablo Neira Ayuso f6c6339d5e netfilter: fix unmet dependencies in NETFILTER_XT_TARGET_REDIRECT
warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV4 which has unmet direct dependencies (NET && INET && NETFILTER && NF_NAT_IPV4)

warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_NAT_IPV6)

Fixes: 8b13edd ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables")
Fixes: 9de920e ("netfilter: refactor NAT redirect IPv6 code to use it from nf_tables")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 11:54:12 +01:00
Calvin Owens 50656d9df6 ipvs: Keep skb->sk when allocating headroom on tunnel xmit
ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new
skb, either unconditionally setting ->sk to NULL or allowing
the uninitialized ->sk from a newly allocated skb to leak through
to the caller.

This patch properly copies ->sk and increments its reference count.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-11-12 11:03:04 +09:00
Dan Carpenter 2196937e12 netfilter: ipset: small potential read beyond the end of buffer
We could be reading 8 bytes into a 4 byte buffer here.  It seems
harmless but adding a check is the right thing to do and it silences a
static checker warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-11 13:46:37 +01:00
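The check added by the ipset commit above follows a common pattern: confirm that the source buffer is at least as large as the value about to be read. The sketch below is a hypothetical user-space version; `read_u64()` is not an ipset or netlink function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a 64-bit value out of 'buf' only if 'buf' is large enough. */
int read_u64(const void *buf, size_t len, uint64_t *out)
{
	assert(buf != NULL && out != NULL);

	/* Without this check a 4-byte buffer would be over-read by 4 bytes. */
	if (len < sizeof(*out))
		return -EINVAL;

	memcpy(out, buf, sizeof(*out));
	return 0;
}
```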
Ana Rey ce674173e9 netfilter: nft_meta: add cgroup support
This allows you to filter traffic by process control group (cgroup).

Signed-off-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-09 16:21:22 +01:00
Steven Rostedt (Red Hat) e71456ae98 netfilter: Remove checks of seq_printf() return values
The return value of seq_printf() is soon to be removed. Remove the
checks from seq_printf() in favor of seq_has_overflowed().

Link: http://lkml.kernel.org/r/20141104142236.GA10239@salvia
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:11:02 -05:00
Joe Perches 824f1fbee7 netfilter: Convert print_tuple functions to return void
Since adding a new function to seq_file (seq_has_overflowed())
there isn't any value for functions called from seq_show to
return anything.   Remove the int returns of the various
print_tuple/<foo>_print_tuple functions.

Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:10:33 -05:00
Steven Rostedt (Red Hat) 37246a5837 netfilter: Remove return values for print_conntrack callbacks
The seq_printf() and friends are having their return values removed.
The print_conntrack() returns the result of seq_printf(), which is
meaningless when seq_printf() returns void. Might as well remove the
return values of print_conntrack() as well.

Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:09:47 -05:00
Pablo Neira Ayuso c5a589cc30 netfilter: nf_log: fix sparse warning in nf_logger_find_get()
net/netfilter/nf_log.c:157:16: warning: incorrect type in assignment (different address spaces)
net/netfilter/nf_log.c:157:16:    expected struct nf_logger *logger
net/netfilter/nf_log.c:157:16:    got struct nf_logger [noderef] <asn:4>*<noident>

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-04 17:56:31 +01:00
stephen hemminger 01cfa0a4ed netfilter: fix spelling errors
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30 17:35:30 +01:00
Pablo Neira Ayuso c3ac759ea6 Merge branch 'ipvs-next'
Simon Horman says:

====================
The single patch in this series fixes some minor fallout from adding
support for IPv6 real servers in IPv4 virtual-services and vice versa.

It should not have any run-time effect other than perhaps saving a few cycles.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30 16:52:30 +01:00
Marcelo Leitner 8ac2bde2a4 netfilter: log: protect nf_log_register against double registering
Currently, despite the comment right before the function,
nf_log_register allows registering two loggers with the same type, and
the second one ends up overwriting the previous registration.

Not a real issue today as the current tree doesn't have two loggers for the
same type, but it's better to get this protected.

Also make sure that all of its callers do error checking.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30 16:41:48 +01:00
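A minimal user-space sketch of the guard described in the commit above: refuse a second registration for a type that already has a logger instead of silently overwriting it. The `demo_*` names, the registry layout and the choice of `-EEXIST` are assumptions for illustration, not the kernel's nf_log code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical logger types and registry; not the kernel's nf_log structures. */
enum demo_log_type { DEMO_LOG_TYPE_LOG, DEMO_LOG_TYPE_ULOG, DEMO_LOG_TYPE_MAX };

struct demo_logger {
	const char *name;
	enum demo_log_type type;
};

const struct demo_logger *demo_loggers[DEMO_LOG_TYPE_MAX];

/* Register a logger; refuse to overwrite an existing one of the same type. */
int demo_log_register(const struct demo_logger *logger)
{
	if (logger == NULL || (unsigned int)logger->type >= DEMO_LOG_TYPE_MAX)
		return -EINVAL;
	if (demo_loggers[logger->type] != NULL)
		return -EEXIST;	/* slot taken: keep the first registration */
	demo_loggers[logger->type] = logger;
	return 0;
}
```

Callers are expected to check the return value, which is the second half of what the commit asks for.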
Marcelo Leitner 0c26ed1c07 netfilter: nf_log: Introduce nft_log_dereference() macro
Wrap up a common call pattern in an easier to handle call.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30 16:39:40 +01:00
Alex Gartrell d770108911 ipvs: remove unnecessary assignment in __ip_vs_get_out_rt
It is a precondition of the function that daddr be equal to dest->addr.ip
if dest is non-NULL, so this additional assignment is just confusing for
stupid engineers like me.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-10-28 09:50:06 +09:00
Alex Gartrell 3d53666b40 ipvs: Avoid null-pointer deref in debug code
Use daddr instead of reaching into dest.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-10-28 09:48:31 +09:00
Arturo Borrero e9105f1bea netfilter: nf_tables: add new expression nft_redir
This new expression provides NAT in the redirect flavour, which is to
redirect packets to local machine.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:49:39 +01:00
Arturo Borrero 9de920eddb netfilter: refactor NAT redirect IPv6 code to use it from nf_tables
This patch refactors the IPv6 code so it can be usable both from xt and
nf_tables.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:48:10 +01:00
Arturo Borrero 8b13eddfdf netfilter: refactor NAT redirect IPv4 to use it from nf_tables
This patch refactors the IPv4 code so it can be usable both from xt and
nf_tables.

A similar patch follows-up to handle IPv6.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:47:06 +01:00
Arturo Borrero 7965ee9371 netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()
The code looks for an already loaded target, and the correct list to search
is nft_target_list, not nft_match_list.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:17:46 +01:00
Houcheng Lin b51d3fa364 netfilter: nf_log: release skbuff on nlmsg put failure
The kernel should reserve enough room in the skb so that the DONE
message can always be appended.  However, in case of e.g. new attribute
erroneously not being size-accounted for, __nfulnl_send() will still
try to put next nlmsg into this full skbuf, causing the skb to be stuck
forever and blocking delivery of further messages.

Fix issue by releasing skb immediately after nlmsg_put error and
WARN() so we can track down the cause of such size mismatch.

[ fw@strlen.de: add tailroom/len info to WARN ]

Signed-off-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-24 14:34:11 +02:00
Florian Westphal c1e7dc91ee netfilter: nfnetlink_log: fix maximum packet length logged to userspace
Don't try to queue payloads > 0xffff - NLA_HDRLEN; it does not work.
The nla length includes the size of the nla struct, so anything larger
results in u16 integer overflow.

This patch is similar to
9cefbbc9c8 (netfilter: nfnetlink_queue: cleanup copy_range usage).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-24 14:32:27 +02:00
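The u16 overflow described in the commit above can be reproduced in user space. This is only a sketch: `DEMO_NLA_HDRLEN` is an assumed header size of 4 bytes and the `demo_*` helpers are hypothetical, not the nfnetlink_log code.

```c
#include <stdint.h>

#define DEMO_NLA_HDRLEN 4u	/* assumed attribute header size, for illustration */

/* Length as stored in a 16-bit attribute header: header plus payload, truncated. */
uint16_t demo_nla_len(uint32_t payload_len)
{
	return (uint16_t)(DEMO_NLA_HDRLEN + payload_len);
}

/* Clamp a requested copy length so that header plus payload fits in 16 bits. */
uint32_t demo_clamp_payload(uint32_t payload_len)
{
	const uint32_t max_payload = 0xffffu - DEMO_NLA_HDRLEN;

	return payload_len > max_payload ? max_payload : payload_len;
}
```

With these definitions a 0xffff-byte payload would be recorded with a length of 3, while the clamped payload yields exactly 0xffff.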
Florian Westphal 9dfa1dfe4d netfilter: nf_log: account for size of NLMSG_DONE attribute
We currently neither account for the nlattr size, nor do we consider
the size of the trailing NLMSG_DONE when allocating nlmsg skb.

This can cause nflog to stop working, as __nfulnl_send() retries
sending forever if it failed to append NLMSG_DONE (which will never
work if buffer is not large enough).

Reported-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-24 14:30:15 +02:00
Sabrina Dubroca c123bb7163 netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation
alloc_percpu returns NULL on failure, not a negative error code.

Fixes: ff3cd7b3c9 ("netfilter: nf_tables: refactor chain statistic routines")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-22 14:12:51 +02:00
Dan Carpenter 0f9f5e1b83 netfilter: ipset: off by one in ip_set_nfnl_get_byindex()
The ->ip_set_list[] array is initialized in ip_set_net_init() and it
has ->ip_set_max elements so this check should be >= instead of >
otherwise we are off by one.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-22 14:12:50 +02:00
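A small sketch of the bounds check discussed in the commit above. An array with N elements has valid indices 0 to N - 1, so the rejection test must be `>=`. The array, its size and the function name are hypothetical stand-ins, not the ipset code.

```c
#include <stddef.h>

#define DEMO_SET_MAX 4	/* hypothetical array size, standing in for ->ip_set_max */

static const char *demo_set_list[DEMO_SET_MAX] = { "a", "b", "c", "d" };

/* Return the entry at index, or NULL when index is out of range. */
const char *demo_get_byindex(unsigned int index)
{
	/* Valid indices are 0 .. DEMO_SET_MAX - 1, so index == DEMO_SET_MAX is rejected too. */
	if (index >= DEMO_SET_MAX)
		return NULL;
	return demo_set_list[index];
}
```

With `>` instead of `>=`, an index equal to `DEMO_SET_MAX` would read one element past the end of the array.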
Marcelo Leitner e37ad9fd63 netfilter: nf_conntrack: allow server to become a client in TW handling
When a port that was used to listen for inbound connections gets closed
and reused for outgoing connections (like rsh ends up doing for stderr
flow), currently we may reject the SYN/ACK packet for the new connection
because the tcp_conntracks state table forbids a port from becoming a
client while there is still a TIME_WAIT entry in there for it.

As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
for it is 120s, there is a ~60s window that the application can end up
opening a port that conntrack will end up blocking.

This patch fixes this by simply allowing such state transition: if we
see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note
that the rest of the code already handles this situation, more
specifically in tcp_packet(), first switch clause.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-22 14:12:50 +02:00
Florian Westphal 330966e501 net: make skb_gso_segment error handling more robust
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL.  This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erroneously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-20 12:38:13 -04:00
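The difference between the two checks named in the commit above, sketched in user space. The `demo_*` helpers are stand-ins written for this example under the assumption that error pointers occupy the top 4095 addresses; they are not the kernel's err.h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* User-space stand-ins for the kernel's ERR_PTR/IS_ERR style helpers. */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

static inline int demo_is_err_or_null(const void *p)
{
	return p == NULL || demo_is_err(p);
}

/* Returns 1 when the segment list may be dereferenced, 0 otherwise. */
int demo_segs_usable(const void *segs)
{
	return !demo_is_err_or_null(segs);
}
```

A caller that tests only `demo_is_err()` treats NULL as a valid list and dereferences it, which is the oops the commit guards against.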
David S. Miller ce8ec48967 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains netfilter fixes for your net tree,
they are:

1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules.

2) Restrict nat and masq expressions to the nat chain type. Otherwise,
   users may crash their kernel if they attach a nat/masq rule to a non
   nat chain.

3) Fix hook validation in nft_compat when non-base chains are used.
   Basically, initialize hook_mask to zero.

4) Make sure you use match/targets in nft_compat from the right chain
   type. The existing validation relies on the table name which can be
   avoided by

5) Better netlink attribute validation in nft_nat. This expression has
   to reject the configuration when no address and proto configurations
   are specified.

6) Interpret NFTA_NAT_REG_*_MAX only if NFTA_NAT_REG_*_MIN is set.
   Yet another sanity check to reject incorrect configurations from
   userspace.

7) Conditional NAT attribute dumping depending on the existing
   configuration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-20 11:57:47 -04:00
Pablo Neira Ayuso 1e2d56a5d3 netfilter: nft_nat: dump attributes if they are set
Dump NFTA_NAT_REG_ADDR_MIN if this is non-zero. Same thing with
NFTA_NAT_REG_PROTO_MIN.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-18 14:16:13 +02:00
Pablo Neira Ayuso 61cfac6b42 netfilter: nft_nat: NFTA_NAT_REG_ADDR_MAX depends on NFTA_NAT_REG_ADDR_MIN
Interpret NFTA_NAT_REG_ADDR_MAX if NFTA_NAT_REG_ADDR_MIN is present,
otherwise, skip it. Same thing with NFTA_NAT_REG_PROTO_MAX.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-18 14:16:12 +02:00
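A sketch of the dependency described in the commit above: the MAX attribute is only looked at when MIN is present. The structures are hypothetical (a NULL pointer stands for an absent attribute), and collapsing the range to MIN when MAX is missing is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed attributes: a NULL pointer means the attribute was absent. */
struct demo_nat_attrs {
	const uint32_t *addr_min;
	const uint32_t *addr_max;
};

struct demo_nat_cfg {
	int has_addr;
	uint32_t addr_min;
	uint32_t addr_max;
};

/* MAX is only interpreted when MIN is present; a lone MAX is skipped. */
void demo_nat_parse(const struct demo_nat_attrs *a, struct demo_nat_cfg *cfg)
{
	cfg->has_addr = 0;
	cfg->addr_min = 0;
	cfg->addr_max = 0;
	if (a->addr_min == NULL)
		return;
	cfg->has_addr = 1;
	cfg->addr_min = *a->addr_min;
	/* Assumption: without MAX the range collapses to the single MIN value. */
	cfg->addr_max = a->addr_max != NULL ? *a->addr_max : *a->addr_min;
}
```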
Pablo Neira Ayuso 5c819a3975 netfilter: nft_nat: insufficient attribute validation
We have to validate that we at least get an NFTA_NAT_REG_ADDR_MIN or
NFTA_NAT_REG_PROTO_MIN attribute. Reject the configuration if neither
of them is present.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-18 14:16:11 +02:00
Pablo Neira Ayuso f3f5ddeddd netfilter: nft_compat: validate chain type in match/target
We have to validate the real chain type to ensure that matches/targets
are not used outside their scope (e.g. MASQUERADE only in the nat chain type).
The existing validation relies on the table name, but this is not
sufficient since userspace can fool us by using the appropriate table
name with a different chain type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-18 14:14:07 +02:00
Pablo Neira Ayuso 493618a92c netfilter: nft_compat: fix hook validation for non-base chains
Set hook_mask to zero for non-base chains, otherwise people may hit
bogus errors from the xt_check_target() and xt_check_match() when
validating the uninitialized hook_mask.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-14 12:52:40 +02:00
Rasmus Villemoes 18082746a2 netfilter: replace strnicmp with strncasecmp
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp.  The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:24 +02:00
Pablo Neira Ayuso 7210e4e38f netfilter: nf_tables: restrict nat/masq expressions to nat chain type
This adds the missing validation code to avoid the use of nat/masq from
non-nat chains. The validation assumes two possible configuration
scenarios:

1) Use of nat from base chain that is not of nat type. Reject this
   configuration from the nft_*_init() path of the expression.

2) Use of nat from non-base chain. In this case, we have to wait until
   the non-base chain is referenced by at least one base chain via
   jump/goto. This is resolved from the nft_*_validate() path which is
   called from nf_tables_check_loops().

The user gets an -EOPNOTSUPP in both cases.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-13 20:42:00 +02:00
David S. Miller 7b6fa1eef6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter fixes for net-next

This batch contains two fixes for what you have in your net-next,
they are:

1) Remove nf_send_reset6() from header file. This function now resides
   in the nf_reject_ipv6 module. Reported by Eric Dumazet.

2) Fix wrong NFT_REJECT_ICMPX_MAX definition and adjust code to fix
   errors reported by Dan Carpenter's static analysis tools.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:01:09 -04:00
Linus Torvalds 35a9ad8af0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Most notable changes in here:

   1) By far the biggest accomplishment, thanks to a large range of
      contributors, is the addition of multi-send for transmit.  This is
      the result of discussions back in Chicago, and the hard work of
      several individuals.

      Now, when the ->ndo_start_xmit() method of a driver sees
      skb->xmit_more as true, it can choose to defer the doorbell
      telling the driver to start processing the new TX queue entries.

      skb->xmit_more means that the generic networking is guaranteed to
      call the driver immediately with another SKB to send.

      There is logic added to the qdisc layer to dequeue multiple
      packets at a time, and the handling of mis-predicted offloads in
      software is now done with no locks held.

      Finally, pktgen is extended to have a "burst" parameter that can
      be used to test a multi-send implementation.

      Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
      virtio_net

      Adding support is almost trivial, so expect more drivers to
      support this optimization soon.

      I want to thank, in no particular or implied order, Jesper
      Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
      Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
      David Tat, Hannes Frederic Sowa, and Rusty Russell.

   2) PTP and timestamping support in bnx2x, from Michal Kalderon.

   3) Allow adjusting the rx_copybreak threshold for a driver via
      ethtool, and add rx_copybreak support to enic driver.  From
      Govindarajulu Varadarajan.

   4) Significant enhancements to the generic PHY layer and the bcm7xxx
      driver in particular (EEE support, auto power down, etc.) from
      Florian Fainelli.

   5) Allow raw buffers to be used for flow dissection, allowing drivers
      to determine the optimal "linear pull" size for devices that DMA
      into pools of pages.  The objective is to get exactly the
      necessary amount of headers into the linear SKB area pre-pulled,
      but no more.  The new interface drivers use is eth_get_headlen().
      From WANG Cong, with driver conversions (several had their own
      by-hand duplicated implementations) by Alexander Duyck and Eric
      Dumazet.

   6) Support checksumming more smoothly and efficiently for
      encapsulations, and add "foo over UDP" facility.  From Tom
      Herbert.

   7) Add Broadcom SF2 switch driver to DSA layer, from Florian
      Fainelli.

   8) eBPF now can load programs via a system call and has an extensive
      testsuite.  Alexei Starovoitov and Daniel Borkmann.

   9) Major overhaul of the packet scheduler to use RCU in several major
      areas such as the classifiers and rate estimators.  From John
      Fastabend.

  10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
      Duyck.

  11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
      Dumazet.

  12) Add Datacenter TCP congestion control algorithm support, From
      Florian Westphal.

  13) Reorganize sk_buff so that __copy_skb_header() is significantly
      faster.  From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
2014-10-08 21:40:54 -04:00
Linus Torvalds 28596c9722 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull "trivial tree" updates from Jiri Kosina:
 "Usual pile from trivial tree everyone is so eagerly waiting for"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Remove MN10300_PROC_MN2WS0038
  mei: fix comments
  treewide: Fix typos in Kconfig
  kprobes: update jprobe_example.c for do_fork() change
  Documentation: change "&" to "and" in Documentation/applying-patches.txt
  Documentation: remove obsolete pcmcia-cs from Changes
  Documentation: update links in Changes
  Documentation: Docbook: Fix generated DocBook/kernel-api.xml
  score: Remove GENERIC_HAS_IOMAP
  gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
  tty: doc: Fix grammar in serial/tty
  dma-debug: modify check_for_stack output
  treewide: fix errors in printk
  genirq: fix reference in devm_request_threaded_irq comment
  treewide: fix synchronize_rcu() in comments
  checkstack.pl: port to AArch64
  doc: queue-sysfs: minor fixes
  init/do_mounts: better syntax description
  MIPS: fix comment spelling
  powerpc/simpleboot: fix comment
  ...
2014-10-07 21:16:26 -04:00
Pablo Neira Ayuso f0d1f04f0a netfilter: fix wrong arithmetics regarding NFT_REJECT_ICMPX_MAX
NFT_REJECT_ICMPX_MAX should be __NFT_REJECT_ICMPX_MAX - 1.

nft_reject_icmp_code() and nft_reject_icmpv6_code() are called from the
packet path, so BUG_ON in case we try to access an unknown abstracted
ICMP code. This should not happen since we already validate this from
nft_reject_{inet,bridge}_init().

Fixes: 51b0a5d ("netfilter: nft_reject: introduce icmp code abstraction for inet and bridge")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-07 20:16:31 +02:00
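The arithmetic in the commit above follows the usual sentinel-enum pattern: the last enumerator is one past the highest valid value, so the maximum is the sentinel minus one. The names below are hypothetical and only mirror that pattern; `DEMO_REJECT_END` plays the role of `__NFT_REJECT_ICMPX_MAX`.

```c
/* Hypothetical codes; the names only mirror the pattern described above. */
enum demo_reject_code {
	DEMO_REJECT_NO_ROUTE,
	DEMO_REJECT_PORT_UNREACH,
	DEMO_REJECT_HOST_UNREACH,
	DEMO_REJECT_ADMIN_PROHIBITED,
	DEMO_REJECT_END	/* sentinel: one past the last valid code */
};

#define DEMO_REJECT_MAX (DEMO_REJECT_END - 1)	/* highest valid code */

/* Validate a code coming from userspace before it reaches the packet path. */
int demo_reject_code_valid(unsigned int code)
{
	return code <= (unsigned int)DEMO_REJECT_MAX;
}
```

Validating at configuration time is what lets the packet path treat an unknown code as a bug rather than as user input.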
David S. Miller 61b37d2f54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains another batch with Netfilter/IPVS updates
for net-next, they are:

1) Add abstracted ICMP codes to the nf_tables reject expression. We
   introduce four reasons to reject using ICMP that overlap in IPv4
   and IPv6 from the semantic point of view. This should simplify the
   maintenance of dual stack rule-sets through the inet table.

2) Move nf_send_reset() functions from header files to per-family
   nf_reject modules, suggested by Patrick McHardy.

3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the
   code now that br_netfilter can be modularized. Convert remaining spots
   in the network stack code.

4) Use rcu_barrier() in the nf_tables module removal path to ensure that
   we don't leave objects that are still pending to be released via
   call_rcu (that may likely result in a crash).

5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad)
   idea was to probe the word size based on the xtables match/target info
   size, but this assumption is wrong when you have to dump the information
   back to userspace.

6) Allow filtering from prerouting and postrouting in the nf_tables bridge.
   In order to emulate the ebtables NAT chains (which are actually simple
   filter chains with no special semantics), we have to support filtering from
   these hooks too.

7) Add explicit module dependency between xt_physdev and br_netfilter.
   This provides a way to detect if the user needs br_netfilter from
   the configuration path. This should reduce the breakage of the
   br_netfilter modularization.

8) Cleanup coding style in ip_vs.h, from Simon Horman.

9) Fix crash in the recently added nf_tables masq expression. We have
   to register/unregister the notifiers to clean up the conntrack table
   entries from the module init/exit path, not from the rule addition /
   deletion path. From Arturo Borrero.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:32:37 -04:00
David S. Miller 739e4a758e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	net/netfilter/nfnetlink.c

Both r8152 and nfnetlink conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-02 11:25:43 -07:00
Pablo Neira Ayuso 4b7fd5d97e netfilter: explicit module dependency between br_netfilter and physdev
You can use physdev to match the physical interface enslaved to the
bridge device. This information is stored in skb->nf_bridge and it is
set up by br_netfilter. So, this is only available when iptables is
used from the bridge netfilter path.

Since 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the br_netfilter code is modular. To reduce the impact of this change,
we can autoload the br_netfilter if the physdev match is used since
we assume that the users need br_netfilter in place.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:57 +02:00
Pablo Neira Ayuso 756c1b1a7f netfilter: nft_compat: remove incomplete 32/64 bits arch compat code
This code was based on the wrong assumption that you can probe based
on the match/target private size that we get from userspace. This
doesn't work at all when you have to dump the info back to userspace
since you don't know what word size the userspace utility is using.

Currently, the extensions that require arch compat are limit match
and the ebt_mark match/target. The standard targets are not used by
the nft-xt compat layer, so they are not affected. We can work around
this limitation with a new revision that uses arch agnostic types.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:55 +02:00
Pablo Neira Ayuso 1b1bc49c0f netfilter: nf_tables: wait for call_rcu completion on module removal
Make sure the objects have been released before the nf_tables module
is removed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:54 +02:00
Pablo Neira Ayuso 1109a90c01 netfilter: use IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
In 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the bridge netfilter code has been modularized.

Use IS_ENABLED instead of ifdef to cover the module case.

Fixes: 34666d4 ("netfilter: bridge: move br_netfilter out of the core")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:54 +02:00
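Why a plain ifdef misses the module case, as a user-space sketch. For an option built as a module, kconfig defines the `_MODULE` variant of the symbol rather than the plain one. `CONFIG_DEMO_FEATURE` is a hypothetical option, and the combined `defined()` test only approximates what IS_ENABLED() expresses.

```c
/* Hypothetical option built as a module: only the _MODULE symbol is defined. */
#define CONFIG_DEMO_FEATURE_MODULE 1

/* A plain ifdef on the built-in symbol does not see the modular build. */
#ifdef CONFIG_DEMO_FEATURE
#define DEMO_SEEN_BY_IFDEF 1
#else
#define DEMO_SEEN_BY_IFDEF 0
#endif

/* Checking both symbols covers the built-in and the module case. */
#if defined(CONFIG_DEMO_FEATURE) || defined(CONFIG_DEMO_FEATURE_MODULE)
#define DEMO_SEEN_BY_BOTH 1
#else
#define DEMO_SEEN_BY_BOTH 0
#endif

int demo_seen_by_ifdef(void)
{
	return DEMO_SEEN_BY_IFDEF;
}

int demo_seen_by_both(void)
{
	return DEMO_SEEN_BY_BOTH;
}
```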
Pablo Neira Ayuso 51b0a5d8c2 netfilter: nft_reject: introduce icmp code abstraction for inet and bridge
This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides
an abstraction to the ICMP and ICMPv6 codes that you can use from the
inet and bridge tables, they are:

* NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable
* NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable
* NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable
* NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratively prohibited

You can still use the specific codes when restricting the rule to match
the corresponding layer 3 protocol.

I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have
different semantics depending on the table family and to allow the user
to specify ICMP family specific codes if they restrict it to the
corresponding family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:29:57 +02:00
John Fastabend 22e0f8b932 net: sched: make bstats per cpu and estimator RCU safe
In order to run qdiscs without locking, statistics and estimators
need to be handled correctly.

To resolve bstats, make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks, which is not
the case for most qdiscs in the near future, only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.

Next because estimators use the bstats to calculate packets per
second and bytes per second the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
Florian Westphal db29a9508a netfilter: conntrack: disable generic tracking for known protocols
Given following iptables ruleset:

-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT

One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication to pass through, i.e. -p sctp -j ACCEPT,
which we think is a security issue.

This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).

All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.

Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.

 [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html

Joint work with Daniel Borkmann.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29 12:17:49 +02:00
Arturo Borrero 9363dc4b59 netfilter: nf_tables: store and dump set policy
We want to know in which cases the user explicitly sets the policy
options. In that case, we also want to dump back the info.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29 11:28:03 +02:00
David S. Miller e7af85db54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
nf pull request for net

This series contains netfilter fixes for net, they are:

1) Fix lockdep splat in nft_hash when releasing sets from the
   rcu_callback context. We don't hold the mutex there anymore.

2) Remove unnecessary spinlock_bh in the destroy path of the nf_tables
   rbtree set type from rcu_callback context.

3) Fix another lockdep splat in rhashtable. None of the callers hold
   a mutex when calling rhashtable_destroy.

4) Fix duplicated error reporting from nfnetlink when aborting and
   replaying a batch.

5) Fix a Kconfig issue reported by kbuild robot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:21:29 -04:00
Rob Jones 772476df70 net/netfilter/x_tables.c: use __seq_open_private()
Reduce boilerplate code by using __seq_open_private() instead of seq_open()
in xt_match_open() and xt_target_open().

Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-26 18:42:29 +02:00
Pablo Neira Ayuso 84d7fce693 netfilter: nf_tables: export rule-set generation ID
This patch exposes the ruleset generation ID in three ways:

1) The new command NFT_MSG_GETGEN that exposes the 32-bits ruleset
   generation ID. This ID is incremented in every commit and it
   should be large enough to avoid wraparound problems.

2) The least significant 16 bits of the generation ID are exposed through
   the nfgenmsg->res_id header field. This allows us to quickly catch
   if the ruleset has changed between two consecutive list dumps from
   different object lists (in this specific case I think wraparound is
   unlikely).

3) Userspace subscribers may receive notifications of new rule-set
   generation after every commit. This also provides an alternative
   way to monitor the generation ID. If the events are lost, the
   userspace process hits an overrun error, so it knows that it is
   working with a stale ruleset anyway.

Patrick spotted that rule-set transformations in userspace may take
quite some time. In that case, it annotates the 32-bits generation ID
before fetching the rule-set, then:

1) it compares it to what we obtain after the transformation to
   make sure it is not working with a stale rule-set and no wraparound
   has occurred.

2) it subscribes to ruleset notifications, so it can watch for new
   generation ID.

This is complementary to the NLM_F_DUMP_INTR approach, which allows
us to detect an interference in the middle of a single list dump.
There is no way to explicitly check that an interference has occurred
between two list dumps from the kernel, since it doesn't know how
many lists the userspace client is actually going to dump.
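
The two checks described above can be sketched from the userspace side as
follows. This is only an illustration, not code from the patch: the helper
names are hypothetical, and the 16-bit value is assumed to have already been
converted from the wire format to host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 16 bits of the generation ID, i.e. the part that the commit says is
 * carried in nfgenmsg->res_id. Assumption: host byte order. */
static uint16_t gen_low16(uint32_t gen_id)
{
	return (uint16_t)(gen_id & 0xffffu);
}

/* Two consecutive list dumps belong to the same ruleset generation only if
 * both carried the same 16-bit tag. */
static bool dumps_consistent(uint16_t first_res_id, uint16_t second_res_id)
{
	return first_res_id == second_res_id;
}

/* Long-running transformation: compare the full 32-bit ID recorded before
 * fetching the ruleset with the one read after the transformation. */
static bool ruleset_is_stale(uint32_t gen_before, uint32_t gen_after)
{
	return gen_before != gen_after;
}
```

The 16-bit comparison is the cheap check between dumps; the 32-bit comparison
is the one that also guards against the shorter tag wrapping around during a
slow transformation.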

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-19 11:14:43 +02:00
Pablo Neira Ayuso fc04733a1a netfilter: nfnetlink: use original skbuff when committing/aborting
This allows us to access the original content of the batch from
the commit and the abort paths.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-19 11:14:42 +02:00
Pablo Neira Ayuso fcfa8f493f Merge branch 'ipvs-next'
Simon Horman says:

====================
This pull request makes the following changes:

* Add simple weighted fail-over scheduler.
  - Unlike other IPVS schedulers this offers fail-over rather than load
    balancing. Connections are directed to the appropriate server based
    solely on highest weight value and server availability.
  - Thanks to Kenny Mathis

* Support IPv6 real servers in IPv4 virtual-services and vice versa
  - This feature is supported in conjunction with the tunnel (IPIP)
    forwarding mechanism. That is, IPv4 may be forwarded in IPv6 and
    vice versa.
  - The motivation for this is to allow more flexibility in the
    choice of IP version offered by both virtual-servers and
    real-servers as they no longer need to match: An IPv4 connection from an
    end-user may be forwarded to a real-server using IPv6 and vice versa.
  - Further work needs to be done to support this feature in conjunction
    with connection synchronisation. For now such configurations are
    not allowed.
  - This change includes an update to the netlink protocol, adding a new
    destination address family attribute, and the necessary changes
    to plumb this information throughout IPVS.
  - Thanks to Alex Gartrell and Julian Anastasov
====================
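
The weighted fail-over selection described in the first item can be sketched
as below. This is a simplified illustration and not the scheduler's actual
code: struct fo_server and fo_pick() are hypothetical names, and skipping
servers with a weight of zero or less is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a real server; not the kernel's struct. */
struct fo_server {
	const char *name;
	int weight;     /* higher weight is preferred */
	bool available; /* false if the server cannot take connections */
};

/* Return the available server with the highest weight, or NULL if none
 * qualifies. No load balancing: the same server wins until it becomes
 * unavailable. Assumption: weight <= 0 means "do not use". */
static const struct fo_server *fo_pick(const struct fo_server *servers,
				       size_t n)
{
	const struct fo_server *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct fo_server *s = &servers[i];

		if (!s->available || s->weight <= 0)
			continue;
		if (best == NULL || s->weight > best->weight)
			best = s;
	}
	return best;
}
```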

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-18 10:59:33 +02:00
Alex Gartrell bc18d37f67 ipvs: Allow heterogeneous pools now that we support them
Remove the temporary consistency check and add a case statement to only
allow ipip mixed dests.
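
A minimal sketch of the kind of check this describes, assuming a switch on the
forwarding method. The enum and function names are hypothetical and do not
match the kernel's identifiers.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical forwarding-method tags, for this sketch only. */
enum fwd_method { FWD_MASQ, FWD_DIRECT_ROUTE, FWD_TUNNEL };

/* A destination whose family differs from the service family is accepted
 * only when the tunnel (IPIP) forwarding method is used. */
static bool dest_allowed(int svc_af, int dest_af, enum fwd_method fwd)
{
	if (svc_af == dest_af)
		return true;

	switch (fwd) {
	case FWD_TUNNEL:
		return true;
	default:
		return false;
	}
}
```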

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:29 +09:00
Julian Anastasov f18ae7206e ipvs: use the new dest addr family field
Use the new address family field cp->daf when printing
cp->daddr in logs or connection listing.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:28 +09:00
Julian Anastasov 4d316f3f9a ipvs: use correct address family in scheduler logs
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:23 +09:00
Julian Anastasov cf34e646da ipvs: address family of LBLCR entry depends on svc family
The LBLCR entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:38 +09:00
Julian Anastasov f7fa380069 ipvs: address family of LBLC entry depends on svc family
The LBLC entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:38 +09:00
Alex Gartrell 8052ba2925 ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding
Pull the common logic for preparing an skb to prepend the header into a
single function and then set fields such that they can be used in either
case (generalize tos and tclass to dscp, hop_limit and ttl to ttl, etc).

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:37 +09:00
Alex Gartrell c63e4de2be ipvs: Add generic ensure_mtu_is_adequate to handle mixed pools
The out_rt functions check to see if the mtu is large enough for the packet
and, if not, send icmp messages (TOOBIG or DEST_UNREACH) to the source and
bail out.  We needed the ability to send ICMP from the out_rt_v6 function
and DEST_UNREACH from the out_rt function, so we just pulled it out into a
common function.
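
The shape of such a shared check can be sketched as follows. This is a
simplified reading of the description above, not the patch's code: the names
are hypothetical, the error type is assumed to follow the family of the
incoming packet (since the error goes back to its source), and further
conditions that real code would need, such as the IPv4 DF bit, are omitted.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical verdicts for this sketch. */
enum mtu_verdict {
	MTU_OK,               /* packet fits, keep forwarding */
	MTU_SEND_TOOBIG,      /* IPv6 source: report "packet too big" */
	MTU_SEND_DEST_UNREACH /* IPv4 source: report "destination unreachable" */
};

/* Decide what to do with a packet of pkt_len bytes on a route with the given
 * MTU. Any verdict other than MTU_OK means "notify the source and bail out". */
static enum mtu_verdict check_mtu(int pkt_af, size_t pkt_len, size_t mtu)
{
	if (pkt_len <= mtu)
		return MTU_OK;
	return pkt_af == AF_INET6 ? MTU_SEND_TOOBIG : MTU_SEND_DEST_UNREACH;
}
```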

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:37 +09:00
Alex Gartrell 919aa0b2bb ipvs: Pull out update_pmtu code
Another step toward heterogeneous pools, this removes another piece of
functionality currently specific to each address family type.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:36 +09:00
Alex Gartrell 4a4739d56b ipvs: Pull out crosses_local_route_boundary logic
This logic is repeated in both out_rt functions so it was redundant.
Additionally, we'll need to be able to do checks to route v4 to v6 and vice
versa in order to deal with heterogeneous pools.

This patch also updates the callsites to add an additional parameter to the
out route functions.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:36 +09:00
Alex Gartrell 391f503d69 ipvs: prevent mixing heterogeneous pools and synchronization
The synchronization protocol is not compatible with heterogeneous pools, so
we need to verify that we're not turning both on at the same time.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:35 +09:00
Alex Gartrell ba38528aae ipvs: Supply destination address family to ip_vs_conn_new
The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or do an illegal read of 16 bytes on a v6 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.
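
To illustrate why the family has to be passed explicitly, here is a small
userspace sketch in which the copy length depends on the address family. The
union and helper names are hypothetical stand-ins, not the kernel's types.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for an address that may be IPv4 or IPv6. */
union any_addr {
	struct in_addr  v4;
	struct in6_addr v6;
};

/* Number of address bytes that are valid for the given family. */
static size_t addr_len(int af)
{
	return af == AF_INET6 ? sizeof(struct in6_addr) : sizeof(struct in_addr);
}

/* Copy exactly as many bytes as the destination's own family requires.
 * Using the service family here instead would truncate a v6 address to
 * 4 bytes, or read 16 bytes from a 4-byte source. */
static void addr_copy(int af, union any_addr *dst, const void *src)
{
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, addr_len(af));
}
```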

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:34 +09:00
Alex Gartrell ad147aa4dd ipvs: Pass destination address family to ip_vs_trash_get_dest
Part of a series of diffs to tease out destination family from virtual
family.  This diff just adds a parameter to ip_vs_trash_get_dest and then uses
it for comparison rather than svc->af.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:34 +09:00
Alex Gartrell 655eef103d ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}
We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00
Alex Gartrell 6cff339bbd ipvs: Add destination address family to netlink interface
This is necessary to support heterogeneous pools.  For example, if you have
an ipv6 addressed network, you'll want to be able to forward ipv4 traffic
into it.

This patch enforces that destination address family is the same as service
family, as none of the forwarding mechanisms support anything else.

For the old setsockopt mechanism, we simply set the dest address family to
AF_INET as we do with the service.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00