redonkable/remarkable-linux

Author	SHA1	Message	Date
Patrick McHardy	c29b72e025	netfilter: nft_payload: add optimized payload implementation for small loads Add an optimized payload expression implementation for small (up to 4 bytes) aligned data loads from the linear packet area. This patch also includes Patrick McHardy's original patch entitled (nf_tables: inline nft_payload_fast_eval() into main evaluation loop). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 17:16:10 +02:00
Patrick McHardy	cb7dbfd039	netfilter: nf_tables: add optimized data comparison for small values Add an optimized version of nft_data_cmp() that only handles values of up to 4 bytes in length. This patch includes Patrick McHardy's original patch entitled (nf_tables: inline nft_cmp_fast_eval() into main evaluation loop). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 17:16:09 +02:00
Patrick McHardy	ef1f7df917	netfilter: nf_tables: expression ops overloading Split the expression ops into two parts and support overloading of the runtime expression ops based on the requested function through a ->select_ops() callback. This can be used to provide optimized implementations, for instance for loading small aligned amounts of data from the packet or inlining frequently used operations into the main evaluation loop. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 17:16:08 +02:00
Patrick McHardy	20a69341f2	netfilter: nf_tables: add netlink set API This patch adds the new netlink API for maintaining nf_tables sets independently of the ruleset. The API supports the following operations: - creation of sets - deletion of sets - querying of specific sets - dumping of all sets - addition of set elements - removal of set elements - dumping of all set elements Sets are identified by name, each table defines an individual namespace. The name of a set may be allocated automatically, this is mostly useful in combination with the NFT_SET_ANONYMOUS flag, which destroys a set automatically once the last reference has been released. Sets can be marked constant, meaning they're not allowed to change while linked to a rule. This allows to perform lockless operation for set types that would otherwise require locking. Additionally, if the implementation supports it, sets can (as before) be used as maps, associating a data value with each key (or range), by specifying the NFT_SET_MAP flag and can be used for interval queries by specifying the NFT_SET_INTERVAL flag. Set elements are added and removed incrementally. All element operations support batching, reducing netlink message and set lookup overhead. The old "set" and "hash" expressions are replaced by a generic "lookup" expression, which binds to the specified set. Userspace is not aware of the actual set implementation used by the kernel anymore, all configuration options are generic. Currently the implementation selection logic is largely missing and the kernel will simply use the first registered implementation supporting the requested operation. Eventually, the plan is to have userspace supply a description of the data characteristics and select the implementation based on expected performance and memory use. This patch includes the new 'lookup' expression to look up for element matching in the set. This patch includes kernel-doc descriptions for this set API and it also includes the following fixes. 
From Patrick McHardy: * netfilter: nf_tables: fix set element data type in dumps * netfilter: nf_tables: fix indentation of struct nft_set_elem comments * netfilter: nf_tables: fix oops in nft_validate_data_load() * netfilter: nf_tables: fix oops while listing sets of built-in tables * netfilter: nf_tables: destroy anonymous sets immediately if binding fails * netfilter: nf_tables: propagate context to set iter callback * netfilter: nf_tables: add loop detection From Pablo Neira Ayuso: * netfilter: nf_tables: allow to dump all existing sets * netfilter: nf_tables: fix wrong type for flags variable in newelem Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 17:16:07 +02:00
Patrick McHardy	96518518cc	netfilter: add nftables This patch adds nftables which is the intended successor of iptables. This packet filtering framework reuses the existing netfilter hooks, the connection tracking system, the NAT subsystem, the transparent proxying engine, the logging infrastructure and the userspace packet queueing facilities. In a nutshell, nftables provides a pseudo-state machine with 4 general purpose registers of 128 bits and 1 specific purpose register to store verdicts. This pseudo-machine comes with an extensible instruction set, a.k.a. "expressions" in the nftables jargon. The expressions included in this patch provide the basic functionality, they are: * bitwise: to perform bitwise operations. * byteorder: to change from host/network endianness. * cmp: to compare data with the content of the registers. * counter: to enable counters on rules. * ct: to store conntrack keys into registers. * exthdr: to match IPv6 extension headers. * immediate: to load data into registers. * limit: to limit matching based on packet rate. * log: to log packets. * meta: to match metainformation that usually comes with the skbuff. * nat: to perform Network Address Translation. * payload: to fetch data from the packet payload and store it into registers. * reject (IPv4 only): to explicitly close connection, e.g. TCP RST. Using this instruction-set, the userspace utility 'nft' can transform the rules expressed in human-readable text representation (using a new syntax, inspired by tcpdump) to nftables bytecode. nftables also inherits the table, chain and rule objects from iptables, but in a more configurable way, and it also includes the original datatype-agnostic set infrastructure with mapping support. This set infrastructure is enhanced in the follow up patch (netfilter: nf_tables: add netlink set API).
This patch includes the following components: * the netlink API: net/netfilter/nf_tables_api.c and include/uapi/netfilter/nf_tables.h * the packet filter core: net/netfilter/nf_tables_core.c * the expressions (described above): net/netfilter/nft_*.c * the filter tables: arp, IPv4, IPv6 and bridge: net/ipv4/netfilter/nf_tables_ipv4.c net/ipv6/netfilter/nf_tables_ipv6.c net/ipv4/netfilter/nf_tables_arp.c net/bridge/netfilter/nf_tables_bridge.c * the NAT table (IPv4 only): net/ipv4/netfilter/nf_table_nat_ipv4.c * the route table (similar to mangle): net/ipv4/netfilter/nf_table_route_ipv4.c net/ipv6/netfilter/nf_table_route_ipv6.c * internal definitions under: include/net/netfilter/nf_tables.h include/net/netfilter/nf_tables_core.h * It also includes a skeleton expression: net/netfilter/nft_expr_template.c and the preliminary implementation of the meta target net/netfilter/nft_meta_target.c It also includes a change in struct nf_hook_ops to add a new pointer to store private data in the hook, which is used to store the rule list per chain.
This patch is based on the patch from Patrick McHardy, plus merged accumulated cleanups, fixes and small enhancements to the nftables code that have been done since 2009, which are: From Patrick McHardy: * nf_tables: adjust netlink handler function signatures * nf_tables: only retry table lookup after successful table module load * nf_tables: fix event notification echo and avoid unnecessary messages * nft_ct: add l3proto support * nf_tables: pass expression context to nft_validate_data_load() * nf_tables: remove redundant definition * nft_ct: fix maxattr initialization * nf_tables: fix invalid event type in nf_tables_getrule() * nf_tables: simplify nft_data_init() usage * nf_tables: build in more core modules * nf_tables: fix double lookup expression unregistration * nf_tables: move expression initialization to nf_tables_core.c * nf_tables: build in payload module * nf_tables: use NFPROTO constants * nf_tables: rename pid variables to portid * nf_tables: save 48 bits per rule * nf_tables: introduce chain rename * nf_tables: check for duplicate names on chain rename * nf_tables: remove ability to specify handles for new rules * nf_tables: return error for rule change request * nf_tables: return error for NLM_F_REPLACE without rule handle * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification * nf_tables: fix NLM_F_MULTI usage in netlink notifications * nf_tables: include NLM_F_APPEND in rule dumps From Pablo Neira Ayuso: * nf_tables: fix stack overflow in nf_tables_newrule * nf_tables: nft_ct: fix compilation warning * nf_tables: nft_ct: fix crash with invalid packets * nft_log: group and qthreshold are 2^16 * nf_tables: nft_meta: fix socket uid,gid handling * nft_counter: allow to restore counters * nf_tables: fix module autoload * nf_tables: allow to remove all rules placed in one chain * nf_tables: use 64-bits rule handle instead of 16-bits * nf_tables: fix chain after rule deletion * nf_tables: improve deletion performance
* nf_tables: add missing code in route chain type * nf_tables: raise maximum number of expressions from 12 to 128 * nf_tables: don't delete table if in use * nf_tables: fix basechain release From Tomasz Bursztyka: * nf_tables: Add support for changing users chain's name * nf_tables: Change chain's name to be fixed sized * nf_tables: Add support for replacing a rule by another one * nf_tables: Update uapi nftables netlink header documentation From Florian Westphal: * nft_log: group is u16, snaplen u32 From Phil Oester: * nf_tables: operational limit match Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 17:15:48 +02:00
Maxime Jayat	3f79410c7c	treewide: Fix common typo in "identify" Correct common misspelling of "identify" as "indentify" throughout the kernel Signed-off-by: Maxime Jayat <maxime@artisandeveloppeur.fr> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-10-14 15:31:06 +02:00
Pablo Neira Ayuso	f59cb0453c	netfilter: nf_nat: move alloc_null_binding to nf_nat_core.c Similar to nat_decode_session, alloc_null_binding is needed for both ip_tables and nf_tables, so move it to nf_nat_core.c. This change is required by nf_tables. This is an adapted version of the original patch from Patrick McHardy. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 11:29:39 +02:00
Patrick McHardy	795aa6ef6a	netfilter: pass hook ops to hookfn Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 11:29:31 +02:00
Ingo Molnar	37bf06375c	Merge tag 'v3.12-rc4' into sched/core Merge Linux v3.12-rc4 to fix a conflict and also to refresh the tree before applying more scheduler patches. Conflicts: arch/avr32/include/asm/Kbuild Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-09 12:36:13 +02:00
Eric Dumazet	efe4208f47	ipv6: make lookups simpler and faster TCP listener refactoring, part 4: To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common. Now it is time to do the same for IPv6, because it permits us to have fast lookups for all kinds of sockets, including the upcoming SYN_RECV. Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6. This patch is way bigger than its IPv4 counterpart, because for IPv4, we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6, it's not easily doable. inet6_sk(sk)->daddr becomes sk->sk_v6_daddr; inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr. Timewait sockets also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset. We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-09 00:01:25 -04:00
David S. Miller	d639feaaf3	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter updates for your net-next tree, mostly ipset improvements and feature enhancements. They are: * Don't call ip_nest_end needlessly in the error path, suggested by Pablo Neira Ayuso, from Jozsef Kadlecsik. * Fixed sparse warnings about shadowed variable and missing rcu annotation and fix of "may be used uninitialized" warnings, also from Jozsef. * Renamed simple macro names to avoid namespace issues, reported by David Laight, again from Jozsef. * Use a fixed-size type for the timeout in the extension part, and cosmetic ordering of matches and targets separately in xt_set.c, from Jozsef. * Support package fragments for IPv4 protos without ports from Anders K. Pedersen. For example this allows a hash:ip,port ipset containing the entry 192.168.0.1,gre:0 to match all package fragments for PPTP VPN tunnels to/from the host. Without this patch only the first package fragment (with fragment offset 0) was matched. * Introduced a new operation to get both setname and family, from Jozsef. ip[6]tables set match and SET target need to know the family of the set in order to reject adding rules which refer to a set with a non-matching family. Currently such rules are silently accepted and then ignored instead of generating an error message to the user. * Reworked extensions support in ipset types from Jozsef. The approach of defining structures with all variations is not manageable as the number of extensions grows. Therefore a blob for the extensions is introduced, somewhat similar to conntrack. The support of extensions which need a per data destroy function is added as well. * When an element timed out in a list:set type of set, the garbage collector skipped the checking of the next element. So the purging was delayed to the next run of the gc, fixed by Jozsef.
* A small Kconfig fix: NETFILTER_NETLINK cannot be selected and ipset requires it. * hash:net,net type from Oliver Smith. The type provides the ability to store pairs of subnets in a set. * Comment for ipset entries from Oliver Smith. This makes it possible to annotate entries in a set with comments, for example: ipset n foo hash:net,net comment ipset a foo 10.0.0.0/21,192.168.1.0/24 comment "office nets A and B" * Fix of hash types resizing with comment extension from Jozsef. * Fix of new extensions for list:set type when an element is added into a slot from where another element was pushed away from Jozsef. * Introduction of a common function for the listing of the element extensions from Jozsef. * Net namespace support for ipset from Vitaly Lavrov. * hash:net,port,net type from Oliver Smith, which makes it possible to store the triples of two subnets and a protocol, port pair in a set. * Get xt_TCPMSS working with net namespace, by Gao feng. * Use the proper net namespace to allocate skbs, also by Gao feng. * A couple of cleanups for the conntrack SIP helper, by Holger Eitzenberger. * Extend cttimeout to allow setting default conntrack timeouts via nfnetlink, so we can get rid of all our sysctl/proc interfaces in the future for timeout tuning, from me. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-04 13:26:38 -04:00
Peter Zijlstra	35a2af94c7	sched/wait: Make the __wait_event() interface more friendly Change all __wait_event() implementations to match the corresponding wait_event() signature for convenience. In particular this does away with the weird 'ret' logic. Since there are __wait_event() users this requires we update them too. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131002092529.042563462@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-04 10:16:25 +02:00
David S. Miller	e024bdc051	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter/IPVS fixes for your net tree, they are: * Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from Patrick McHardy. * Fix possible weight overflow in lblc and lblcr schedulers due to 32-bits arithmetics, from Simon Kirby. * Fix possible memory access race in the lblc and lblcr schedulers, introduced when it was converted to use RCU, two patches from Julian Anastasov. * Fix hard dependency on CPU 0 when reading per-cpu stats in the rate estimator, from Julian Anastasov. * Fix race that may lead to object use after release, when invoking ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-01 12:39:35 -04:00
Pablo Neira Ayuso	91cb498e6a	netfilter: cttimeout: allow to set/get default protocol timeouts Default timeouts are currently set via proc/sysctl interface, the typical pattern is a file name like: /proc/sys/net/netfilter/nf_conntrack_PROTOCOL_timeout_STATE This results in one entry per default protocol state timeout. This patch simplifies this by allowing to set default protocol timeouts via cttimeout netlink interface. This should allow us to get rid of the existing proc/sysctl code in the midterm. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-01 13:17:39 +02:00
holger@eitzenberger.org	180cf72f56	netfilter: nf_ct_sip: consolidate NAT hook functions There are currently seven different NAT hooks used in both nf_conntrack_sip and nf_nat_sip, each of the hooks is exported in nf_conntrack_sip, then set from the nf_nat_sip NAT helper. Because each of them is exported, this introduces quite some overhead. By introducing nf_nat_sip_hooks I am able to reduce both text/data somewhat. For nf_conntrack_sip, for example, I get (text, data, bss, dec): old 15243, 5256, 32, 20531; new 15010, 5192, 32, 20234. Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-01 12:47:09 +02:00
Gao feng	afff14f608	netfilter: nfnetlink_log: use proper net to allocate skb Use proper net struct to allocate skb, otherwise netlink mmap will be of no effect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-01 12:46:56 +02:00
Gao feng	7433268783	netfilter: nfnetlink_queue: use proper net namespace to allocate skb Use proper net struct to allocate skb, otherwise netlink mmap will have no effect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-01 12:20:31 +02:00
Oliver Smith	7c3ad056ef	netfilter: ipset: Add hash:net,port,net module to kernel. This adds a new set that provides similar functionality to ip,port,net but permits arbitrary size subnets for both the first and last parameter. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:42:58 +02:00
Vitaly Lavrov	1785e8f473	netfilter: ipset: Add net namespace for ipset This patch adds netns support for ipset. Major changes were made in ip_set_core.c and ip_set.h. Global variables are moved to per net namespace. Initialization and destruction code for the per-namespace ipset subsystem was added. A "struct net" parameter was added to the prototypes of the public ip_set_* functions. The remaining corrections follow from the changed prototypes of the public ip_set_* functions. The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347 Signed-off-by: Vitaly Lavrov <lve@guap.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:42:52 +02:00
Jozsef Kadlecsik	3fd986b3d9	netfilter: ipset: Use a common function at listing the extensions Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:42:36 +02:00
Jozsef Kadlecsik	8ec81f9a4d	netfilter: ipset: For list:set types, replaced elements must be zeroed out The new extensions require zero initialization for the new element to be added into a slot from where another element was pushed away. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:29 +02:00
Jozsef Kadlecsik	80571a9ea4	netfilter: ipset: Fix hash resizing with comments The destroy function must take into account that resizing doesn't create new extensions so those cannot be destroyed at resize. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:29 +02:00
Oliver Smith	fda75c6d9e	netfilter: ipset: Support comments in hash-type ipsets. This provides kernel support for creating ipsets with comment support. This does incur a penalty to flushing/destroying an ipset since all entries are walked in order to free the allocated strings, this penalty is of course less expensive than the operation of listing an ipset to userspace, so for general-purpose usage the overall impact is expected to be little to none. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:29 +02:00
Oliver Smith	81b10bb4bd	netfilter: ipset: Support comments in the list-type ipset. This provides kernel support for creating list ipsets with the comment annotation extension. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:29 +02:00
Oliver Smith	b90cb8ba19	netfilter: ipset: Support comments in bitmap-type ipsets. This provides kernel support for creating bitmap ipsets with comment support. As is the case for hashes, this incurs a penalty when flushing or destroying the entire ipset as the entries must first be walked in order to free the comment strings. This penalty is of course far less than the cost of listing an ipset to userspace. Any set created without support for comments will be flushed/destroyed as before. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:28 +02:00
Oliver Smith	68b63f08d2	netfilter: ipset: Support comments for ipset entries in the core. This adds the core support for having comments on ipset entries. The comments are stored as standard null-terminated strings in dynamically allocated memory after being passed to the kernel. As a result of this, code has been added to the generic destroy function to iterate all extensions and call that extension's destroy task if the set has that extension activated, and if such a task is defined. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:28 +02:00
Oliver Smith	ea53ac5b63	netfilter: ipset: Add hash:net,net module to kernel. This adds a new set that provides the ability to configure pairs of subnets. A small amount of additional handling code has been added to the generic hash header file - this code is conditionally activated by a preprocessor definition. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:28 +02:00
Jozsef Kadlecsik	d9628bbeca	netfilter: ipset: Kconfig: ipset needs NETFILTER_NETLINK Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:28 +02:00
Jozsef Kadlecsik	b91b396d5e	netfilter: ipset: list:set: make sure all elements are checked by the gc When an element timed out, the next one was skipped by the garbage collector, fixed. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	40cd63bf33	netfilter: ipset: Support extensions which need a per data destroy function Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	03c8b234e6	netfilter: ipset: Generalize extensions support Get rid of the structure based extensions and introduce a blob for the extensions. Thus we can support more extension types easily. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	ca134ce864	netfilter: ipset: Move extension data to set structure Default timeout and extension offsets are moved to struct set, because all set types support all extensions and this makes it possible to generalize extension support. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	f925f70569	netfilter: ipset: Rename extension offset ids to extension ids Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	a04d8b6bd9	netfilter: ipset: Prepare ipset to support multiple networks for hash types In order to support hash:net,net, hash:net,port,net etc. types, arrays are introduced for the book-keeping of existing cidr sizes and network numbers in a set. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:26 +02:00
Jozsef Kadlecsik	5e04c0c38c	netfilter: ipset: Introduce new operation to get both setname and family ip[6]tables set match and SET target need to know the family of the set in order to reject adding rules which refer to a set with a non-matching family. Currently such rules are silently accepted and then ignored instead of generating a clear error message to the user, which is not helpful. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:26 +02:00
Jozsef Kadlecsik	bd3129fc5e	netfilter: ipset: order matches and targets separately in xt_set.c Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:26 +02:00
Anders K. Pedersen	60b0fe3724	netfilter: ipset: Support package fragments for IPv4 protos without ports Enable ipset port set types to match IPv4 package fragments for protocols that don't have ports (or whose port information isn't supported by ipset). For example this allows a hash:ip,port ipset containing the entry 192.168.0.1,gre:0 to match all package fragments for PPTP VPN tunnels to/from the host. Without this patch only the first package fragment (with fragment offset 0) was matched, while subsequent fragments weren't. This is not possible for IPv6, where the protocol is in the fragmented part of the package unlike IPv4, where the protocol is in the IP header. IPPROTO_ICMPV6 is deliberately not included, because it isn't relevant for IPv4. Signed-off-by: Anders K. Pedersen <akp@surftown.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:26 +02:00
Jozsef Kadlecsik	20b2fab483	netfilter: ipset: Fix "may be used uninitialized" warnings Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Jozsef Kadlecsik	35b8dcf8c3	netfilter: ipset: Rename simple macro names to avoid namespace issues. Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Jozsef Kadlecsik	a0f28dc754	netfilter: ipset: Fix sparse warnings due to missing rcu annotations Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Jozsef Kadlecsik	b3aabd149c	netfilter: ipset: Sparse warning about shadowed variable fixed net/netfilter/ipset/ip_set_hash_ipportnet.c:275:20: warning: symbol 'cidr' shadows an earlier one Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Jozsef Kadlecsik	122ebbf24c	netfilter: ipset: Don't call ip_nest_end needlessly in the error path Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Patrick McHardy	f4a87e7bd2	netfilter: synproxy: fix BUG_ON triggered by corrupt TCP packets TCP packets hitting the SYN proxy through the SYNPROXY target are not validated by TCP conntrack. When th->doff is below 5, an underflow happens when calculating the options length, causing skb_header_pointer() to return NULL and triggering the BUG_ON(). Handle this case gracefully by checking for NULL instead of using BUG_ON(). Reported-by: Martin Topholm <mph@one.com> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-30 12:44:38 +02:00
Gao feng	7722e0d1c0	netfilter: xt_TCPMSS: lookup route from proper net namespace Otherwise the pmtu will be incorrect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-27 16:18:23 +02:00
Gao feng	de1389b116	netfilter: xt_TCPMSS: Get mtu only if clamp-mss-to-pmtu is specified This patch refactors the code to skip tcpmss_reverse_mtu if no clamp-mss-to-pmtu is specified. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-27 16:17:59 +02:00
holger@eitzenberger.org	b21613aeb6	netfilter: nf_ct_sip: extend RCU read lock in set_expected_rtp_rtcp() Currently set_expected_rtp_rtcp() in the SIP helper uses rcu_dereference() two times to access two different NAT hook functions. However, only the first one is protected by the RCU reader lock, but the 2nd isn't. Fix it by extending the RCU protected area. This is more a cosmetic thing since we rely on all netfilter hooks being rcu_read_lock()ed by nf_hook_slow() in many places anyways, as Patrick McHardy clarified. Signed-off-by: Holger Eitzenberger <holger.eitzenberger@sophos.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-27 16:17:47 +02:00
Ansis Atteka	703133de33	ip: generate unique IP identifier if local fragmentation is allowed If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identifier. If one of these IP fragments were lost or reordered, then the peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-19 14:11:15 -04:00
Julian Anastasov	d1ee4fea0b	ipvs: stats should not depend on CPU 0 When reading percpu stats we need to properly reset the sum when CPU 0 is not present in the possible mask. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-09-18 14:40:20 -05:00
Julian Anastasov	742617b176	ipvs: do not use dest after ip_vs_dest_put in LBLCR commit `c5549571f9` ("ipvs: convert lblcr scheduler to rcu") allows RCU readers to use dest after calling ip_vs_dest_put(). In the corner case it can race with ip_vs_dest_trash_expire() which can release the dest while it is being returned to the RCU readers as scheduling result. To fix the problem do not allow e->dest to be replaced and defer the ip_vs_dest_put() call by using RCU callback. Now e->dest does not need to be RCU pointer. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-09-18 14:39:39 -05:00
Julian Anastasov	2f3d771a35	ipvs: do not use dest after ip_vs_dest_put in LBLC commit `c2a4ffb70e` ("ipvs: convert lblc scheduler to rcu") allows RCU readers to use dest after calling ip_vs_dest_put(). In the corner case it can race with ip_vs_dest_trash_expire() which can release the dest while it is being returned to the RCU readers as scheduling result. To fix the problem do not allow en->dest to be replaced and defer the ip_vs_dest_put() call by using RCU callback. Now en->dest does not need to be RCU pointer. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-09-18 14:39:09 -05:00
Julian Anastasov	bcbde4c0a7	ipvs: make the service replacement more robust commit `578bc3ef1e` ("ipvs: reorganize dest trash") added IP_VS_DEST_STATE_REMOVING flag and RCU callback named ip_vs_dest_wait_readers() to keep dests and services after removal for at least a RCU grace period. But we have the following corner cases: - we can not reuse the same dest if its service is removed while IP_VS_DEST_STATE_REMOVING is still set because another dest removal in the first grace period can not extend this period. It can happen when ipvsadm -C && ipvsadm -R is used. - dest->svc can be replaced but ip_vs_in_stats() and ip_vs_out_stats() have no explicit read memory barriers when accessing dest->svc. It can happen that dest->svc was just freed (replaced) while we use it to update the stats. We solve the problems as follows: - IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start will remember when for first time after deletion we noticed dest->refcnt=0. Later, the connections can grab a reference while in RCU grace period but if refcnt becomes 0 we can safely free the dest and its svc. - dest->svc becomes RCU pointer. As result, we add explicit RCU locking in ip_vs_in_stats() and ip_vs_out_stats(). - __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(), it now can free the service immediately or after a RCU grace period. dest->svc is not set to NULL anymore. As result, unlinked dests and their services are freed always after IP_VS_DEST_TRASH_PERIOD period, unused services are freed after a RCU grace period. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-09-18 14:39:03 -05:00
Simon Kirby	c16526a7b9	ipvs: fix overflow on dest weight multiply Schedulers such as lblc and lblcr require the weight to be as high as the maximum number of active connections. In commit `b552f7e3a9` ("ipvs: unify the formula to estimate the overhead of processing connections"), the consideration of inactconns and activeconns was cleaned up to always count activeconns as 256 times more important than inactconns. In cases where 3000 or more connections are expected, a weight of 3000 * 256 * 3000 connections overflows the 32-bit signed result used to determine if rescheduling is required. On amd64, this merely changes the multiply and comparison instructions to 64-bit. On x86, a 64-bit result is already present from imull, so only a few more comparison instructions are emitted. Signed-off-by: Simon Kirby <sim@hostway.ca> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-09-18 14:38:53 -05:00
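The overflow described in the entry above can be checked with a small sketch. The helper names are hypothetical and this is not the IPVS scheduler code; it only reproduces the arithmetic from the commit message (3000 * 256 * 3000) and shows the product computed in 64 bits, as the message describes.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not the actual IPVS code. */

/* Multiply in 64 bits so a large overhead * weight product cannot wrap. */
static int64_t weighted_load64(int32_t overhead, int32_t weight)
{
    return (int64_t)overhead * weight;
}

/* Returns 1 if the product would not fit in a signed 32-bit result. */
static int overflows_s32(int32_t overhead, int32_t weight)
{
    int64_t wide = weighted_load64(overhead, weight);

    return wide > INT32_MAX || wide < INT32_MIN;
}
```

With 3000 active connections counted at 256 each and a weight of 3000, the product is 2,304,000,000, which exceeds the signed 32-bit maximum of 2,147,483,647.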
Gao feng	0a0d80eb39	netfilter: nfnetlink_queue: use network skb for sequence adjustment Instead of the netlink skb. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-17 13:05:12 +02:00
Oliver Smith	2cf55125c6	netfilter: ipset: Fix serious failure in CIDR tracking This fixes a serious bug affecting all hash types with a net element - specifically, if a CIDR value is deleted such that none of the same size exist any more, all larger (less-specific) values will then fail to match. Adding back any prefix with a CIDR equal to or more specific than the one deleted will fix it. Steps to reproduce: ipset -N test hash:net ipset -A test 1.1.0.0/16 ipset -A test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS in set ipset -D test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS NOT in set This is due to the fact that the nets counter was unconditionally decremented prior to the iteration that shifts up the entries. Now, we first check if there is a proceeding entry and if not, decrement it and return. Otherwise, we proceed to iterate and then zero the last element, which, in most cases, will already be zero. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-16 20:36:09 +02:00
Jozsef Kadlecsik	169faa2e19	netfilter: ipset: Validate the set family and not the set type family at swapping This closes netfilter bugzilla #843, reported by Quentin Armitage. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-16 20:36:05 +02:00
Jozsef Kadlecsik	0f1799ba1a	netfilter: ipset: Consistent userspace testing with nomatch flag The "nomatch" commandline flag should invert the matching at testing, similarly to the --return-nomatch flag of the "set" match of iptables. Until now it worked with the elements with "nomatch" flag only. From now on it works with elements without the flag too, i.e: # ipset n test hash:net # ipset a test 10.0.0.0/24 nomatch # ipset t test 10.0.0.1 10.0.0.1 is NOT in set test. # ipset t test 10.0.0.1 nomatch 10.0.0.1 is in set test. # ipset a test 192.168.0.0/24 # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is NOT in set test. Before the patch the results were ... # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is in set test. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-16 20:35:55 +02:00
Jozsef Kadlecsik	55524c219a	netfilter: ipset: Skip really non-first fragments for IPv6 when getting port/protocol Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-16 20:33:44 +02:00
David S. Miller	1a5bbfc3d6	netfilter: Fix build errors with xt_socket.c As reported by Randy Dunlap: ==================== when CONFIG_IPV6=m and CONFIG_NETFILTER_XT_MATCH_SOCKET=y: net/built-in.o: In function `socket_mt6_v1_v2': xt_socket.c:(.text+0x51b55): undefined reference to `udp6_lib_lookup' net/built-in.o: In function `socket_mt_init': xt_socket.c:(.init.text+0x1ef8): undefined reference to `nf_defrag_ipv6_enable' ==================== Like several other modules under net/netfilter/ we have to have a dependency "IPV6 disabled or set compatibly with this module" clause. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:38:03 -04:00
Phil Oester	1205e1fa61	netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packet In commit `b396966c4` (netfilter: xt_TCPMSS: Fix missing fragmentation handling), I attempted to add safe fragment handling to xt_TCPMSS. However, Andy Padavan of Project N56U correctly points out that returning XT_CONTINUE in this function does not work. The callers (tcpmss_tg[46]) expect to receive a value of 0 in order to return XT_CONTINUE. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 14:20:03 +02:00
Patrick McHardy	f4de4c89d8	netfilter: synproxy_core: fix warning in __nf_ct_ext_add_length() With CONFIG_NETFILTER_DEBUG we get the following warning during SYNPROXY init: [ 80.558906] WARNING: CPU: 1 PID: 4833 at net/netfilter/nf_conntrack_extend.c:80 __nf_ct_ext_add_length+0x217/0x220 [nf_conntrack]() The reason is that the conntrack template is set to confirmed before adding the extension and it is invalid to add extensions to already confirmed conntracks. Fix by adding the extensions before setting the conntrack to confirmed. Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 11:43:36 +02:00
Florian Westphal	b7e092c05b	netfilter: ctnetlink: fix uninitialized variable net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_nfqueue_attach_expect': 'helper' may be used uninitialized in this function It was only initialized if the CTA_EXPECT_HELP_NAME attribute was present; it must be NULL otherwise. Problem added recently in `bd077937` (netfilter: nfnetlink_queue: allow to attach expectations to conntracks). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:28:19 +02:00
Patrick McHardy	48b1de4c11	netfilter: add SYNPROXY core/target Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy core with common functions and an address family specific target. The SYNPROXY receives the connection request from the client, responds with a SYN/ACK containing a SYN cookie and announcing a zero window and checks whether the final ACK from the client contains a valid cookie. It then establishes a connection to the original destination and, if successful, sends a window update to the client with the window size announced by the server. Support for timestamps, SACK, window scaling and MSS options can be statically configured as target parameters if the features of the server are known. If timestamps are used, the timestamp value sent back to the client in the SYN/ACK will be different from the real timestamp of the server. In order to not break PAWS, the timestamps are translated in the direction server->client. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:27:54 +02:00
Patrick McHardy	41d73ec053	netfilter: nf_conntrack: make sequence number adjustments usable without NAT Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a separate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:26:48 +02:00
David S. Miller	89d5e23210	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_conntrack_proto_tcp.c The conflict had to do with overlapping changes dealing with fixing the use of an "s32" to hold the value returned by NAT_OFFSET(). Pablo Neira Ayuso says: ==================== The following batch contains Netfilter/IPVS updates for your net-next tree. More specifically, they are: * Trivial typo fix in xt_addrtype, from Phil Oester. * Remove net_ratelimit in the conntrack logging for consistency with other logging subsystem, from Patrick McHardy. * Remove unneeded includes from the recently added xt_connlabel support, from Florian Westphal. * Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for this, from Florian Westphal. * Remove tproxy core, now that we have socket early demux, from Florian Westphal. * A couple of patches to refactor conntrack event reporting to save a good bunch of lines, from Florian Westphal. * Fix missing locking in NAT sequence adjustment, it has not manifested in any known bug so far, from Patrick McHardy. * Change sequence number adjustment variable to 32 bits, to delay the possible early overflow in long standing connections, also from Patrick. * Cosmetic cleanups for IPVS, from Dragos Foianu. * Fix possible null dereference in IPVS in the SH scheduler, from Daniel Borkmann. * Allow to attach conntrack expectations via nfqueue. Before this patch, you had to use ctnetlink instead, thus, we save the conntrack lookup. * Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:30:54 -07:00
David S. Miller	2ff1cf12c9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2013-08-16 15:37:26 -07:00
Pablo Neira Ayuso	bd07793705	netfilter: nfnetlink_queue: allow to attach expectations to conntracks This patch adds the capability to attach expectations via nfnetlink_queue. This is required by conntrack helpers that trigger expectations based on the first packet seen like the TFTP and the DHCPv6 user-space helpers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-13 16:32:10 +02:00
Pablo Neira Ayuso	0ef71ee1a5	netfilter: ctnetlink: refactor ctnetlink_create_expect This patch refactors ctnetlink_create_expect by spliting it in two chunks. As a result, we have a new function ctnetlink_alloc_expect to allocate and to setup the expectation from ctnetlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-13 15:48:20 +02:00
Yuchung Cheng	356d7d88e0	netfilter: nf_conntrack: fix tcp_in_window for Fast Open Currently the conntrack checks if the ending sequence of a packet falls within the observed receive window. However it does so even if it has not observed any packet from the remote yet and uses an uninitialized receive window (td_maxwin). If a connection uses Fast Open to send a SYN-data packet which is dropped afterward in the network, the subsequent SYN retransmits will all fail this check and be discarded, leading to a connection timeout. This is because the SYN retransmit does not contain data payload, so: end == initial sequence number (isn) + 1; sender->td_end == isn + syn_data_len; receiver->td_maxwin == 0. The fix is to only apply this check after td_maxwin is initialized. Reported-by: Michael Chan <mcfchan@stanford.edu> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-10 18:36:22 +02:00
Florian Westphal	c655bc6896	netfilter: nf_conntrack: don't send destroy events from iterator Let nf_ct_delete handle delivery of the DESTROY event. Based on earlier patch from Pablo Neira. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-09 12:03:33 +02:00
Daniel Borkmann	54e35cc523	ipvs: ip_vs_sh: ip_vs_sh_get_port: check skb_header_pointer for NULL skb_header_pointer could return NULL, so check for it as we do it everywhere else in ipvs code. This fixes a coverity warning. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-08-07 08:57:57 +09:00
Dragos Foianu	70e3ca79cd	ipvs: fixed spacing at for statements found using checkpatch.pl Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-08-06 15:09:58 +09:00
Dan Carpenter	e4d091d7bf	netfilter: nfnetlink_{log,queue}: fix information leaks in netlink message These structs have a "_pad" member. Also the "phw" structs have an 8 byte "hw_addr[]" array but sometimes only the first 6 bytes are initialized. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-05 17:36:04 +02:00
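The class of leak described in the entry above (an explicit "_pad" member and a partly filled 8-byte "hw_addr[]" array copied out to userspace) can be sketched as follows. The struct layout and function name are assumptions made for illustration, not the kernel's definitions, and byte-order handling is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout modelled on the description above. */
struct hw_addr_msg {
    uint16_t hw_addrlen;
    uint16_t _pad;
    uint8_t  hw_addr[8];
};

/*
 * Zero the whole struct before filling it, so the pad member and the
 * unused tail of hw_addr[] never carry stale memory to the receiver.
 */
static void fill_hw_addr_msg(struct hw_addr_msg *msg,
                             const uint8_t *addr, size_t len)
{
    memset(msg, 0, sizeof(*msg));
    if (len > sizeof(msg->hw_addr))
        len = sizeof(msg->hw_addr);
    msg->hw_addrlen = (uint16_t)len;
    memcpy(msg->hw_addr, addr, len);
}
```

With a 6-byte Ethernet address, the last two bytes of hw_addr[] and the _pad member are guaranteed to be zero regardless of what the buffer held before.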
Florian Westphal	d8b3bfc253	netfilter: tproxy: fix build with IP6_NF_IPTABLES=n after commit `93742cf` (netfilter: tproxy: remove nf_tproxy_core.h) CONFIG_IPV6=y CONFIG_IP6_NF_IPTABLES=n gives us: net/netfilter/xt_TPROXY.c: In function 'nf_tproxy_get_sock_v6': net/netfilter/xt_TPROXY.c:178:4: error: implicit declaration of function 'inet6_lookup_listener' Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-05 12:57:38 +02:00
David S. Miller	0e76a3a587	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-03 21:36:46 -07:00
Pablo Neira Ayuso	a206bcb3b0	netfilter: xt_TCPOPTSTRIP: fix possible off by one access Fix a possible off by one access since optlen() touches opt[offset+1] unsafely when i == tcp_hdrlen(skb) - 1. This patch replaces tcp_hdrlen() by the local variable tcp_hdrlen that stores the TCP header length, to save some cycles. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-01 11:45:15 +02:00
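The off-by-one described in the entry above comes from reading the length byte at opt[offset+1] when the current offset is already the last byte of the header. A minimal sketch of a bounded option walk follows; opt_len() mirrors the behaviour the message attributes to optlen(), while find_option() and its loop bound are assumptions for illustration rather than the patched kernel loop.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_NOP 1 /* TCP option kinds 0 (EOL) and 1 (NOP) are single bytes */

/* Length of the option at offset; may read opt[offset + 1]. */
static unsigned int opt_len(const uint8_t *opt, size_t offset)
{
    /* Treat zero-length options as one byte so the walk makes progress. */
    if (opt[offset] <= OPT_NOP || opt[offset + 1] == 0)
        return 1;
    return opt[offset + 1];
}

/*
 * Find a multi-byte option of the given kind. The bound "i + 1 < hdr_len"
 * guarantees that the length byte read by opt_len() is inside the header.
 * Returns the offset of the option, or -1 if it is not present.
 */
static int find_option(const uint8_t *hdr, size_t start, size_t hdr_len,
                       uint8_t kind)
{
    for (size_t i = start; i + 1 < hdr_len; i += opt_len(hdr, i)) {
        if (hdr[i] == kind)
            return (int)i;
    }
    return -1;
}
```

A kind byte sitting in the very last position of the header is never treated as the start of a multi-byte option, because its length byte would lie outside the header.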
Pablo Neira Ayuso	71ffe9c77d	netfilter: xt_TCPMSS: fix handling of malformed TCP header and options Make sure the packet has enough room for the TCP header and that it is not malformed. While at it, store tcph->doff*4 in a variable, as it is used several times. This patch also fixes a possible off by one in case of malformed TCP options. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-01 11:42:53 +02:00
Patrick McHardy	12e7ada385	netfilter: nf_nat: use per-conntrack locking for sequence number adjustments Get rid of the global lock and use per-conntrack locks for protecting the sequence number adjustment data. Additionally saves one lock/unlock operation for every TCP packet. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 19:54:59 +02:00
Patrick McHardy	2d89c68ac7	netfilter: nf_nat: change sequence number adjustments to 32 bits Using 16 bits is too small, when many adjustments happen the offsets might overflow and break the connection. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 19:54:51 +02:00
Patrick McHardy	0658cdc8f3	netfilter: nf_nat: fix locking in nf_nat_seq_adjust() nf_nat_seq_adjust() needs to grab nf_nat_seqofs_lock to protect against concurrent changes to the sequence adjustment data. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 19:54:24 +02:00
Florian Westphal	02982c27ba	netfilter: nf_conntrack: remove duplicate code in ctnetlink ctnetlink contains copy-paste code from death_by_timeout. In order to avoid changing both places in upcoming event delivery patch, export death_by_timeout functionality and use it in the ctnetlink code. Based on earlier patch from Pablo Neira. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 18:51:23 +02:00
Florian Westphal	93742cf8af	netfilter: tproxy: remove nf_tproxy_core.h We've removed nf_tproxy_core.ko, so also remove its header. The lookup helpers are split and then moved to tproxy target/socket match. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 18:43:45 +02:00
Florian Westphal	fd158d79d3	netfilter: tproxy: remove nf_tproxy_core, keep tw sk assigned to skb The module was "permanent", due to the special tproxy skb->destructor. Nowadays we have tcp early demux and its sock_edemux destructor in networking core which can be used instead. Thanks to early demux changes the input path now also handles "skb->sk is tw socket" correctly, so this no longer needs the special handling introduced with commit `d503b30bd6` (netfilter: tproxy: do not assign timewait sockets to skb->sk). Thus: - move assign_sock function to where its needed - don't prevent timewait sockets from being assigned to the skb - remove nf_tproxy_core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 16:39:40 +02:00
Florian Westphal	957bec3685	netfilter: nf_queue: relax NFQA_CT attribute check Allow modifying attributes of the conntrack associated with a packet without first requesting ct data via CFG_F_CONNTRACK or extra nfnetlink_conntrack socket. Also remove unneeded rcu_read_lock; the entire function is already protected by rcu. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 16:39:29 +02:00
Florian Westphal	5813a8eb47	netfilter: connlabels: remove unneeded includes leftovers from the (never merged) v1 patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 16:39:18 +02:00
Patrick McHardy	312a0c16c1	netfilter: nf_conntrack: constify sk_buff argument to nf_ct_attach() Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 16:37:38 +02:00
Phil Oester	5774c94ace	netfilter: xt_addrtype: fix trivial typo Fix typo in error message. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 16:36:25 +02:00
Joe Stringer	024ec3deac	net/sctp: Refactor SCTP skb checksum computation This patch consolidates the SCTP checksum calculation code from various places to a single new function, sctp_compute_cksum(skb, offset). Signed-off-by: Joe Stringer <joe@wand.net.nz> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-27 20:07:15 -07:00
Eric Dumazet	baf60efa58	netfilter: xt_socket: fix broken v0 support commit `681f130f39` ("netfilter: xt_socket: add XT_SOCKET_NOWILDCARD flag") added a potential NULL dereference if an old iptables package uses v0 of the match. Fix this by removing the test on @info in fast path. IPv6 can remove the test as well, as it uses v1 or v2. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-15 11:15:21 +02:00
Pablo Neira Ayuso	f09eca8db0	netfilter: ctnetlink: fix incorrect NAT expectation dumping nf_ct_expect_alloc leaves unset the expectation NAT fields. However, ctnetlink_exp_dump_expect expects them to be zeroed in case they are not used, which may not be the case. This results in dumping the NAT tuple of the expectation when it should not. Fix it by zeroing the NAT fields of the expectation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-15 11:14:51 +02:00
David S. Miller	0c1072ae02	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/freescale/fec_main.c drivers/net/ethernet/renesas/sh_eth.c net/ipv4/gre.c The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into separate files. The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block. Finally the sh_eth.c conflict was between one commit adding bits set in the .eesr_err_check mask whilst another commit removed the .tx_error_check member and assignments. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-03 14:55:13 -07:00
Florian Westphal	496e4ae7dc	netfilter: nf_queue: add NFQA_SKB_CSUM_NOTVERIFIED info flag The common case is that TCP/IP checksums have already been verified, e.g. by hardware (rx checksum offload), or conntrack. Userspace can use this flag to determine when the checksum has not been validated yet. If the flag is set, this doesn't necessarily mean that the packet has an invalid checksum, e.g. if NIC doesn't support rx checksum. Userspace that successfully enabled NFQA_CFG_F_GSO queue feature flag can infer that IP/TCP checksum has already been validated if either the SKB_INFO attribute is not present or the NFQA_SKB_CSUM_NOTVERIFIED flag is unset. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-30 18:15:48 +02:00
Julian Anastasov	4d0c875dcc	ipvs: add sync_persist_mode flag Add sync_persist_mode flag to reduce sync traffic by syncing only persistent templates. Signed-off-by: Julian Anastasov <ja@ssi.bg> Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-06-26 18:01:46 +09:00
Alexander Frolkin	eba3b5a787	ipvs: SH fallback and L4 hashing By default the SH scheduler rejects connections that are hashed onto a realserver of weight 0. This patch adds a flag to make SH choose a different realserver in this case, instead of rejecting the connection. The patch also adds a flag to make SH include the source port (TCP, UDP, SCTP) in the hash as well as the source address. This basically allows for deterministic round-robin load balancing (i.e., where any director in a cluster of directors with identical config will send the same packet the same way). The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options can be set per service. They are set using a new option to ipvsadm. Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-06-26 18:01:46 +09:00
Julian Anastasov	acaac5d8bb	ipvs: drop SCTP connections depending on state Drop SCTP connections under load (dropentry context) depending on the protocol state, just like for TCP: INIT conns are dropped immediately, established are dropped randomly while connections in progress or shutdown are skipped. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-06-26 18:01:46 +09:00
Julian Anastasov	61e7c420b4	ipvs: replace the SCTP state machine Convert the SCTP state table, so that it is more readable. Change the states to be according to the diagram in RFC 2960 and add more states suitable for middle box. Still, such change in states adds incompatibility if systems in sync setup include this change and others do not include it. With this change we also have proper transitions in INPUT-ONLY mode (DR/TUN) where we see packets only from client. Now we should not switch to 10-second CLOSED state at a time when we should stay in ESTABLISHED state. The short names for states are because we have 16-char space in ipvsadm and 11-char limit for the connection list format. It is a sequence of the TCP implementation where the longest state name is ESTABLISHED. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-06-26 18:01:46 +09:00
Alexander Frolkin	c6c96c1883	ipvs: sloppy TCP and SCTP This adds support for sloppy TCP and SCTP modes to IPVS. When enabled (sysctls net.ipv4.vs.sloppy_tcp and net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any packet, not just a TCP SYN (or SCTP INIT). This allows connections to fail over from one IPVS director to another mid-flight. Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-06-26 18:01:46 +09:00
Julian Anastasov	bba54de5bd	ipvs: provide iph to schedulers Before now the schedulers needed access only to IP addresses and it was easy to get them from skb by using ip_vs_fill_iph_addr_only. New changes for the SH scheduler will need the protocol and ports which is difficult to get from skb for the IPv6 case. As we have all the data in the iph structure, to avoid the same slow lookups provide the iph to schedulers. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-06-26 18:01:45 +09:00
Florian Westphal	797a7d66d2	netfilter: ctnetlink: send event when conntrack label was modified commit `0ceabd8387` (netfilter: ctnetlink: deliver labels to userspace) sets the event bit when we raced with another packet, instead of raising the event bit when the label bit is set for the first time. commit `9b21f6a909` (netfilter: ctnetlink: allow userspace to modify labels) forgot to update the event mask in the "conntrack already exists" case. Both issues result in CTA_LABELS attribute not getting included in the conntrack event. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-24 11:32:56 +02:00
Balazs Peter Odor	5aed93875c	netfilter: nf_nat_sip: fix mangling In (b20ab9c netfilter: nf_ct_helper: better logging for dropped packets) there were some missing brackets around the logging information, thus always returning drop. Closes https://bugzilla.kernel.org/show_bug.cgi?id=60061 Signed-off-by: Balazs Peter Odor <balazs@obiserver.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-24 11:32:40 +02:00
Eric Dumazet	681f130f39	netfilter: xt_socket: add XT_SOCKET_NOWILDCARD flag xt_socket module can be a nice replacement to conntrack module in some cases (SYN filtering for example) But it lacks the ability to match the 3rd packet of TCP handshake (ACK coming from the client). Add a XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism. The wildcard is the legacy socket match behavior, that ignores LISTEN sockets bound to INADDR_ANY (or ipv6 equivalent) iptables -I INPUT -p tcp --syn -j SYN_CHAIN iptables -I INPUT -m socket --nowildcard -j ACCEPT Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-20 20:28:49 +02:00
Florian Westphal	6547a22187	netfilter: nf_conntrack: avoid large timeout for mid-stream pickup When loose tracking is enabled (default), non-syn packets cause creation of new conntracks in established state with default timeout for established state (5 days). This causes the table to fill up with UNREPLIED when the 'new ack' packet happened to be the last-ack of a previous, already timed-out connection. Consider: A 192.168.x.52792 > 10.184.y.80: F, 426:426(0) ack 9237 win 255 B 10.184.y.80 > 192.168.x.52792: ., ack 427 win 123 <61 second pause> C 10.184.y.80 > 192.168.x.52792: F, 9237:9237(0) ack 427 win 123 D 192.168.x.52792 > 10.184.y.80: ., ack 9238 win 255 B moves conntrack to CLOSE_WAIT and will kill it after 60 second timeout, C is ignored (FIN set), but last packet (D) causes new ct with 5-days timeout. Use UNACK timeout (5 minutes) instead to get rid of these entries sooner when in ESTABLISHED state without having seen traffic in both directions. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-20 11:20:13 +02:00
Daniel Borkmann	130ffbc263	netfilter: check return code from nla_parse_nested These are the only calls under net/ that do not check nla_parse_nested() for its error code, but simply continue execution. If parsing of netlink attributes fails, we should return with an error instead of continuing. In nearly all of these calls we have a policy attached, that is being type verified during nla_parse_nested(), which we would miss checking for otherwise. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-20 11:20:13 +02:00
David S. Miller	d98cae64e4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/ath/ath9k/Kconfig drivers/net/xen-netback/netback.c net/batman-adv/bat_iv_ogm.c net/wireless/nl80211.c The ath9k Kconfig conflict was a change of a Kconfig option name right next to the deletion of another option. The xen-netback conflict was overlapping changes involving the handling of the notify list in xen_netbk_rx_action(). Batman conflict resolution provided by Antonio Quartulli, basically keep everything in both conflict hunks. The nl80211 conflict is a little more involved. In 'net' we added a dynamic memory allocation to nl80211_dump_wiphy() to fix a race that Linus reported. Meanwhile in 'net-next' the handlers were converted to use pre and post doit handlers which use a flag to determine whether to hold the RTNL mutex around the operation. However, the dump handlers do not use this logic. Instead they have to explicitly do the locking. There were apparent bugs in the conversion of nl80211_dump_wiphy() in that we were not dropping the RTNL mutex in all the return paths, and it seems we very much should be doing so. So I fixed that whilst handling the overlapping changes. To simplify the initial returns, I take the RTNL mutex after we try to allocate 'tb'. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-19 16:49:39 -07:00
Julian Anastasov	06f3d7f973	ipvs: SCTP ports should be writable in ICMP packets Make sure that SCTP ports are writable when embedded in ICMP from client, so that ip_vs_nat_icmp can translate them safely. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-06-19 09:53:52 +09:00
Joe Perches	fe2c6338fd	net: Convert uses of typedef ctl_table to struct ctl_table Reduce the uses of this unnecessary typedef. Done via perl script: $ git grep --name-only -w ctl_table net | \ xargs perl -p -i -e '\ sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \ s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge' Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-13 02:36:09 -07:00
Phil Oester	b396966c46	netfilter: xt_TCPMSS: Fix missing fragmentation handling Similar to commit `bc6bcb59` ("netfilter: xt_TCPOPTSTRIP: fix possible mangling beyond packet boundary"), add safe fragment handling to xt_TCPMSS. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-12 11:06:19 +02:00
Phil Oester	70d19f805f	netfilter: xt_TCPMSS: Fix IPv6 default MSS too As a followup to commit `409b545a` ("netfilter: xt_TCPMSS: Fix violation of RFC879 in absence of MSS option"), John Heffner points out that IPv6 has a higher MTU than IPv4, and thus a higher minimum MSS. Update TCPMSS target to account for this, and update RFC comment. While at it, point to more recent reference RFC1122 instead of RFC879. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-12 11:04:41 +02:00
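The default-MSS rule from the two xt_TCPMSS fixes can be sketched as below. This is an illustration under stated assumptions, not the target's code: the names `demo_family`, `default_mss` and `clamp_missing_mss` are hypothetical, the 536 value is the one quoted in the earlier fix, and the IPv6 value is derived here from the 1280-byte IPv6 minimum MTU rather than quoted from the log.

```c
#include <stdint.h>

/* Hypothetical address-family tag for this sketch. */
enum demo_family { DEMO_IPV4, DEMO_IPV6 };

/*
 * Default MSS when the SYN carried no MSS option: minimum datagram size
 * minus the IP header minus a 20-byte TCP header.
 *   IPv4:  576 - 20 - 20 = 536
 *   IPv6: 1280 - 40 - 20 = 1220
 */
static uint16_t default_mss(enum demo_family family)
{
    return family == DEMO_IPV6 ? 1280 - 40 - 20 : 576 - 20 - 20;
}

/* Never add an MSS larger than the per-family default. */
static uint16_t clamp_missing_mss(uint16_t pmtu_mss, enum demo_family family)
{
    uint16_t cap = default_mss(family);

    return pmtu_mss < cap ? pmtu_mss : cap;
}
```

A path-MTU-derived value of 1460 would therefore be reduced to 536 for IPv4 and to 1220 for IPv6, while smaller values pass through unchanged.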
Eric Dumazet	45203a3b38	net_sched: add 64bit rate estimators struct gnet_stats_rate_est contains u32 fields, so the bytes per second field can wrap at 34360Mbit. Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields, and switch the kernel to use this structure natively. This structure is dumped to user space as a new attribute : TCA_STATS_RATE_EST64 Old tc command will now display the capped bps (to 34360Mbit), instead of wrapped values, and updated tc command will display correct information. Old tc command output, after patch : eric:~# tc -s -d qd sh dev lo qdisc pfifo 8001: root refcnt 2 limit 1000p Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0) rate 34360Mbit 189696pps backlog 0b 0p requeues 0 This patch carefully reorganizes "struct Qdisc" layout to get optimal performance on SMP. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-11 02:51:03 -07:00
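The 34360Mbit figure in the entry above follows from the width of a u32 bytes-per-second field. The sketch below works through that arithmetic and shows a saturating conversion; the helper names are hypothetical and the code is not taken from the patch.

```c
#include <stdint.h>

/*
 * Largest rate a 32-bit bytes-per-second field can hold, in Mbit/s,
 * rounded to the nearest integer:
 *   4294967295 bytes/s * 8 = 34359738360 bit/s, about 34360 Mbit/s
 */
static uint64_t legacy_max_mbit(void)
{
    uint64_t bits_per_sec = (uint64_t)UINT32_MAX * 8;

    return (bits_per_sec + 500000) / 1000000;
}

/*
 * Hypothetical helper: report a 64-bit rate through the legacy 32-bit
 * field, capping at the maximum instead of letting the value wrap.
 */
static uint32_t bps_to_legacy_u32(uint64_t bps)
{
    return bps > UINT32_MAX ? UINT32_MAX : (uint32_t)bps;
}
```

Capping rather than truncating is what lets an old tc binary show 34360Mbit instead of a wrapped, misleadingly small rate.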
Pablo Neira Ayuso	ed82c43732	netfilter: xt_TCPOPTSTRIP: don't use tcp_hdr() In (`bc6bcb5` netfilter: xt_TCPOPTSTRIP: fix possible mangling beyond packet boundary), the use of tcp_hdr was introduced. However, we cannot assume that skb->transport_header is set for non-local packets. Cc: Florian Westphal <fw@strlen.de> Reported-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-11 01:55:07 +02:00
Dan Carpenter	a8241c6351	ipvs: info leak in __ip_vs_get_dest_entries() The entry struct has a 2 byte hole after ->port and another 4 byte hole after ->stats.outpkts. You must have CAP_NET_ADMIN in your namespace to hit this information leak. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-10 14:53:00 +02:00
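The class of leak described above comes from padding holes in a struct that is copied out without being cleared first. A minimal sketch of the usual remedy follows; `struct demo_entry` and `fill_entry` are hypothetical stand-ins, and the exact position and size of the padding depend on the ABI.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: on common ABIs a 2-byte hole follows 'port'. */
struct demo_entry {
    uint32_t addr;
    uint16_t port;
    uint32_t weight;
};

/*
 * Clear the whole struct, padding included, before filling the fields,
 * so that no stale memory is exposed when the struct is copied out.
 */
static void fill_entry(struct demo_entry *e, uint32_t addr,
                       uint16_t port, uint32_t weight)
{
    memset(e, 0, sizeof(*e));
    e->addr = addr;
    e->port = port;
    e->weight = weight;
}
```

Assigning the members one by one without the `memset()` would leave whatever bytes were previously in the padding untouched.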
Pablo Neira Ayuso	7b8dfe289f	netfilter: nfnetlink_queue: fix missing HW protocol Locally generated IPv4 and IPv6 traffic gets skb->protocol unset, thus passing zero. ip6tables -I OUTPUT -j NFQUEUE libmnl/examples/netfilter# ./nf-queue 0 & ping6 ::1 packet received (id=1 hw=0x0000 hook=3) ^^^^^^ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-07 18:55:20 +02:00
David S. Miller	143554ace8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_log.c The conflict in nf_log.c is that in 'net' we added CONFIG_PROC_FS protection around foo_proc_entry() calls to fix a build failure, whereas in Pablo's tree a guard if() test around a call to remove_proc_entry() was removed. Trivially resolved. Pablo Neira Ayuso says: ==================== The following patchset contains the first batch of Netfilter/IPVS updates for your net-next tree, they are: * Three patches with improvements and code refactorization for nfnetlink_queue, from Florian Westphal. * FTP helper now parses replies without brackets, as RFC1123 recommends, from Jeff Mahoney. * Raise a warning to tell everyone about ULOG deprecation, NFLOG has been already in the kernel tree for long time and supersedes the old logging over netlink stub, from myself. * Don't panic if we fail to load netfilter core framework, just bail out instead, from myself. * Add cond_resched_rcu, used by IPVS to allow rescheduling while walking over big hashtables, from Simon Horman. * Change type of IPVS sysctl_sync_qlen_max sysctl to avoid possible overflow, from Zhang Yanfei. * Use strlcpy instead of strncpy to skip zeroing of already initialized area to write the extension names in ebtables, from Chen Gang. * Use already existing per-cpu notrack object from xt_CT, from Eric Dumazet. * Save explicit socket lookup in xt_socket now that we have early demux, also from Eric Dumazet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-06 01:03:06 -07:00
David S. Miller	6bc19fb82d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge 'net' bug fixes into 'net-next' as we have patches that will build on top of them. This merge commit includes a change from Emil Goode (emilgoode@gmail.com) that fixes a warning that would have been introduced by this merge. Specifically it fixes the pingv6_ops method ipv6_chk_addr() to add a "const" to the "struct net_device *dev" argument and likewise update the dummy_ipv6_chk_addr() declaration. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-05 16:37:30 -07:00
Phil Oester	409b545ac1	netfilter: xt_TCPMSS: Fix violation of RFC879 in absence of MSS option The clamp-mss-to-pmtu option of the xt_TCPMSS target can cause issues connecting to websites if there was no MSS option present in the original SYN packet from the client. In these cases, it may add a MSS higher than the default specified in RFC879. Fix this by never setting a value > 536 if no MSS option was specified by the client. This closes netfilter's bugzilla #662. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-05 13:59:22 +02:00
Florian Westphal	7f87712c01	netfilter: nfnetlink_queue: only add CAP_LEN attr when needed CAP_LEN contains the size of the network packet we're queueing to userspace, i.e. normally it is the same as the NFQA_PAYLOAD attribute len. Include it only in the unlikely case when NFQA_PAYLOAD is truncated due to copy_range limitations. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-05 12:40:54 +02:00
Florian Westphal	9cefbbc9c8	netfilter: nfnetlink_queue: cleanup copy_range usage For every packet queued, we check if configured copy_range is 0, and treat that as 'copy entire packet'. We can move this check to the queue configuration, and can set copy_range appropriately. Also, convert repetitive '0xffff - NLA_HDRLEN' to a macro. [ queue initialization still used 0xffff, although its harmless since the initial setting is overwritten on queue config ] Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-05 12:40:28 +02:00
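Moving the "0 means copy everything" check from the per-packet path to queue configuration can be sketched as below. The macro and function names are hypothetical; the only values assumed are the `0xffff - NLA_HDRLEN` expression quoted in the commit message and a 4-byte netlink attribute header.

```c
/* Assumption for this sketch: aligned netlink attribute header size. */
#define DEMO_NLA_HDRLEN 4

/* The repeated '0xffff - NLA_HDRLEN' expression, named once. */
#define DEMO_MAX_COPY_RANGE (0xffff - DEMO_NLA_HDRLEN)

/*
 * Hypothetical configuration-time helper: normalise the requested
 * copy_range so the per-packet path never has to special-case 0.
 */
static unsigned int configure_copy_range(unsigned int requested)
{
    if (requested == 0 || requested > DEMO_MAX_COPY_RANGE)
        return DEMO_MAX_COPY_RANGE;
    return requested;
}
```

Once the stored value is always a usable limit, the queueing path can take the minimum of the packet length and `copy_range` without further branching.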
Pablo Neira Ayuso	37bc4f8dfa	netfilter: nfnetlink_cttimeout: fix incomplete dumping of objects Fix broken incomplete object dumping if the list of objects does not fit into one single netlink message. Reported-by: Gabriel Lazar <Gabriel.Lazar@com.utcluj.ro> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-05 12:36:37 +02:00
Pablo Neira Ayuso	991a6b735f	netfilter: nfnetlink_acct: fix incomplete dumping of objects Fix broken incomplete object dumping if the list of objects does not fit into one single netlink message. Reported-by: Gabriel Lazar <Gabriel.Lazar@com.utcluj.ro> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-05 12:36:36 +02:00
Simon Horman	938177e9f3	netfilter: Correct calculation using skb->tail and skb->network_header This corrects a regression introduced by "net: Use 16bits for *_headers fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In that case skb->tail will be a pointer whereas skb->network_header will be an offset from head. This is corrected by using wrappers that ensure that calculations are always made using pointers. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-31 16:38:25 -07:00
Jan Beulich	a70b9641e6	ipvs: ip_vs_sh: fix build kfree_rcu() requires offsetof(..., rcu_head) < 4096, which can get violated with a sufficiently high CONFIG_IP_VS_SH_TAB_BITS. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-29 17:50:39 +02:00
Michal Kubeček	d660164d79	netfilter: xt_LOG: fix mark logging for IPv6 packets In dump_ipv6_packet(), the "recurse" parameter is zero only if dumping contents of a packet embedded into an ICMPv6 error message. Therefore we want to log packet mark if recurse is non-zero, not when it is zero. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-29 12:29:18 +02:00
Jiri Pirko	351638e7de	net: pass info struct via netdevice notifier So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-28 13:11:01 -07:00
Jeff Mahoney	4e7dba99c9	netfilter: Implement RFC 1123 for FTP conntrack The FTP conntrack code currently only accepts the following format for the 227 response for PASV: 227 Entering Passive Mode (148,100,81,40,31,161). It doesn't accept the following format from an obscure server: 227 Data transfer will passively listen to 67,218,99,134,50,144 From RFC 1123: The format of the 227 reply to a PASV command is not well standardized. In particular, an FTP client cannot assume that the parentheses shown on page 40 of RFC-959 will be present (and in fact, Figure 3 on page 43 omits them). Therefore, a User-FTP program that interprets the PASV reply must scan the reply for the first digit of the host and port numbers. This patch adds support for the RFC 1123 clarification by: - Allowing a search filter to specify NUL as the terminator so that try_number will return successfully if the array of numbers has been filled when an unexpected character is encountered. - Using space as the separator for the 227 reply and then scanning for the first digit of the number sequence. The number sequence is parsed out using the existing try_rfc959 but with a NUL terminator. References: https://bugzilla.novell.com/show_bug.cgi?id=466279 References: http://bugzilla.netfilter.org/show_bug.cgi?id=574 Reported-by: Mark Post <mpost@novell.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: netfilter-devel@vger.kernel.org Cc: netfilter@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-27 13:32:43 +02:00
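The RFC 1123 scanning rule quoted above can be sketched in plain C as follows. This is a standalone illustration, not the conntrack helper: `parse_pasv_reply` is a hypothetical function that skips past the first space, scans for the first digit, and then reads six comma-separated octets, accepting any terminator after the sixth.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical parser for a 227 reply. Fills out[0..3] with the host
 * octets and out[4..5] with the port octets. Returns 0 on success and
 * -1 if six comma-separated values in the range 0..255 are not found.
 */
static int parse_pasv_reply(const char *reply, unsigned int out[6])
{
    const char *p = strchr(reply, ' ');   /* skip the "227" reply code */
    int n = 0;

    if (p == NULL)
        return -1;

    /* Scan for the first digit, with or without parentheses. */
    while (*p != '\0' && !isdigit((unsigned char)*p))
        p++;
    if (*p == '\0')
        return -1;

    while (n < 6) {
        unsigned int v = 0;

        if (!isdigit((unsigned char)*p))
            return -1;
        while (isdigit((unsigned char)*p)) {
            v = v * 10 + (unsigned int)(*p - '0');
            if (v > 255)
                return -1;
            p++;
        }
        out[n++] = v;

        /* A comma must separate values; after the sixth, anything goes. */
        if (n < 6) {
            if (*p != ',')
                return -1;
            p++;
        }
    }
    return 0;
}
```

Both reply formats shown in the commit message parse with this approach, since neither the opening parenthesis nor the trailing `).` is required.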
Grzegorz Lyczba	dc7b3eb900	ipvs: Fix reuse connection if real server is dead Expire cached connection for new TCP/SCTP connection if real server is down. Otherwise, IPVS uses the dead server for the reused connection, instead of a new working one. Signed-off-by: Grzegorz Lyczba <grzegorz.lyczba@gmail.com> Acked-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-27 13:00:45 +02:00
Florian Westphal	9d5242b192	netfilter: nfnetlink_queue: avoid peer_portid test The portid is set to NETLINK_CB(skb).portid at create time. The run-time check will always be false. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-26 22:05:11 +02:00
Zhang Yanfei	0799567424	ipvs: change type of netns_ipvs->sysctl_sync_qlen_max This member of struct netns_ipvs is calculated from nr_free_buffer_pages so change its type to unsigned long in case of overflow. Also, type of its related proc var sync_qlen_max and the return type of function sysctl_sync_qlen_max() should be changed to unsigned long, too. Besides, the type of ipvs_master_sync_state->sync_queue_len should be changed to unsigned long accordingly. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-05-26 08:17:33 +09:00
Simon Horman	a38e5e230e	ipvs: use cond_resched_rcu() helper when walking connections This avoids the situation where walking of a large number of connections may prevent scheduling for a long time while also avoiding excessive calls to rcu_read_unlock() and rcu_read_lock(). Note that in the case of !CONFIG_PREEMPT_RCU this will add a call to cond_resched(). Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-23 14:23:18 +02:00
Pablo Neira Ayuso	6d11cfdba5	netfilter: don't panic on error while walking through the init path Don't panic if we hit an error while adding the nf_log or pernet netfilter support, just bail out. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Gao feng <gaofeng@cn.fujitsu.com>	2013-05-23 14:22:30 +02:00
Florian Westphal	2a7851bffb	netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6 Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812: [ ip6tables -m addrtype ] When I tried to use in the nat/PREROUTING it messes up the routing cache even if the rule didn't matched at all. [..] If I remove the --limit-iface-in from the non-working scenario, so just use the -m addrtype --dst-type LOCAL it works! This happens when LOCAL type matching is requested with --limit-iface-in, and the default ipv6 route is via the interface the packet we test arrived on. Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation creates an unwanted cached entry, and the packet won't make it to the real/expected destination. Silently ignoring --limit-iface-in makes the routing work but it breaks rule matching (--dst-type LOCAL with limit-iface-in is supposed to only match if the dst address is configured on the incoming interface; without --limit-iface-in it will match if the address is reachable via lo). The test should call ipv6_chk_addr() instead. However, this would add a link-time dependency on ipv6. There are two possible solutions: 1) Revert the commit that moved ipt_addrtype to xt_addrtype, and put ipv6 specific code into ip6t_addrtype. 2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions. While the former might seem preferable, Pablo pointed out that there are more xt modules with link-time dependency issues regarding ipv6, so let's go for 2). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-23 11:58:55 +02:00
Eric Dumazet	00028aa370	netfilter: xt_socket: use IP early demux With IP early demux added in linux-3.6, we perform TCP lookup in IP layer before iptables hooks. We can avoid doing a second lookup in xt_socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-23 11:09:53 +02:00
Eric Dumazet	27e7190efd	netfilter: xt_CT: optimize XT_CT_NOTRACK The percpu untracked ct are not currently used for XT_CT_NOTRACK. xt_ct_tg_check()/xt_ct_target() provides a single ct. That's not optimal as the ct->ct_general.use cache line will bounce among cpus. Use the intended [1] thing : xt_ct_target() should select the percpu object. [1] Refs : commit `5bfddbd46a` ("netfilter: nf_conntrack: IPS_UNTRACKED bit") commit `b3c5163fe0` ("netfilter: nf_conntrack: per_cpu untracking") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-23 11:09:29 +02:00
Pablo Neira Ayuso	bc6bcb59dd	netfilter: xt_TCPOPTSTRIP: fix possible mangling beyond packet boundary This target assumes that tcph->doff is well-formed, which may well not be the case. Add extra sanity checks to avoid a possible crash due to read/write out of the real packet boundary. After this patch, the default action on malformed TCP packets is to drop them. Moreover, fragments are skipped. Reported-by: Rafal Kupka <rkupka@telemetry.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-16 17:35:53 +02:00
Hans Schillstrom	8cdb46da06	netfilter: log: netns NULL ptr bug when calling from conntrack Since (`69b34fb` netfilter: xt_LOG: add net namespace support for xt_LOG), we hit this: [ 4224.708977] BUG: unable to handle kernel NULL pointer dereference at 0000000000000388 [ 4224.709074] IP: [<ffffffff8147f699>] ipt_log_packet+0x29/0x270 when calling log functions from conntrack both in and out are NULL i.e. the net pointer is invalid. Adding struct net *net in call to nf_logfn() will ensure that there is always a valid net ptr. Reported as netfilter's bugzilla bug 818: https://bugzilla.netfilter.org/show_bug.cgi?id=818 Reported-by: Ronald <ronald645@gmail.com> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-15 14:11:07 +02:00
Pablo Neira Ayuso	e778f56e2f	netfilter: nf_{log,queue}: fix compilation without CONFIG_PROC_FS This patch fixes the following compilation error: net/netfilter/nf_log.c:373:38: error: 'struct netns_nf' has no member named 'proc_netfilter' if procfs is not set. The netns support for nf_log, nfnetlink_log and nfnetlink_queue_core requires CONFIG_PROC_FS in the removal path of their respective /proc interface since net->nf.proc_netfilter is undefined in that case. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Gao feng <gaofeng@cn.fujitsu.com>	2013-05-06 12:28:01 +02:00
Linus Torvalds	20b4fb4852	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...	2013-05-01 17:51:54 -07:00
David Howells	271a15eabe	proc: Supply PDE attribute setting accessor functions Supply accessor functions to set attributes in proc_dir_entry structs. The following are supplied: proc_set_size() and proc_set_user(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> cc: linuxppc-dev@lists.ozlabs.org cc: linux-media@vger.kernel.org cc: netdev@vger.kernel.org cc: linux-wireless@vger.kernel.org cc: linux-pci@vger.kernel.org cc: netfilter-devel@vger.kernel.org cc: alsa-devel@alsa-project.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:18 -04:00
Linus Torvalds	73287a43cc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Highlights (1721 non-merge commits, this has to be a record of some sort): 1) Add 'random' mode to team driver, from Jiri Pirko and Eric Dumazet. 2) Make it so that any driver that supports configuration of multiple MAC addresses can provide the forwarding database add and del calls by providing a default implementation and hooking that up if the driver doesn't have an explicit set of handlers. From Vlad Yasevich. 3) Support GSO segmentation over tunnels and other encapsulating devices such as VXLAN, from Pravin B Shelar. 4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton. 5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita Dukkipati. 6) In the PHY layer, allow supporting wake-on-lan in situations where the PHY registers have to be written for it to be configured. Use it to support wake-on-lan in mv643xx_eth. From Michael Stapelberg. 7) Significantly improve firewire IPV6 support, from YOSHIFUJI Hideaki. 8) Allow multiple packets to be sent in a single transmission using network coding in batman-adv, from Martin Hundebøll. 9) Add support for T5 cxgb4 chips, from Santosh Rastapur. 10) Generalize the VXLAN forwarding tables so that there is more flexibility in configuring various aspects of the endpoints. From David Stevens. 11) Support RSS and TSO in hardware over GRE tunnels in bnx2x driver, from Dmitry Kravkov. 12) Zero copy support in nfnetlink_queue, from Eric Dumazet and Pablo Neira Ayuso. 13) Start adding networking selftests. 14) In situations of overload on the same AF_PACKET fanout socket, or per-cpu packet receive queue, minimize drop by distributing the load to other cpus/fanouts. From Willem de Bruijn and Eric Dumazet. 15) Add support for new payload offset BPF instruction, from Daniel Borkmann. 16) Convert several drivers over to module_platform_driver(), from Sachin Kamat.
17) Provide a minimal BPF JIT image disassembler userspace tool, from Daniel Borkmann. 18) Rewrite F-RTO implementation in TCP to match the final specification of it in RFC4138 and RFC5682. From Yuchung Cheng. 19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear you like netlink, so I implemented netlink dumping of netlink sockets.") From Andrey Vagin. 20) Remove ugly passing of rtnetlink attributes into rtnl_doit functions, from Thomas Graf. 21) Allow userspace to be able to see if a configuration change occurs in the middle of an address or device list dump, from Nicolas Dichtel. 22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes Frederic Sowa. 23) Increase accuracy of packet length used by packet scheduler, from Jason Wang. 24) Beginning set of changes to make ipv4/ipv6 fragment handling more scalable and less susceptible to overload and locking contention, from Jesper Dangaard Brouer. 25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_() instead. From Hong Zhiguo. 26) Optimize route usage in IPVS by avoiding reference counting where possible, from Julian Anastasov. 27) Convert IPVS schedulers to RCU, also from Julian Anastasov. 28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger Eitzenberger. 29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG, nfnetlink_log, and nfnetlink_queue. From Gao feng. 30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa. 31) Support several new r8169 chips, from Hayes Wang. 32) Support tokenized interface identifiers in ipv6, from Daniel Borkmann. 33) Use usbnet_link_change() helper in USB net driver, from Ming Lei. 34) Add 802.1ad vlan offload support, from Patrick McHardy. 35) Support mmap() based netlink communication, also from Patrick McHardy. 36) Support HW timestamping in mlx4 driver, from Amir Vadai. 37) Rationalize AF_PACKET packet timestamping when transmitting, from Willem de Bruijn and Daniel Borkmann. 
38) Bring parity to what's provided by /proc/net/packet socket dumping and the info provided by netlink socket dumping of AF_PACKET sockets. From Nicolas Dichtel. 39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin Poirier" git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) filter: fix va_list build error af_unix: fix a fatal race with bit fields bnx2x: Prevent memory leak when cnic is absent bnx2x: correct reading of speed capabilities net: sctp: attribute printl with __printf for gcc fmt checks netlink: kconfig: move mmap i/o into netlink kconfig netpoll: convert mutex into a semaphore netlink: Fix skb ref counting. net_sched: act_ipt forward compat with xtables mlx4_en: fix a build error on 32bit arches Revert "bnx2x: allow nvram test to run when device is down" bridge: avoid OOPS if root port not found drivers: net: cpsw: fix kernel warn on cpsw irq enable sh_eth: use random MAC address if no valid one supplied 3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA) tg3: fix to append hardware time stamping flags unix/stream: fix peeking with an offset larger than data in queue unix/dgram: fix peeking with an offset larger than data in queue unix/dgram: peek beyond 0-sized skbs openvswitch: Remove unneeded ovs_netdev_get_ifindex() ...	2013-05-01 14:08:52 -07:00
David S. Miller	58717686cf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c drivers/net/ethernet/emulex/benet/be.h include/net/tcp.h net/mac802154/mac802154.h Most conflicts were minor overlapping stuff. The be2net driver brought in some fixes that added __vlan_put_tag calls, which in net-next take an additional argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-30 03:55:20 -04:00
Akinobu Mita	ca3d41a588	net/netfilter: rename random32() to prandom_u32() Use preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:43 -07:00
Al Viro	14b872f02e	xt_hashlimit: allocate a copy of name explicitly, don't rely on procfs guts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:41:49 -04:00
Simon Horman	eee1d5a147	sctp: Correct type and usage of sctp_end_cksum() Change the type of the crc32 parameter of sctp_end_cksum() from __be32 to __u32 to reflect the fact that it is passed to cpu_to_le32(). There are five in-tree users of sctp_end_cksum(). The following four had warnings flagged by sparse which are no longer present with this change. net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_nat_csum() net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_csum_check() net/sctp/input.c:sctp_rcv_checksum() net/sctp/output.c:sctp_packet_transmit() The fifth user is net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt(). It has been updated to pass a __u32 instead of a __be32, the value in question was already calculated in cpu byte-order. net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt() has also been updated to assign the return value of sctp_end_cksum() directly to a variable of type __le32, matching the type of the return value. Previously the return value was assigned to a variable of type __be32 and then that variable was finally assigned to another variable of type __le32. Problems flagged by sparse. Compile and sparse tested only. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:08 +02:00
Florian Westphal	00bd1cc24a	netfilter: nfnetlink_queue: avoid expensive gso segmentation and checksum fixup Userspace can now indicate that it can cope with larger-than-mtu sized packets and packets that have invalid ipv4/tcp checksums. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:07 +02:00
Florian Westphal	7237190df8	netfilter: nfnetlink_queue: add skb info attribute Once we allow userspace to receive gso/gro packets, userspace needs to be able to determine when checksums appear to be broken, but are not. NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel later, pretend they are ok'. NFQA_SKB_GSO could be used for statistics, or to determine when packet size exceeds mtu. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:06 +02:00
Florian Westphal	a5fedd43d5	netfilter: move skb_gso_segment into nfnetlink_queue module skb_gso_segment is expensive, so it would be nice if we could avoid it in the future. However, userspace needs to be prepared to receive larger-than-mtu-packets (which will also have incorrect l3/l4 checksums), so we cannot simply remove it. The plan is to add a per-queue feature flag that userspace can set when binding the queue. The problem is that in nf_queue, we only have a queue number, not the queue context/configuration settings. This patch should have no impact other than the skb_gso_segment call now being in a function that has access to the queue config data. A new size attribute in nf_queue_entry is needed so nfnetlink_queue can duplicate the entry of the gso skb when segmenting the skb while also copying the route key. The follow up patch adds switch to disable skb_gso_segment when queue config says so. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:05 +02:00
Florian Westphal	4bd60443cc	netfilter: nf_queue: move device refcount bump to extra function required by future patch that will need to duplicate the nf_queue_entry, bumping refcounts of the copy. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:04 +02:00
Jozsef Kadlecsik	6e01781d1c	netfilter: ipset: set match: add support to match the counters The new revision of the set match supports matching on the counters and suppressing counter updates when matching. For the set:list types, the updating of the subcounters can be suppressed as well. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:03 +02:00
Jozsef Kadlecsik	de76303c5a	netfilter: ipset: The list:set type with counter support Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:02 +02:00
Jozsef Kadlecsik	00d71b270e	netfilter: ipset: The hash types with counter support Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:01 +02:00
Jozsef Kadlecsik	f48d19db12	netfilter: ipset: The bitmap types with counter support Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:00 +02:00
Jozsef Kadlecsik	34d666d489	netfilter: ipset: Introduce the counter extension in the core Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:08:59 +02:00
Jozsef Kadlecsik	7d47d972b5	netfilter: ipset: list:set type using the extension interface Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:08:58 +02:00
Jozsef Kadlecsik	5d50e1d883	netfilter: ipset: Hash types using the unified code base Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:08:57 +02:00
Jozsef Kadlecsik	1feab10d7e	netfilter: ipset: Unified hash type generation Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:08:56 +02:00
Jozsef Kadlecsik	b0da3905bb	netfilter: ipset: Bitmap types using the unified code base Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:08:55 +02:00
Jozsef Kadlecsik	4d73de38c2	netfilter: ipset: Unified bitmap type generation Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:08:54 +02:00
Jozsef Kadlecsik	075e64c041	netfilter: ipset: Introduce extensions to elements in the core Introduce extensions to elements in the core and prepare timeout as the first one. This patch also modifies the em_ipset classifier to use the new extension struct layout. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:08:54 +02:00
Jozsef Kadlecsik	8672d4d1a0	netfilter: ipset: Move often used IPv6 address masking function to header file Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:08:50 +02:00
Jozsef Kadlecsik	43c56e595b	netfilter: ipset: Make possible to test elements marked with nomatch Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:08:44 +02:00
Hans Schillstrom	f7a1dd6e3a	ipvs: ip_vs_sip_fill_param() BUG: bad check of return value The reason for this patch is a crash in kmemdup caused by returning from get_callid with uninitialized matchoff and matchlen. Removing Zero check of matchlen since it's done by ct_sip_get_header() BUG: unable to handle kernel paging request at ffff880457b5763f IP: [<ffffffff810df7fc>] kmemdup+0x2e/0x35 PGD 27f6067 PUD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: xt_state xt_helper nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle xt_connmark xt_conntrack ip6_tables nf_conntrack_ftp ip_vs_ftp nf_nat xt_tcpudp iptable_mangle xt_mark ip_tables x_tables ip_vs_rr ip_vs_lblcr ip_vs_pe_sip ip_vs nf_conntrack_sip nf_conntrack bonding igb i2c_algo_bit i2c_core CPU 5 Pid: 0, comm: swapper/5 Not tainted 3.9.0-rc5+ #5 /S1200KP RIP: 0010:[<ffffffff810df7fc>] [<ffffffff810df7fc>] kmemdup+0x2e/0x35 RSP: 0018:ffff8803fea03648 EFLAGS: 00010282 RAX: ffff8803d61063e0 RBX: 0000000000000003 RCX: 0000000000000003 RDX: 0000000000000003 RSI: ffff880457b5763f RDI: ffff8803d61063e0 RBP: ffff8803fea03658 R08: 0000000000000008 R09: 0000000000000011 R10: 0000000000000011 R11: 00ffffffff81a8a3 R12: ffff880457b5763f R13: ffff8803d67f786a R14: ffff8803fea03730 R15: ffffffffa0098e90 FS: 0000000000000000(0000) GS:ffff8803fea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff880457b5763f CR3: 0000000001a0c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper/5 (pid: 0, threadinfo ffff8803ee18c000, task ffff8803ee18a480) Stack: ffff8803d822a080 000000000000001c ffff8803fea036c8 ffffffffa000937a ffffffff81f0d8a0 000000038135fdd5 ffff880300000014 ffff880300110000 ffffffff150118ac ffff8803d7e8a000 ffff88031e0118ac 0000000000000000 Call Trace: <IRQ> [<ffffffffa000937a>] ip_vs_sip_fill_param+0x13a/0x187 [ip_vs_pe_sip] [<ffffffffa007b209>] 
ip_vs_sched_persist+0x2c6/0x9c3 [ip_vs] [<ffffffff8107dc53>] ? __lock_acquire+0x677/0x1697 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d [<ffffffff810649bc>] ? sched_clock_cpu+0x43/0xcf [<ffffffffa007bb1e>] ip_vs_schedule+0x181/0x4ba [ip_vs] ... Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-29 11:35:30 -04:00
David S. Miller	d3734b0496	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following patchset contains fixes for recently applied Netfilter/IPVS updates to the net-next tree, most relevantly they are: * Fix sparse warnings introduced in the RCU conversion, from Julian Anastasov. * Fix wrong endianness in the size field of IPVS sync messages, from Simon Horman. * Fix missing if checking in nf_xfrm_me_harder, from Dan Carpenter. * Fix off by one access in the IPVS SCTP tracking code, again from Dan Carpenter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-25 00:53:40 -04:00
Dan Carpenter	e7e6f6300f	netfilter: nf_nat: missing condition in nf_xfrm_me_harder() This if statement was accidentally dropped in (`aaa795a` netfilter: nat: propagate errors from xfrm_me_harder()) so now it returns unconditionally. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-25 01:58:16 +02:00
Simon Horman	38561437d0	ipvs: Use network byte order for sync message size struct ip_vs_sync_mesg and ip_vs_sync_mesg_v0 are both sent across the wire and used internally to store IPVS synchronisation messages. Up until now the scheme used has been to convert the size field to network byte order before sending a message on the wire and convert it to host byte order when receiving a message. This patch changes that scheme to always treat the field as being network byte order. This seems appropriate as the structure is sent across the wire. And by consistently treating the field as network byte order it is now possible to take advantage of sparse to flag any future misuse. Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:06 +09:00
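The scheme described in the commit above, where the size field always holds network byte order and is converted only at the point of use, might look roughly like the following userspace sketch. The struct `sync_hdr` and both helpers are hypothetical stand-ins for illustration, not the kernel's `struct ip_vs_sync_mesg`; in kernel code the field would typically carry a `__be16` annotation so that sparse can flag unconverted accesses.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <stdint.h>

/* Hypothetical header, loosely modelled on a sync message. */
struct sync_hdr {
    uint8_t  reserved;
    uint8_t  syncid;
    uint16_t size;      /* invariant: always network byte order */
};

/* Store a host-order length into the header. */
static void sync_hdr_set_size(struct sync_hdr *h, uint16_t host_size)
{
    h->size = htons(host_size);
}

/* Read the length back in host order. */
static uint16_t sync_hdr_get_size(const struct sync_hdr *h)
{
    return ntohs(h->size);
}
```

Because every access goes through a conversion helper, the in-memory representation is the same whether the header was built locally or received from a peer.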
Dan Carpenter	4bfbfbf91f	ipvs: off by one in set_sctp_state() The sctp_events[] come from sch->type in set_sctp_state(). They are between 0-255 so that means we need 256 elements in the array. I believe that because of how the code is aligned there is normally a hole after sctp_events[] so this patch doesn't actually change anything. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:06 +09:00
Simon Horman	9c37510b8f	ipvs: Use min3() in ip_vs_dbg_callid() There are two motivations for this: 1. It improves readability to my eyes 2. Using nested min() calls results in a shadowed _min1 variable, which is a bit untidy. Sparse complained about this. I have also replaced (size_t)64 with a variable of type size_t and value 64. This also improves readability to my eyes. Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:06 +09:00
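As an illustration of the readability point in the commit above, here is a minimal userspace sketch of a three-way minimum with a typed limit instead of a `(size_t)64` cast. The function names `min3_size` and `callid_copy_len` are assumptions made for this example; the kernel itself provides a `min3()` macro, and the real `ip_vs_dbg_callid()` may differ in detail.

```c
#include <stddef.h>

/* Three-way minimum written as a function, so no shadowed temporaries. */
static size_t min3_size(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Hypothetical helper: how many Call-ID bytes fit into a debug buffer. */
static size_t callid_copy_len(size_t buf_len, size_t callid_len)
{
    const size_t max_len = 64;                /* typed variable, no cast needed */
    size_t room = buf_len ? buf_len - 1 : 0;  /* keep one byte for the NUL */

    return min3_size(max_len, room, callid_len);
}
```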
Simon Horman	9fd0fa7ac3	ipvs: Avoid shadowing net variable in ip_vs_leave() Flagged by sparse. Compile and sparse tested only. Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:06 +09:00
Julian Anastasov	0a925864c1	ipvs: fix sparse warnings for some parameters Some service fields are in network order: - netmask: used once in network order and also as prefix len for IPv6 - port Other parameters are in host order: - struct ip_vs_flags: flags and mask moved between user and kernel only - sync state: moved between user and kernel only - syncid: sent over network as single octet Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
Julian Anastasov	f33c8b94fd	ipvs: fix sparse warnings in lblc and lblcr kbuild test robot reports for sparse warnings in commits `c2a4ffb70e` ("ipvs: convert lblc scheduler to rcu") and `c5549571f9` ("ipvs: convert lblcr scheduler to rcu"). Fix it by removing extra __rcu annotation. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
Julian Anastasov	371990eeec	ipvs: fix the remaining sparse warnings in ip_vs_ctl.c - RCU annotations for ip_vs_info_seq_start and _stop - __percpu for cpustats - properly dereference svc->pe in ip_vs_genl_fill_service Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
Julian Anastasov	7cf2eb7bcc	ipvs: fix sparse warnings for ip_vs_conn listing kbuild test robot reports for sparse warnings in commit `088339a57d` ("ipvs: convert connection locking"): net/netfilter/ipvs/ip_vs_conn.c:962:13: warning: context imbalance in 'ip_vs_conn_array' - wrong count at exit include/linux/rcupdate.h:326:30: warning: context imbalance in 'ip_vs_conn_seq_next' - unexpected unlock include/linux/rcupdate.h:326:30: warning: context imbalance in 'ip_vs_conn_seq_stop' - unexpected unlock Fix it by running ip_vs_conn_array under RCU lock to avoid conditional locking and by adding proper RCU annotations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
Julian Anastasov	d717bb2a98	ipvs: properly dereference dest_dst in ip_vs_forget_dev Use rcu_dereference_protected to resolve sparse warning, found by kbuild test robot: net/netfilter/ipvs/ip_vs_ctl.c:1464:35: warning: dereference of noderef expression Problem from commit `026ace060d` ("ipvs: optimize dst usage for real server") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
David S. Miller	6e0895c2ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be_main.c drivers/net/ethernet/intel/igb/igb_main.c drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c include/net/scm.h net/batman-adv/routing.c net/ipv4/tcp_input.c The e{uid,gid} --> {uid,gid} credentials fix conflicted with the cleanup in net-next to now pass cred structs around. The be2net driver had a bug fix in 'net' that overlapped with the VLAN interface changes by Patrick McHardy in net-next. An IGB conflict existed because in 'net' the build_skb() support was reverted, and in 'net-next' there was a comment style fix within that code. Several batman-adv conflicts were resolved by making sure that all calls to batadv_is_my_mac() are changed to have a new bat_priv first argument. Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO rewrite in 'net-next', mostly overlapping changes. Thanks to Stephen Rothwell and Antonio Quartulli for help with several of these merge resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-22 20:32:51 -04:00
David S. Miller	95a06161e6	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following patchset contains a small batch of Netfilter updates for your net-next tree, they are: * Three patches that provide more accurate error reporting to user-space, instead of -EPERM, in IPv4/IPv6 netfilter re-routing code and NAT, from Patrick McHardy. * Update copyright statements in Netfilter filters of Patrick McHardy, from himself. * Add Kconfig dependency on the raw/mangle tables to the rpfilter, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:55:29 -04:00
Patrick McHardy	3ab1f683bf	nfnetlink: add support for memory mapped netlink Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:58:36 -04:00
Patrick McHardy	ec464e5dc5	netfilter: rename netlink related "pid" variables to "portid" Get rid of the confusing mix of pid and portid and use portid consistently for all netlink related socket identities. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:58:36 -04:00
Patrick McHardy	e32123e598	netlink: rename ssk to sk in struct netlink_skb_params Memory mapped netlink needs to store the receiving userspace socket when sending from the kernel to userspace. Rename 'ssk' to 'sk' to avoid confusion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:56 -04:00
Jozsef Kadlecsik	5add189a12	netfilter: ipset: bitmap:ip,mac: fix listing with timeout The type when timeout support was enabled, could not list all elements, just the first ones which could fit into one netlink message: it just did not continue listing after the first message. Reported-by: Yoann JUET <yoann.juet@univ-nantes.fr> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Tested-by: Yoann JUET <yoann.juet@univ-nantes.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-18 23:40:41 +02:00
Patrick McHardy	f229f6ce48	netfilter: add my copyright statements Add copyright statements to all netfilter files which have had significant changes done by myself in the past. Some notes: - nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter Core Team when it got split out of nf_conntrack_core.c. The copyrights even state a date which lies six years before it was written. It was written in 2005 by Harald and myself. - net/ipv{4,6}/netfilter.c, net/netfilter/nf_queue.c were missing copyright statements. I've added the copyright statement from net/netfilter/core.c, where this code originated - for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want it to give the wrong impression Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-18 20:27:55 +02:00
Florian Westphal	c2d421e171	netfilter: nf_nat: fix race when unloading protocol modules following oops was reported: RIP: 0010:[<ffffffffa03227f2>] [<ffffffffa03227f2>] nf_nat_cleanup_conntrack+0x42/0x70 [nf_nat] RSP: 0018:ffff880202c63d40 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8801ac7bec28 RCX: ffff8801d0eedbe0 RDX: dead000000200200 RSI: 0000000000000011 RDI: ffffffffa03265b8 [..] Call Trace: [..] [<ffffffffa02febed>] destroy_conntrack+0xbd/0x110 [nf_conntrack] Happens when a conntrack timeout expires right after first part of the nat cleanup has completed (bysrc hash removal), but before part 2 has completed (re-initialization of nat area). [ destroy callback tries to delete bysrc again ] Patrick suggested to just remove the affected conntracks -- the connections won't work properly anyway without nat transformation. So, lets do that. Reported-by: CAI Qian <caiqian@redhat.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-12 11:46:31 +02:00
Jozsef Kadlecsik	6eb4c7e96e	netfilter: ipset: hash:net: nomatch flag not excluded on set resize If a resize is triggered the nomatch flag is not excluded at hashing, which leads to the element missed at lookup in the resized set. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-09 21:04:16 +02:00
Jozsef Kadlecsik	02f815cb6d	netfilter: ipset: list:set: fix reference counter update The last element can be replaced or pushed off and in both cases the reference counter must be updated. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-09 21:02:11 +02:00
Al Viro	d9dda78bad	procfs: new helper - PDE_DATA(inode) The only part of proc_dir_entry the code outside of fs/proc really cares about is PDE(inode)->data. Provide a helper for that; static inline for now, eventually will be moved to fs/proc, along with the knowledge of struct proc_dir_entry layout. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-09 14:13:32 -04:00
Patrick McHardy	aaa795ad25	netfilter: nat: propagate errors from xfrm_me_harder() Propagate errors from ip_xfrm_me_harder() instead of returning EPERM in all cases. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-08 12:34:01 +02:00
David S. Miller	d978a6361a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/nfc/microread/mei.c net/netfilter/nfnetlink_queue_core.c Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which some cleanups are going to go on-top. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-07 18:37:01 -04:00
David S. Miller	d16658206a	Merge branch 'master' of git://1984.lsi.us.es/nf-next Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter and IPVS updates for your net-next tree, most relevantly they are: * Add net namespace support to NFLOG, ULOG and ebt_ulog and NFQUEUE. The LOG and ebt_log target has been also adapted, but they still depend on the syslog netnamespace that seems to be missing, from Gao Feng. * Don't lose indications of congestion in IPv6 fragmentation handling, from Hannes Frederic Sowa. * IPVS conversion to use RCU, including some code consolidation patches and optimizations, also some from Julian Anastasov. * cpu fanout support for NFQUEUE, from Holger Eitzenberger. * Better error reporting to userspace when dropping packets from all our _*_[xfrm\|route]_me_harder functions, from Patrick McHardy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-07 12:22:06 -04:00
Patrick McHardy	3a7b21eaf4	netfilter: nf_ct_sip: don't drop packets with offsets pointing outside the packet Some Cisco phones create huge messages that are spread over multiple packets. After calculating the offset of the SIP body, it is validated to be within the packet and the packet is dropped otherwise. This breaks operation of these phones. Since connection tracking is supposed to be passive, just let those packets pass unmodified and untracked. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-06 14:03:18 +02:00
Pablo Neira Ayuso	12202fa757	netfilter: remove unneeded variable proc_net_netfilter Now that this supports net namespace for nflog and nfqueue, we can remove the global proc_net_netfilter which has no clients anymore. Based on patch from Gao feng. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-05 21:08:11 +02:00
Gao feng	e817961048	netfilter: nfnetlink_queue: add net namespace support for nfnetlink_queue This patch makes /proc/net/netfilter/nfnetlink_queue pernet. Moreover, there's a pernet instance table and lock. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-05 21:08:11 +02:00
Gao feng	5b023fc8d8	netfilter: enable per netns support for nf_loggers After this patch, all nf_loggers support net namespace. Still xt_LOG and ebt_log require syslog netns support. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-05 21:08:10 +02:00
Gao feng	9368a53c47	netfilter: nfnetlink_log: add net namespace support for nfnetlink_log This patch makes /proc/net/netfilter/nfnetlink_log pernet. Moreover, there's a pernet instance table and lock. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-05 21:08:01 +02:00
Gao feng	69b34fb996	netfilter: xt_LOG: add net namespace support for xt_LOG Add pernet support to xt_LOG by means of the new nf_log_set function added in (`30e0c6a` netfilter: nf_log: prepare net namespace support for loggers). Since syslog ns has not yet been implemented, we don't want the containers to DDoS the host's syslogd. So enable ebt_log only from init_net and wait for syslog ns support. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-05 20:58:45 +02:00
Gao feng	30e0c6a6be	netfilter: nf_log: prepare net namespace support for loggers This patch adds netns support to nf_log and it prepares netns support for existing loggers. It is composed of four major changes. 1) nf_log_register has been split to two functions: nf_log_register and nf_log_set. The new nf_log_register is used to globally register the nf_logger and nf_log_set is used for enabling pernet support from nf_loggers. Per netns is not yet complete after this patch, it comes in separate follow up patches. 2) Add net as a parameter of nf_log_bind_pf. Per netns is not yet complete after this patch, it only allows to bind the nf_logger to the protocol family from init_net and it skips other cases. 3) Adapt all nf_log_packet callers to pass netns as parameter. After this patch, this function only works for init_net. 4) Make the sysctl net/netfilter/nf_log pernet. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-05 20:12:54 +02:00
Gao feng	f3c1a44a22	netfilter: make /proc/net/netfilter pernet This patch makes this proc dentry pernet. So far only init_net had a /proc/net/netfilter directory. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-05 19:35:02 +02:00
holger@eitzenberger.org	5c33448c40	netfilter: xt_NFQUEUE: coalesce IPv4 and IPv6 hashing Because rev1 and rev3 of the target share the same hashing, generalize it by introducing nfqueue_hash(). Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-02 01:26:10 +02:00
holger@eitzenberger.org	8746ddcf12	netfilter: xt_NFQUEUE: introduce CPU fanout Current NFQUEUE target uses a hash, computed over source and destination address (and other parameters), for steering the packet to the actual NFQUEUE. This, however, forgets about the fact that the packet eventually is handled by a particular CPU on user request. If e.g. 1) IRQ affinity is used to handle packets on a particular CPU already (both single-queue or multi-queue case) and/or 2) RPS is used to steer packets to a specific softirq, the target easily chooses an NFQUEUE which is not handled by a process pinned to the same CPU. The idea is therefore to use the CPU index for determining the NFQUEUE handling the packet. E.g. when having a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs it looks like this: +-----+ +-----+ +-----+ +-----+ \|NFQ#0\| \|NFQ#1\| \|NFQ#2\| \|NFQ#3\| +-----+ +-----+ +-----+ +-----+ ^ ^ ^ ^ \| \|NFQUEUE \| \| + + + + +-----+ +-----+ +-----+ +-----+ \|rx-0 \| \|rx-1 \| \|rx-2 \| \|rx-3 \| +-----+ +-----+ +-----+ +-----+ The NFQUEUEs do not necessarily have to start with number 0, and setups with fewer NFQUEUEs than packet-handling CPUs are not a problem either. This patch extends the NFQUEUE target to accept a new NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the CPU index for determining the NFQUEUE being used. I have to introduce rev3 for this. The 'flags' are folded into _v2 'bypass'. By changing the way which queue is assigned, I'm able to improve the performance if the processes reading on the NFQUEUEs are pinned correctly. Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-02 01:25:44 +02:00
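The queue selection described in the commit above can be sketched in userspace C as follows. This is a simplified illustration under stated assumptions: `struct queue_cfg`, `FLAG_CPU_FANOUT` and `pick_queue` are hypothetical names, the CPU index is passed in as a parameter rather than read from the running CPU, and the plain modulo over the flow hash stands in for whatever hashing the real target applies.

```c
#include <stdint.h>

#define FLAG_CPU_FANOUT 0x02u   /* hypothetical flag value for this sketch */

/* Hypothetical target configuration: a contiguous range of queues. */
struct queue_cfg {
    uint16_t queuenum;      /* first queue in the range (need not be 0) */
    uint16_t queues_total;  /* number of queues in the range */
    uint16_t flags;
};

/* Choose a queue either by CPU index (fanout) or by flow hash. */
static uint32_t pick_queue(const struct queue_cfg *cfg,
                           unsigned int cpu, uint32_t flow_hash)
{
    if (cfg->queues_total <= 1)
        return cfg->queuenum;

    if (cfg->flags & FLAG_CPU_FANOUT)
        /* Wraps around when there are fewer queues than CPUs. */
        return cfg->queuenum + cpu % cfg->queues_total;

    /* Simplification: spread flows over the range by hash. */
    return cfg->queuenum + flow_hash % cfg->queues_total;
}
```

With four queues starting at 0 and the fanout flag set, a packet processed on CPU 2 would go to queue 2, so a reader process pinned to CPU 2 and bound to that queue sees the packet without crossing CPUs.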
Julian Anastasov	ac69269a45	ipvs: do not disable bh for long time We used a global BH disable in LOCAL_OUT hook. Add _bh suffix to all places that need it and remove the disabling from LOCAL_OUT and sync code. Functions like ip_defrag need protection from BH, so add it. As for nf_nat_mangle_tcp_packet, it needs RCU lock. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:58 +02:00
Julian Anastasov	ceec4c3816	ipvs: convert services to rcu This is the final step in RCU conversion. Things that are removed: - svc->usecnt: now svc is accessed under RCU read lock - svc->inc: and some unused code - ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE - __ip_vs_svc_lock: replaced with RCU - IP_VS_WAIT_WHILE: now readers lookup svcs and dests under RCU and work in parallel with configuration Other changes: - before now, a RCU read-side critical section included the calling of the schedule method, now it is extended to include service lookup - ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist - svc->pe and svc->scheduler remain to the end (of grace period), the schedulers are prepared for such RCU readers even after done_service is called but they need to use synchronize_rcu because last ip_vs_scheduler_put can happen while RCU read-side critical sections use an outdated svc->scheduler pointer - as planned, update_service is removed - empty services can be freed immediately after grace period. If dests were present, the services are freed from the dest trash code Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:58 +02:00
Julian Anastasov	413c2d04e9	ipvs: convert dests to rcu In previous commits the schedulers started to access svc->destinations with _rcu list traversal primitives because the IP_VS_WAIT_WHILE macro still plays the role of grace period. Now it is time to finish the updating part, i.e. adding and deleting of dests with _rcu suffix before removing the IP_VS_WAIT_WHILE in next commit. We use the same rule for conns as for the schedulers: dests can be searched in RCU read-side critical section where ip_vs_dest_hold can be called by ip_vs_bind_dest. Some things are not perfect, for example, calling functions like ip_vs_lookup_dest from updating code under RCU, just because we use some function both from reader and from updater. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:57 +02:00
Julian Anastasov	ba3a3ce14e	ipvs: convert sched_lock to spin lock As all read_locks are gone spin lock is preferred. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:56 +02:00
Julian Anastasov	ed3ffc4e48	ipvs: do not expect result from done_service This method releases the scheduler state, it can not fail. Such change will help to properly replace the scheduler in following patch. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:56 +02:00
Julian Anastasov	578bc3ef1e	ipvs: reorganize dest trash All dests will go to trash, no exceptions. But we have to use new list node t_list for this, due to RCU changes in following patches. Dests will wait there initial grace period and later all conns and schedulers to put their reference. The dests don't get reference for staying in dest trash as before. As result, we do not load ip_vs_dest_put with extra checks for last refcnt and the schedulers do not need to play games with atomic_inc_not_zero while selecting best destination. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:55 +02:00

2492 commits