alistair23-linux

redonkable

Author	SHA1	Message	Date
Alexander Aring	965e613d29	ieee802154: 6lowpan: fix ARPHRD to ARPHRD_6LOWPAN Currently there exists two interface types with ARPHRD_IEEE802154. These are the 802.15.4 interfaces and 802.15.4 6LoWPAN interfaces. This is more a bug because some userspace applications checks on this value like wireshark. This occurs that wireshark will always try to parse a lowpan interface as 802.15.4 frames. With ARPHRD_6LOWPAN wireshark will parse it as IPv6 frames which is correct. Much applications checks on this value to readout the EUI64 mac address which should be the same for ARPHRD_6LOWPAN. BTLE 6LoWPAN and ieee802154 6LoWPAN will share now the same ARPHRD. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-14 17:11:30 +01:00
Eric Dumazet	c8e2c80d7e	inet_diag: fix possible overflow in inet_diag_dump_one_icsk() inet_diag_dump_one_icsk() allocates too small skb. Add inet_sk_attr_size() helper right before inet_sk_diag_fill() so that it can be updated if/when new attributes are added. iproute2/ss currently does not use this dump_one() interface, this might explain nobody noticed this problem yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-13 15:54:27 -04:00
Marcel Holtmann	b7cb93e528	Bluetooth: Merge hdev->dbg_flags fields into hdev->dev_flags With the extension of hdev->dev_flags utilizing a bitmap now, the space is no longer restricted. Merge the hdev->dbg_flags into hdev->dev_flags to save space on 64-bit architectures. On 32-bit architectures no size reduction happens. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 19:28:36 +02:00
Marcel Holtmann	eacb44dff9	Bluetooth: Use DECLARE_BITMAP for hdev->dev_flags field The hdev->dev_flags field has outgrown itself on 32-bit systems. So instead of hacking around it, switch to using DECLARE_BITMAP. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 18:35:45 +02:00
Herbert Xu	d8bdff59ce	netfilter: Fix potential crash in nft_hash walker When we get back an EAGAIN from rhashtable_walk_next we were treating it as a valid object which obviously doesn't work too well. Luckily this is hard to trigger so it seems nobody has run into it yet. This patch fixes it by redoing the next call when we get an EAGAIN. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-13 12:03:00 +01:00
Marcel Holtmann	238be788fc	Bluetooth: Introduce hci_dev_test_and_set_flag helper macro Instead of manually coding test_and_set_bit on hdev->dev_flags all the time, use hci_dev_test_and_set_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 12:09:33 +02:00
Marcel Holtmann	a69d892726	Bluetooth: Introduce hci_dev_test_and_clear_flag helper macro Instead of manually coding test_and_clear_bit on hdev->dev_flags all the time, use hci_dev_test_and_clear_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 12:09:32 +02:00
Marcel Holtmann	516018a9c0	Bluetooth: Introduce hci_dev_test_and_change_flag helper macro Instead of manually coding test_and_change_bit on hdev->dev_flags all the time, use hci_dev_test_and_change_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 12:09:31 +02:00
Marcel Holtmann	ce05d603af	Bluetooth: Introduce hci_dev_change_flag helper macro Instead of manually coding change_bit on hdev->dev_flags all the time, use hci_dev_change_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 12:09:29 +02:00
Marcel Holtmann	a358dc11d8	Bluetooth: Introduce hci_dev_clear_flag helper macro Instead of manually coding clear_bit on hdev->dev_flags all the time, use hci_dev_clear_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 12:09:27 +02:00
Marcel Holtmann	a1536da255	Bluetooth: Introduce hci_dev_set_flag helper macro Instead of manually coding set_bit on hdev->dev_flags all the time, use hci_dev_set_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 12:09:26 +02:00
Marcel Holtmann	d7a5a11d7f	Bluetooth: Introduce hci_dev_test_flag helper macro Instead of manually coding test_bit on hdev->dev_flags all the time, use hci_dev_test_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 12:09:25 +02:00
Marcel Holtmann	cc91cb042c	Bluetooth: Add support connectable advertising setting The patch adds a second advertising setting that allows switching of the controller into connectable mode independent of the global connectable setting. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-13 12:07:54 +02:00
Eric W. Biederman	098a697b49	tcp_metrics: Use a single hash table for all network namespaces. Now that all of the operations are safe on a single hash table accross network namespaces, allocate a single global hash table and update the code to use it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-13 01:57:07 -04:00
Eric W. Biederman	04f721c671	tcp_metrics: Rewrite tcp_metrics_flush_all Rewrite tcp_metrics_flush_all so that it can cope with entries from different network namespaces on it's hash chain. This is based on the logic in tcp_metrics_nl_cmd_del for deleting a selection of entries from a tcp metrics hash chain. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-13 01:57:07 -04:00
Eric W. Biederman	8a4bff714f	tcp_metrics: Remove the unused return code from tcp_metrics_flush_all tcp_metrics_flush_all always returns 0. Remove the unnecessary return code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-13 01:57:07 -04:00
Eric W. Biederman	849e8a0ca8	tcp_metrics: Add a field tcpm_net and verify it matches on lookup In preparation for using one tcp metrics hash table for all network namespaces add a field tcpm_net to struct tcp_metrics_block, and verify that field on all hash table lookups. Make the field tcpm_net of type possible_net_t so it takes no space when network namespaces are disabled. Further add a function tm_net to read that field so we can be efficient when network namespaces are disabled and concise the rest of the time. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-13 01:57:07 -04:00
Eric W. Biederman	3e5da62d0b	tcp_metrics: Mix the network namespace into the hash function. In preparation for using one hash table for all network namespaces mix the network namespace into the hash value. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-13 01:57:07 -04:00
Eric W. Biederman	6493517eae	tcp_metrics: panic when tcp_metrics_init fails. There is not a practical way to cleanup during boot so just panic if there is a problem initializing tcp_metrics. That will at least give us a clear place to start debugging if something does go wrong. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-13 01:57:07 -04:00
Michael S. Tsirkin	8051a2a518	9p/trans_virtio: fix hot-unplug On device hot-unplug, 9p/virtio currently will kfree channel while it might still be in use. Of course, it might stay used forever, so it's an extremely ugly hack, but it seems better than use-after-free that we have now. [ Unused variable removed, whitespace cleanup, msg single-lined --RR ] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-03-13 15:55:41 +10:30
Eric W. Biederman	76fecd8275	mpls: In mpls_egress verify the packet length. Reobert Shearman noticed that mpls_egress is failing to verify that the bytes to be examined are in fact present in the packet before mpls_egress reads those bytes. As suggested by David Miller reduce this to a single pskb_may_pull call so that we don't do unnecessary work in the fast path. Reported-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 23:05:04 -04:00
Eric Dumazet	3f66b083a5	inet: introduce ireq_family Before inserting request socks into general hash table, fill their socket family. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 22:58:13 -04:00
Eric Dumazet	d4f06873b6	inet: get_openreq4() & get_openreq6() do not need listener ireq->ir_num contains local port, use it. Also, get_openreq4() dumping listen_sk->refcnt makes litle sense. inet_diag_fill_req() can also use ireq->ir_num Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 22:58:13 -04:00
Eric Dumazet	41b822c59e	inet: prepare sock_edemux() & sock_gen_put() for new SYN_RECV state sock_edemux() & sock_gen_put() should be ready to cope with request socks. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 22:58:13 -04:00
Eric Dumazet	0159dfd3d7	net: add req_prot_cleanup() & req_prot_init() helpers Make proto_register() & proto_unregister() a bit nicer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 22:58:13 -04:00
Eric Dumazet	bd337c581b	ipv6: add missing ireq_net & ir_cookie initializations I forgot to update dccp_v6_conn_request() & cookie_v6_check(). They both need to set ireq->ireq_net and ireq->ir_cookie Lets clear ireq->ir_cookie in inet_reqsk_alloc() Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `33cf7c90fe` ("net: add real socket cookies") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 22:58:12 -04:00
Daniel Borkmann	54720df130	cls_bpf: do eBPF invocation under non-bh RCU lock variant for maps Currently, it is possible in cls_bpf to access eBPF maps only under rcu_read_lock_bh() variants: while on ingress side, that is, handle_ing(), the classifier would be called from __netif_receive_skb_core() under rcu_read_lock(); on egress side, however, it's rcu_read_lock_bh() via __dev_queue_xmit(). This rcu/rcu_bh mix doesn't work together with eBPF maps as they require soley to be called under rcu_read_lock(). eBPF maps could also be shared among various other eBPF programs (possibly even with other eBPF program types, f.e. tracing) and user space processes, so any context is assumed. Therefore, a possible fix for cls_bpf is to wrap/nest eBPF program invocation under non-bh RCU lock variant. Fixes: `e2e9b6541d` ("cls_bpf: add initial eBPF support for programmable classifiers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 18:33:15 -04:00
Alexander Duyck	0b65bd97ba	fib_trie: Provide a deterministic order for fib_alias w/ tables merged This change makes it so that we should always have a deterministic ordering for the main and local aliases within the merged table when two leaves overlap. So for example if we have a leaf with a key of 192.168.254.0. If we previously added two aliases with a prefix length of 24 from both local and main the first entry would be first and the second would be second. When I was coding this I had added a WARN_ON should such a situation occur as I wasn't sure how likely it would be. However this WARN_ON has been triggered so this is something that should be addressed. With this patch the ordering of the aliases is as follows. First they are sorted on prefix length, then on their table ID, then tos, and finally priority. This way what we end up doing is essentially interleaving the two tables on what used to be leaf_info structure boundaries. Fixes: `0ddcf43d5` ("ipv4: FIB Local/MAIN table collapse") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 18:26:51 -04:00
Alexander Duyck	3c9e9f7320	fib_trie: Avoid NULL pointer if local table is not allocated The function fib_unmerge assumed the local table had already been allocated. If that is not the case however when custom rules are applied then this can result in a NULL pointer dereference. In order to prevent this we must check the value of the local table pointer and if it is NULL simply return 0 as there is no local table to separate from the main. Fixes: `0ddcf43d5` ("ipv4: FIB Local/MAIN table collapse") Reported-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 18:26:51 -04:00
Eric W. Biederman	0c5c9fb551	net: Introduce possible_net_t Having to say > #ifdef CONFIG_NET_NS > struct net net; > #endif in structures is a little bit wordy and a little bit error prone. Instead it is possible to say: > typedef struct { > #ifdef CONFIG_NET_NS > struct net net; > #endif > } possible_net_t; And then in a header say: > possible_net_t net; Which is cleaner and easier to use and easier to test, as the possible_net_t is always there no matter what the compile options. Further this allows read_pnet and write_pnet to be functions in all cases which is better at catching typos. This change adds possible_net_t, updates the definitions of read_pnet and write_pnet, updates optional struct net * variables that write_pnet uses on to have the type possible_net_t, and finally fixes up the b0rked users of read_pnet and write_pnet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 14:39:40 -04:00
Eric W. Biederman	efd7ef1c19	net: Kill hold_net release_net hold_net and release_net were an idea that turned out to be useless. The code has been disabled since 2008. Kill the code it is long past due. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 14:39:40 -04:00
Ian Wilson	78146572b9	netfilter: Zero the tuple in nfnl_cthelper_parse_tuple() nfnl_cthelper_parse_tuple() is called from nfnl_cthelper_new(), nfnl_cthelper_get() and nfnl_cthelper_del(). In each case they pass a pointer to an nf_conntrack_tuple data structure local variable: struct nf_conntrack_tuple tuple; ... ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]); The problem is that this local variable is not initialized, and nfnl_cthelper_parse_tuple() only initializes two fields: src.l3num and dst.protonum. This leaves all other fields with undefined values based on whatever is on the stack: tuple->src.l3num = ntohs(nla_get_be16(tb[NFCTH_TUPLE_L3PROTONUM])); tuple->dst.protonum = nla_get_u8(tb[NFCTH_TUPLE_L4PROTONUM]); The symptom observed was that when the rpc and tns helpers were added then traffic to port 1536 was being sent to user-space. Signed-off-by: Ian Wilson <iwilson@brocade.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-12 13:07:36 +01:00
Marcel Holtmann	983f9814c0	Bluetooth: Remove two else branches that are not needed The SMP code contains two else branches that are not needed since the successful test will actually leave the function. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-12 09:00:48 +02:00
Arnd Bergmann	f862e07cf9	rds: avoid potential stack overflow The rds_iw_update_cm_id function stores a large 'struct rds_sock' object on the stack in order to pass a pair of addresses. This happens to just fit withint the 1024 byte stack size warning limit on x86, but just exceed that limit on ARM, which gives us this warning: net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] As the use of this large variable is basically bogus, we can rearrange the code to not do that. Instead of passing an rds socket into rds_iw_get_device, we now just pass the two addresses that we have available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly, to create two address structures on the stack there. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 00:28:01 -04:00
Willem de Bruijn	3a8dd9711e	sock: fix possible NULL sk dereference in __skb_tstamp_tx Test that sk != NULL before reading sk->sk_tsflags. Fixes: `49ca0d8bfa` ("net-timestamp: no-payload option") Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-12 00:09:55 -04:00
Eric Dumazet	c29390c6df	xps: must clear sender_cpu before forwarding John reported that my previous commit added a regression on his router. This is because sender_cpu & napi_id share a common location, so get_xps_queue() can see garbage and perform an out of bound access. We need to make sure sender_cpu is cleared before doing the transmit, otherwise any NIC busy poll enabled (skb_mark_napi_id()) can trigger this bug. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: John <jw@nuclearfallout.net> Bisected-by: John <jw@nuclearfallout.net> Fixes: `2bd82484bb` ("xps: fix xps for stacked devices") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 23:51:18 -04:00
Eric Dumazet	d77c555d32	net: fix CONFIG_NET_NS=n compilation I forgot to use write_pnet() in three locations. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `33cf7c90fe` ("net: add real socket cookies") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 23:28:49 -04:00
Lubomir Rintel	c78ba6d64c	ipv6: expose RFC4191 route preference via rtnetlink This makes it possible to retain the route preference when RAs are handled in userspace. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 23:28:09 -04:00
Simon Horman	ac70c05b6f	switchdev: correct spelling of notifier in comments Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 23:06:39 -04:00
Eric Dumazet	33cf7c90fe	net: add real socket cookies A long standing problem in netlink socket dumps is the use of kernel socket addresses as cookies. 1) It is a security concern. 2) Sockets can be reused quite quickly, so there is no guarantee a cookie is used once and identify a flow. 3) request sock, establish sock, and timewait socks for a given flow have different cookies. Part of our effort to bring better TCP statistics requires to switch to a different allocator. In this patch, I chose to use a per network namespace 64bit generator, and to use it only in the case a socket needs to be dumped to netlink. (This might be refined later if needed) Note that I tried to carry cookies from request sock, to establish sock, then timewait sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eric Salo <salo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 21:55:28 -04:00
Alexey Kodanev	b1cb59cf2e	net: sysctl_net_core: check SNDBUF and RCVBUF for min length sysctl has sysctl.net.core.rmem_/wmem_ parameters which can be set to incorrect values. Given that 'struct sk_buff' allocates from rcvbuf, incorrectly set buffer length could result to memory allocation failures. For example, set them as follows: # sysctl net.core.rmem_default=64 net.core.wmem_default = 64 # sysctl net.core.wmem_default=64 net.core.wmem_default = 64 # ping localhost -s 1024 -i 0 > /dev/null This could result to the following failure: skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32 head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL> kernel BUG at net/core/skbuff.c:102! invalid opcode: 0000 [#1] SMP ... task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000 RIP: 0010:[<ffffffff8155fbd1>] [<ffffffff8155fbd1>] skb_put+0xa1/0xb0 RSP: 0018:ffff88003ae8bc68 EFLAGS: 00010296 RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000 RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8 RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900 FS: 00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0 Stack: ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00 Call Trace: [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470 [<ffffffff81556f56>] sock_write_iter+0x146/0x160 [<ffffffff811d9612>] new_sync_write+0x92/0xd0 [<ffffffff811d9cd6>] vfs_write+0xd6/0x180 [<ffffffff811da499>] SyS_write+0x59/0xd0 [<ffffffff81651532>] system_call_fastpath+0x12/0x17 Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 RIP [<ffffffff8155fbd1>] skb_put+0xa1/0xb0 RSP <ffff88003ae8bc68> Kernel panic - not syncing: Fatal exception Moreover, the possible minimum is 1, so we can get another kernel panic: ... BUG: unable to handle kernel paging request at ffff88013caee5c0 IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0 ... Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 21:25:13 -04:00
Alexander Duyck	654eff4516	fib_trie: Only display main table in /proc/net/route When we merged the tries for local and main I had overlooked the iterator for /proc/net/route. As a result it was outputting both local and main when the two tries were merged. This patch resolves that by only providing output for aliases that are actually in the main trie. As a result we should go back to the original behavior which I assume will be necessary to maintain legacy support. Fixes: `0ddcf43d5` ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 21:24:32 -04:00
Florian Fainelli	cd28a1a9ba	net: dsa: fully divert PHY reads/writes if requested In case a PHY is found via Device Tree, and is also flagged by the switch driver as needing indirect reads/writes using the switch driver implemented MDIO bus, make sure that we bind this PHY to the slave MII bus in order for this to happen. Without this, we would succeed in having the PHY driver probe()'s function to use slave MII bus read/write functions, because this is done during dsa_slave_mii_init(), but past that point, the PHY driver would not go through these diverted reads and writes. Fixes: `0d8bcdd383` ("net: dsa: allow for more complex PHY setups") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 17:56:29 -04:00
Florian Fainelli	c305c1651c	net: dsa: move PHY setup on DSA MII bus to its own function In preparation for dealing with indirect reads and writes towards certain PHY devices, move the code which deals with binding the PHY device to the slave MII bus created by DSA to its own function: dsa_slave_phy_connect(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 17:56:28 -04:00
Alexander Duyck	61f0d861fc	fib_trie: Fix uninitialized variable warning The 0-day kernel test infrastructure reported a use of uninitialized variable warning for local_table due to the fact that the local and main allocations had been swapped from the original setup. This change corrects that by making it so that we free the main table if the local table allocation fails. Fixes: `0ddcf43d5` ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 17:33:44 -04:00
Neal Cardwell	d578e18ce9	tcp: restore 1.5x per RTT limit to CUBIC cwnd growth in congestion avoidance Commit `814d488c61` ("tcp: fix the timid additive increase on stretch ACKs") fixed a bug where tcp_cong_avoid_ai() would either credit a connection with an increase of snd_cwnd_cnt, or increase snd_cwnd, but not both, resulting in cwnd increasing by 1 packet on at most every alternate invocation of tcp_cong_avoid_ai(). Although the commit correctly implemented the CUBIC algorithm, which can increase cwnd by as much as 1 packet per 1 packet ACKed (2x per RTT), in practice that could be too aggressive: in tests on network paths with small buffers, YouTube server retransmission rates nearly doubled. This commit restores CUBIC to a maximum cwnd growth rate of 1 packet per 2 packets ACKed (1.5x per RTT). In YouTube tests this restored retransmit rates to low levels. Testing: This patch has been tested in datacenter netperf transfers and live youtube.com and google.com servers. Fixes: `9cd981dcf1` ("tcp: fix stretch ACK bugs in CUBIC") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 16:51:51 -04:00
Neal Cardwell	9949afa42b	tcp: fix tcp_cong_avoid_ai() credit accumulation bug with decreases in w The recent change to tcp_cong_avoid_ai() to handle stretch ACKs introduced a bug where snd_cwnd_cnt could accumulate a very large value while w was large, and then if w was reduced snd_cwnd could be incremented by a large delta, leading to a large burst and high packet loss. This was tickled when CUBIC's bictcp_update() sets "ca->cnt = 100 * cwnd". This bug crept in while preparing the upstream version of `814d488c61`. Testing: This patch has been tested in datacenter netperf transfers and live youtube.com and google.com servers. Fixes: `814d488c61` ("tcp: fix the timid additive increase on stretch ACKs") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 16:51:51 -04:00
Sabrina Dubroca	6dede75b7e	fib_trie: call fib_table_flush_external under RTNL Move rtnl_lock() before the call to fib4_rules_exit so that fib_table_flush_external is called under RTNL. Fixes: `104616e74e` ("switchdev: don't support custom ip rules, for now") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 16:46:26 -04:00
Robert Shearman	8a08919f43	mpls: Allow mpls_gso and mpls_router to be built as modules CONFIG_MPLS=m doesn't result in a kernel module being built because it applies to the net/mpls directory, rather than to .o files. So revert the MPLS menuitem to being a boolean and make MPLS_GSO and MPLS_ROUTING tristates to allow mpls_gso and mpls_router modules to be produced as desired. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 16:38:54 -04:00
Alexander Duyck	0ddcf43d5d	ipv4: FIB Local/MAIN table collapse This patch is meant to collapse local and main into one by converting tb_data from an array to a pointer. Doing this allows us to point the local table into the main while maintaining the same variables in the table. As such the tb_data was converted from an array to a pointer, and a new array called data is added in order to still provide an object for tb_data to point to. In order to track the origin of the fib aliases a tb_id value was added in a hole that existed on 64b systems. Using this we can also reverse the merge in the event that custom FIB rules are enabled. With this patch I am seeing an improvement of 20ns to 30ns for routing lookups as long as custom rules are not enabled, with custom rules enabled we fall back to split tables and the original behavior. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 16:22:14 -04:00
Johan Hedberg	4ba9faf35f	Bluetooth: Check for matching IRK when looking for paired LE devices If we're given an RPA when checking whether we're paired or not, we should consult the local RPA storage whether there's a matching IRK. This we we ensure that hci_bdaddr_is_paired() gives the right result even when trying to pair a second time with the same device with an RPA. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-11 15:54:23 +01:00
Johan Hedberg	87c8b28d29	Bluetooth: Fix missing rcu_read_unlock() in hci_bdaddr_is_paired() When finding a matching LTK the rcu_read_unlock() function was failing to release the RCU read lock. This patch adds the missing call to rcu_reaD_unlock(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-11 08:52:32 +01:00
Marcel Holtmann	beb1c21b8e	Bluetooth: Increment management interface revision This patch increments the management interface revision due to introduction of new static address setting and fixes for the fast connectable feature. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-11 09:28:41 +02:00
David S. Miller	4363890079	net: Handle unregister properly when netdev namespace change fails. If rtnl_newlink() fails on it's call to dev_change_net_namespace(), we have to make use of the ->dellink() method, if present, just like we do when rtnl_configure_link() fails. Fixes: `317f4810e4` ("rtnl: allow to create device with IFLA_LINK_NETNSID set") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 21:59:46 -04:00
Jon Paul Maloy	169bf9121b	tipc: ensure that idle links are deleted when a bearer is disabled commit `afaa3f65f6` (tipc: purge links when bearer is disabled) was an attempt to resolve a problem that turned out to have a more profound reason. When we disable a bearer, we delete all its pertaining links if there is no other bearer to perform failover to, or if the module is shutting down. In case there are dual bearers, we wait with deleting links until the failover procedure is finished. However, this misses the case when a link on the removed bearer was already down, so that there will be no failover procedure to finish the link delete. This causes confusion if a new bearer is added to replace the removed one, and also entails a small memory leak. This commit takes the current state of the link into account when deciding when to delete it, and also reverses the above-mentioned commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 18:37:36 -04:00
Alexander Duyck	ddb4b9a132	fib_trie: Address possible NULL pointer dereference in resize If the inflate call failed it would return NULL. As a result tp would be set to NULL and cause use to trigger a NULL pointer dereference in should_halve if the inflate failed on the first attempt. In order to prevent this we should decrement max_work before we actually attempt to inflate as this will force us to exit before attempting to halve a node we should have inflated. In order to keep things symmetric between inflate and halve I went ahead and also moved the decrement of max_work for the halve case as well so we take care of that before we actually attempt to halve the tnode. Fixes: `88bae714` ("fib_trie: Add key vector to root, return parent key_vector in resize") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 18:36:56 -04:00
Johan Hedberg	55e76b3898	Bluetooth: Add 'Already Paired' error for Pair Device command To make the behavior predictable when attempting to pair with a device for which we already have a Link Key or Long Term Key, this patch adds a new 'Already Paired' error which gets sent in such a scenario. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-10 21:42:05 +01:00
Alexander Duyck	3ec320dd5c	fib_trie: Correctly handle case of key == 0 in leaf_walk_rcu In the case of a trie that had no tnodes with a key of 0 the initial look-up would fail resulting in an out-of-bounds cindex on the first tnode. This resulted in an entire trie being skipped. In order resolve this I have updated the cindex logic in the initial look-up so that if the key is zero we will always traverse the child zero path. Fixes: `8be33e95` ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 16:13:55 -04:00
Oliver Hartkopp	7768eed8bf	net: add comment for sock_efree() usage Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 16:12:20 -04:00
Johan Hedberg	406ef2a67b	Bluetooth: Make Fast Connectable available while powered off To maximize the usability of the Fast Connectable feature we should make it possible to set (or unset) it at any given moment. This means removing the dependency on the 'connectable' setting as well as the 'powered' setting. The former makes also sense since page scan may get enabled through add_device even if 'connectable' is false. To keep the setting available over power cycles its flag also needs to be removed from the flags that are cleared upon HCI_Reset. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-10 19:37:02 +01:00
Eric Dumazet	34160ea3f9	inet_diag: add const to inet_diag_req_v2 diag dumpers should not modify the request. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 13:45:28 -04:00
Eric Dumazet	e31c5e0e48	inet_diag: cleanups Remove all inline keywords, add some const, and cleanup style. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 13:45:28 -04:00
Eric Dumazet	491da2a477	net: constify sock_diag_check_cookie() sock_diag_check_cookie() second parameter is constant Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 13:45:28 -04:00
David S. Miller	515fb5c317	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter fixes for net-next The following batch contains a couple of fixes to address some fallout from the previous pull request, they are: 1) Address link problems in the bridge code after `e5de75b`. Fix it by using rcu hook to address to avoid ifdef pollution and hard dependency between bridge and br_netfilter. 2) Address sparse warnings in the netfilter reject code, patch from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 12:48:47 -04:00
Pablo Neira Ayuso	1a4ba64d16	netfilter: bridge: use rcu hook to resolve br_netfilter dependency `e5de75b` ("netfilter: bridge: move DNAT helper to br_netfilter") results in the following link problem: net/bridge/br_device.c:29: undefined reference to `br_nf_prerouting_finish_bridge` Moreover it creates a hard dependency between br_netfilter and the bridge core, which is what we've been trying to avoid so far. Resolve this problem by using a hook structure so we reduce #ifdef pollution and keep bridge netfilter specific code under br_netfilter.c which was the original intention. Reported-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-10 15:03:02 +01:00
Florian Westphal	a03a8dbe20	netfilter: fix sparse warnings in reject handling make C=1 CF=-D__CHECK_ENDIAN__ shows following: net/bridge/netfilter/nft_reject_bridge.c:65:50: warning: incorrect type in argument 3 (different base types) net/bridge/netfilter/nft_reject_bridge.c:65:50: expected restricted __be16 [usertype] protocol [..] net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: cast from restricted __be16 net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: incorrect type in argument 1 (different base types) [..] net/bridge/netfilter/nft_reject_bridge.c:121:50: warning: incorrect type in argument 3 (different base types) [..] net/bridge/netfilter/nft_reject_bridge.c:168:52: warning: incorrect type in argument 3 (different base types) [..] net/bridge/netfilter/nft_reject_bridge.c:233:52: warning: incorrect type in argument 3 (different base types) [..] Caused by two (harmless) errors: 1. htons() instead of ntohs() 2. __be16 for protocol in nf_reject_ipXhdr_put API, use u8 instead. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-10 15:01:32 +01:00
Scott Feldman	f8f2147150	switchdev: add netlink flags to IPv4 FIB add op Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op to allow driver to 1) optimize hardware updates, 2) handle ip route prepend and append commands correctly. Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 23:56:52 -04:00
Florian Fainelli	769a020289	net: dsa: utilize of_find_net_device_by_node Using of_find_device_by_node() restricts the search to platform_device that match the specified device_node pointer. This is not even remotely true for network devices backed by a pci_device for instance. of_find_net_device_by_node() allows us to do a more thorough lookup to find the struct net_device corresponding to a particular device_node pointer. For symetry with the non-OF code path, we hold the net_device pointer in dsa_probe() just like what dev_to_net_dev() does when we call this function. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 23:50:21 -04:00
Florian Fainelli	aa836df958	net: core: add of_find_net_device_by_node() Add a helper function which allows getting the struct net_device pointer associated with a given struct device_node pointer. This is useful for instance for DSA Ethernet devices not backed by a platform_device, but a PCI device. Since we need to access net_class which is not accessible outside of net/core/net-sysfs.c, this helper function is also added here and gated with CONFIG_OF_NET. Network devices initialized with SET_NETDEV_DEV() are also taken into account by checking for dev->parent first and then falling back to checking the device pointer within struct net_device. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 23:50:20 -04:00
WANG Cong	5778d39d07	net_sched: fix struct tc_u_hnode layout in u32 We dynamically allocate divisor+1 entries for ->ht[] in tc_u_hnode: ht = kzalloc(sizeof(ht) + divisorsizeof(void *), GFP_KERNEL); So ->ht is supposed to be the last field of this struct, however this is broken, since an rcu head is appended after it. Fixes: `1ce87720d4` ("net: sched: make cls_u32 lockless") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 23:44:31 -04:00
David S. Miller	3cef5c5b0b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/cadence/macb.c Overlapping changes in macb driver, mostly fixes and cleanups in 'net' overlapping with the integration of at91_ether into macb in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 23:38:02 -04:00
Linus Torvalds	36bef88380	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) nft_compat accidently truncates ethernet protocol to 8-bits, from Arturo Borrero. 2) Memory leak in ip_vs_proc_conn(), from Julian Anastasov. 3) Don't allow the space required for nftables rules to exceed the maximum value representable in the dlen field. From Patrick McHardy. 4) bcm63xx_enet can accidently leave interrupts permanently disabled due to errors in the NAPI polling exit logic. Fix from Nicolas Schichan. 5) Fix OOPSes triggerable by the ping protocol module, due to missing address family validations etc. From Lorenzo Colitti. 6) Don't use RCU locking in sleepable context in team driver, from Jiri Pirko. 7) xen-netback miscalculates statistic offset pointers when reporting the stats to userspace. From David Vrabel. 8) Fix a leak of up to 256 pages per VIF destroy in xen-netaback, also from David Vrabel. 9) ip_check_defrag() cannot assume that skb_network_offset(), particularly when it is used by the AF_PACKET fanout defrag code. From Alexander Drozdov. 10) gianfar driver doesn't query OF node names properly when trying to determine the number of hw queues available. Fix it to explicitly check for OF nodes named queue-group. From Tobias Waldekranz. 11) MID field in macb driver should be 12 bits, not 16. From Punnaiah Choudary Kalluri. 12) Fix unintentional regression in traceroute due to timestamp socket option changes. Empty ICMP payloads should be allowed in non-timestamp cases. From Willem de Bruijn. 13) When devices are unregistered, we have to get rid of AF_PACKET multicast list entries that point to it via ifindex. Fix from Francesco Ruggeri. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) tipc: fix bug in link failover handling net: delete stale packet_mclist entries net: macb: constify macb configuration data MAINTAINERS: add Marc Kleine-Budde as co maintainer for CAN networking layer MAINTAINERS: linux-can moved to github can: kvaser_usb: Read all messages in a bulk-in URB buffer can: kvaser_usb: Avoid double free on URB submission failures can: peak_usb: fix missing ctrlmode_ init for every dev can: add missing initialisations in CAN related skbuffs ip: fix error queue empty skb handling bgmac: Clean warning messages tcp: align tcp_xmit_size_goal() on tcp_tso_autosize() net: fec: fix unbalanced clk disable on driver unbind net: macb: Correct the MID field length value net: gianfar: correctly determine the number of queue groups ipv4: ip_check_defrag should not assume that skb_network_offset is zero net: bcmgenet: properly disable password matching net: eth: xgene: fix booting with devicetree bnx2x: Force fundamental reset for EEH recovery xen-netback: refactor xenvif_handle_frag_list() ...	2015-03-09 18:17:21 -07:00
Jon Paul Maloy	e6441bae32	tipc: fix bug in link failover handling In commit `c637c10355` ("tipc: resolve race problem at unicast message reception") we introduced a new mechanism for delivering buffers upwards from link to socket layer. That code contains a bug in how we handle the new link input queue during failover. When a link is reset, some of its users may be blocked because of congestion, and in order to resolve this, we add any pending wakeup pseudo messages to the link's input queue, and deliver them to the socket. This misses the case where the other, remaining link also may have congested users. Currently, the owner node's reference to the remaining link's input queue is unconditionally overwritten by the reset link's input queue. This has the effect that wakeup events from the remaining link may be unduely delayed (but not lost) for a potentially long period. We fix this by adding the pending events from the reset link to the input queue that is currently referenced by the node, whichever one it is. This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 16:20:41 -04:00
Francesco Ruggeri	82f17091e6	net: delete stale packet_mclist entries When an interface is deleted from a net namespace the ifindex in the corresponding entries in PF_PACKET sockets' mclists becomes stale. This can create inconsistencies if later an interface with the same ifindex is moved from a different namespace (not that unlikely since ifindexes are per-namespace). In particular we saw problems with dev->promiscuity, resulting in "promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken" warnings and EOVERFLOW failures of setsockopt(PACKET_ADD_MEMBERSHIP). This patch deletes the mclist entries for interfaces that are deleted. Since this now causes setsockopt(PACKET_DROP_MEMBERSHIP) to fail with EADDRNOTAVAIL if called after the interface is deleted, also make packet_mc_drop not fail. Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 16:17:43 -04:00
Eric W. Biederman	ddb3b6033c	net: Remove protocol from struct dst_ops After my change to neigh_hh_init to obtain the protocol from the neigh_table there are no more users of protocol in struct dst_ops. Remove the protocol field from dst_ops and all of it's initializers. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 16:06:10 -04:00
David S. Miller	5428aef811	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Basically, improvements for the packet rejection infrastructure, deprecation of CLUSTERIP, cleanups for nf_tables and some untangling for br_netfilter. More specifically they are: 1) Send packet to reset flow if checksum is valid, from Florian Westphal. 2) Fix nf_tables reject bridge from the input chain, also from Florian. 3) Deprecate the CLUSTERIP target, the cluster match supersedes it in functionality and it's known to have problems. 4) A couple of cleanups for nf_tables rule tracing infrastructure, from Patrick McHardy. 5) Another cleanup to place transaction declarations at the bottom of nf_tables.h, also from Patrick. 6) Consolidate Kconfig dependencies wrt. NF_TABLES. 7) Limit table names to 32 bytes in nf_tables. 8) mac header copying in bridge netfilter is already required when calling ip_fragment(), from Florian Westphal. 9) move nf_bridge_update_protocol() to br_netfilter.c, also from Florian. 10) Small refactor in br_netfilter in the transmission path, again from Florian. 11) Move br_nf_pre_routing_finish_bridge_slow() to br_netfilter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 15:58:21 -04:00
Geert Uytterhoeven	26c459a807	mpls: Spelling: s/conceved/conceived/, s/as/a/ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 15:51:59 -04:00
Erik Hugne	143fe22f50	tipc: fix inconsistent signal handling regression Commit `9bbb4ecc68` ("tipc: standardize recvmsg routine") changed the sleep/wakeup behaviour for sockets entering recv() or accept(). In this process the order of reporting -EAGAIN/-EINTR was reversed. This caused problems with wrong errno being reported back if the timeout expires. The same problem happens if the socket is nonblocking and recv()/accept() is called when the process have pending signals. If there is no pending data read or connections to accept, -EINTR will be returned instead of -EAGAIN. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reported-by László Benedek <laszlo.benedek@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 15:42:19 -04:00
Jiri Pirko	f4427bc3e2	switchdev: use gpl variant of symbol export Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 15:37:39 -04:00
Erik Hugne	4fee6be813	tipc: sparse: fix htons conversion warnings Commit `d0f91938be` ("tipc: add ip/udp media type") introduced some new sparse warnings. Clean them up. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 15:37:04 -04:00
Cong Wang	1e052be69d	net_sched: destroy proto tp when all filters are gone Kernel automatically creates a tp for each (kind, protocol, priority) tuple, which has handle 0, when we add a new filter, but it still is left there after we remove our own, unless we don't specify the handle (literally means all the filters under the tuple). For example this one is left: # tc filter show dev eth0 filter parent 8001: protocol arp pref 49152 basic The user-space is hard to clean up these for kernel because filters like u32 are organized in a complex way. So kernel is responsible to remove it after all filters are gone. Each type of filter has its own way to store the filters, so each type has to provide its way to check if all filters are gone. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 15:35:55 -04:00
Pablo Neira Ayuso	e5de75bf88	netfilter: bridge: move DNAT helper to br_netfilter Only one caller, there is no need to keep this in a header. Move it to br_netfilter.c where this belongs to. Based on patch from Florian Westphal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-09 17:56:07 +01:00
Florian Westphal	7a8d831df5	netfilter: bridge: refactor conditional in br_nf_dev_queue_xmit simpilifies followup patch that re-works brnf ip_fragment handling. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-09 13:22:58 +01:00
Florian Westphal	4a9d2f2008	netfilter: bridge: move nf_bridge_update_protocol to where its used no need to keep it in a header file. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-09 13:21:31 +01:00
Florian Westphal	8bd63cf1a4	bridge: move mac header copying into br_netfilter The mac header only has to be copied back into the skb for fragments generated by ip_fragment(), which only happens for bridge forwarded packets with nf-call-iptables=1 && active nf_defrag. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-09 13:20:48 +01:00
Oliver Hartkopp	969439016d	can: add missing initialisations in CAN related skbuffs When accessing CAN network interfaces with AF_PACKET sockets e.g. by dhclient this can lead to a skb_under_panic due to missing skb initialisations. Add the missing initialisations at the CAN skbuff creation times on driver level (rx path) and in the network layer (tx path). Reported-by: Austin Schuh <austin@peloton-tech.com> Reported-by: Daniel Steer <daniel.steer@mclaren.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2015-03-09 10:22:24 +01:00
Willem de Bruijn	c247f0534c	ip: fix error queue empty skb handling When reading from the error queue, msg_name and msg_control are only populated for some errors. A new exception for empty timestamp skbs added a false positive on icmp errors without payload. `traceroute -M udpconn` only displayed gateways that return payload with the icmp error: the embedded network headers are pulled before sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise. Fix this regression by refining when msg_name and msg_control branches are taken. The solutions for the two fields are independent. msg_name only makes sense for errors that configure serr->port and serr->addr_offset. Test the first instead of skb->len. This also fixes another issue. saddr could hold the wrong data, as serr->addr_offset is not initialized in some code paths, pointing to the start of the network header. It is only valid when serr->port is set (non-zero). msg_control support differs between IPv4 and IPv6. IPv4 only honors requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The skb->len test can simply be removed, because skb->dev is also tested and never true for empty skbs. IPv6 honors requests for all errors aside from local errors and timestamps on empty skbs. In both cases, make the policy more explicit by moving this logic to a new function that decides whether to process msg_control and that optionally prepares the necessary fields in skb->cb[]. After this change, the IPv4 and IPv6 paths are more similar. The last case is rxrpc. Here, simply refine to only match timestamps. Fixes: `49ca0d8bfa` ("net-timestamp: no-payload option") Reported-by: Jan Niehusmann <jan@gondor.com> Signed-off-by: Willem de Bruijn <willemb@google.com> ---- Changes v1->v2 - fix local origin test inversion in ip6_datagram_support_cmsg - make v4 and v6 code paths more similar by introducing analogous ipv4_datagram_support_cmsg - fix compile bug in rxrpc Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-08 23:01:54 -04:00
Eric W. Biederman	b79bda3d38	neigh: Use neigh table index for neigh_packet_xmit Remove a little bit of unnecessary work when transmitting a packet with neigh_packet_xmit. Use the neighbour table index not the address family as a parameter. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-08 19:30:06 -04:00
Eric W. Biederman	7d5f41f276	mpls: Fix the openvswitch select of NET_MPLS_GSO Fix the OPENVSWITCH Kconfig option and old Kconfigs by having OPENVSWITCH select both NET_MPLS_GSO and MPLSO. A Kbuild test robot reported that when NET_MPLS_GSO is selected by OPENVSWITCH the generated .config is broken because MPLS is not selected. Cc: Simon Horman <horms@verge.net.au> Fixes: `cec9166ca4` mpls: Refactor how the mpls module is built Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-08 19:30:06 -04:00
Eric W. Biederman	aa7da93756	mpls: Correct the ttl decrement. According to RFC3032 section 2.4.2 packets with an outgoing ttl of 0 MUST NOT be forwarded. According to section 2.4.1 an outgoing TTL of 0 comes from an incomming TTL <= 1. Therefore any packets that is received with a ttl <= 1 should not have it's ttl decremented and forwarded. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-08 19:30:06 -04:00
Eric W. Biederman	0f7bbd5805	mpls: Better error code for unsupported option. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-08 19:30:06 -04:00
Eric W. Biederman	19d0c341d9	mpls: Cleanup the rcu usage in the code. Sparse was generating a lot of warnings mostly from missing annotations in the code. Add missing annotations and in a few cases tweak the code for performance by moving work before loops. This also fixes a problematic ommision of rcu_assign_pointer and rcu_dereference. Hopefully with complete rcu annotations any new rcu errors will stick out like a sore thumb. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-08 19:30:06 -04:00
Eric W. Biederman	d865616e18	mpls: Fix the kzalloc argument order in mpls_rt_alloc Blink I got the argument order wrong to kzalloc and the code was working properly when tested. Blink Fix that. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-08 19:30:06 -04:00
Al Viro	1711fd9add	sunrpc: fix braino in ->poll() POLL_OUT isn't what callers of ->poll() are expecting to see; it's actually __SI_POLL \| 2 and it's a siginfo code, not a poll bitmap bit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Cc: Bruce Fields <bfields@fieldses.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-03-08 12:53:46 -07:00
Linus Torvalds	bbbce516bb	TTY/Serial fixes for 4.0-rc3 Here are some tty and serial driver fixes for 4.0-rc3. Along with the atime fix that you know about, here are some other serial driver bugfixes as well. Most notable is a wait_until_sent bugfix that was traced back to being around since before 2.6.12 that Johan has fixed up. All have been in linux-next successfully. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlT8RCYACgkQMUfUDdst+yk62QCgycxS4giC2hyRver3dyvaNR6g zYYAn2w0uRndW+AqP4Tls54isRz6owpF =gA2k -----END PGP SIGNATURE----- Merge tag 'tty-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some tty and serial driver fixes for 4.0-rc3. Along with the atime fix that you know about, here are some other serial driver bugfixes as well. Most notable is a wait_until_sent bugfix that was traced back to being around since before 2.6.12 that Johan has fixed up. All have been in linux-next successfully" * tag 'tty-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: TTY: fix tty_wait_until_sent maximum timeout TTY: fix tty_wait_until_sent on 64-bit machines USB: serial: fix infinite wait_until_sent timeout TTY: bfin_jtag_comm: remove incorrect wait_until_sent operation net: irda: fix wait_until_sent poll timeout serial: uapi: Declare all userspace-visible io types serial: core: Fix iotype userspace breakage serial: sprd: Fix missing spin_unlock in sprd_handle_irq() console: Fix console name size mismatch tty: fix up atime/mtime mess, take four serial: 8250_dw: Fix get_mctrl behaviour serial:8250:8250_pci: delete unneeded quirk entries serial:8250:8250_pci: fix redundant entry report for WCH_CH352_2S Change email address for 8250_pci serial: 8250: Revert "tty: serial: 8250_core: read only RX if there is something in the FIFO" Revert "tty/serial: of_serial: add DT alias ID handling"	2015-03-08 12:25:40 -07:00
Alexander Aring	0402d9f233	Bluetooth: fix sco_exit compile warning While compiling the following warning occurs: WARNING: net/built-in.o(.init.text+0x602c): Section mismatch in reference from the function bt_init() to the function .exit.text:sco_exit() The function __init bt_init() references a function __exit sco_exit(). This is often seen when error handling in the init function uses functionality in the exit path. The fix is often to remove the __exit annotation of sco_exit() so it may be used outside an exit section. Since commit `6d785aa345` ("Bluetooth: Convert mgmt to use HCI chan registration API") the function "sco_exit" is used inside of function "bt_init". The suggested solution by remove the __exit annotation solved this issue. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-07 22:13:17 +02:00
Eric Dumazet	58025e46ea	net: gro: remove obsolete code from skb_gro_receive() Some drivers use copybreak to copy tiny frames into smaller skb, and this smaller skb might not have skb->head_frag set for various reasons. skb_gro_receive() currently doesn't allow to aggregate the smaller skb into the previous GRO packet if this GRO packet has at least 2 MSS in it. Following workload easily demonstrates the problem. netperf -t TCP_RR -H target -- -r 3000,3000 (tcpdump shows one GRO packet with 2 MSS, plus one additional packet of 104 bytes that should have been appended.) It turns out that we can remove code from skb_gro_receive(), because commit `8a29111c7c` ("net: gro: allow to build full sized skb") and its followups removed the assumption that a GRO packet with a frag_list had to have an empty head. Removing this code allows the aggregation of the last (incomplete) frame in some RPC workloads. Note that tcp_gro_receive() already takes care of forcing a flush if necessary, including this case. If we want to avoid using frag_list in the first place (in forwarding workloads for example, as the outgoing NIC is generally not able to cope with skbs having a frag_list), we need to address this separately. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 21:50:55 -05:00
Shani Michaeli	c93682477b	net/dcb: Add IEEE QCN attribute As specified in 802.1Qau spec. Add this optional attribute to the DCB netlink layer. To allow for application to use the new attribute, NIC drivers should implement and register the callbacks ieee_getqcn, ieee_setqcn and ieee_getqcnstats. The QCN attribute holds a set of parameters for management, and a set of statistics to provide informative data on Congestion-Control defined by this spec. Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 21:50:02 -05:00
Johan Hovold	2c3fbe3cf2	net: irda: fix wait_until_sent poll timeout In case an infinite timeout (0) is requested, the irda wait_until_sent implementation would use a zero poll timeout rather than the default 200ms. Note that wait_until_sent is currently never called with a 0-timeout argument due to a bug in tty_wait_until_sent. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable <stable@vger.kernel.org> # v2.6.12 Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-07 03:44:14 +01:00
Alexander Duyck	88bae7149a	fib_trie: Add key vector to root, return parent key_vector in resize This change makes it so that the root of the trie contains a key_vector, by doing this we make room to essentially collapse the entire trie by at least one cache line as we can store the information about the tnode or leaf that is pointed to in the root. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:28 -05:00
Alexander Duyck	f23e59fbd7	fib_trie: Move parent from key_vector to tnode This change pulls the parent pointer from the key_vector and places it in the tnode structure. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:28 -05:00
Alexander Duyck	6e22d174ba	fib_trie: Pull empty_children and full_children into tnode This pulls the information about the child array out of the key_vector and places it in the tnode since that is where it is needed. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:28 -05:00
Alexander Duyck	56ca2adf6a	fib_trie: Move rcu from key_vector to tnode, add accessors. RCU is only needed once for the entire node, not once per key_vector so we can pull that out and move it to the tnode structure. In addition add accessors to be used inside the RCU functions so that we can more easily get from the key vector to either the tnode or the trie pointers. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:28 -05:00
Alexander Duyck	dc35dbeda3	fib_trie: Add tnode struct as a container for fields not needed in key_vector This change pulls the fields not explicitly needed in the key_vector and placed them in the new tnode structure. By doing this we will eventually be able to reduce the key_vector down to 16 bytes on 64 bit systems, and 12 bytes on 32 bit systems. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:28 -05:00
Alexander Duyck	2e1ac88a48	fib_trie: Rename tnode_child_length to child_length We are now checking the length of a key_vector instead of a tnode so it makes sense to probably just rename this to child_length since it would probably even be applicable to a leaf. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:28 -05:00
Alexander Duyck	754baf8dec	fib_trie: replace tnode_get_child functions with get_child macros I am replacing the tnode_get_child call with get_child since we are techically pulling the child out of a key_vector now and not a tnode. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:27 -05:00
Alexander Duyck	35c6edac19	fib_trie: Rename tnode to key_vector Rename the tnode to key_vector. The key_vector will be the eventual container for all of the information needed by either a leaf or a tnode. The final result should be much smaller than the 40 bytes currently needed for either one. This also updates the trie struct so that it contains an array of size 1 of tnode pointers. This is to bring the structure more inline with how an actual tnode itself is configured. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:27 -05:00
Alexander Duyck	8d8e810ca8	fib_trie: Return pointer to tnode pointer in resize/inflate/halve Resize related functions now all return a pointer to the pointer that references the object that was resized. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:27 -05:00
Alexander Duyck	72be72607a	fib_trie: Minor cleanups to fib_table_flush_external This change just does a couple of minor cleanups on fib_table_flush_external. Specifically it addresses the fact that resize was being called even though nothing was being removed from the table, and it drops an unecessary indent since we could just call continue on the inverse of the fi && flag check. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:27 -05:00
Robert Shearman	f8d54afc4c	mpls: Properly validate RTA_VIA payload length If the nla length is less than 2 then the nla data could be accessed beyond the accessible bounds. So ensure that the nla is big enough to at least read the via_family before doing so. Replace magic value of 2. Fixes: `03c0566542` ("mpls: Basic support for adding and removing routes") Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:19:06 -05:00
Fan Du	05cbc0db03	ipv4: Create probe timer for tcp PMTU as per RFC4821 As per RFC4821 7.3. Selecting Probe Size, a probe timer should be armed once probing has converged. Once this timer expired, probing again to take advantage of any path PMTU change. The recommended probing interval is 10 minutes per RFC1981. Probing interval could be sysctled by sysctl_tcp_probe_interval. Eric Dumazet suggested to implement pseudo timer based on 32bits jiffies tcp_time_stamp instead of using classic timer for such rare event. Signed-off-by: Fan Du <fan.du@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:57:42 -05:00
Fan Du	6b58e0a5f3	ipv4: Use binary search to choose tcp PMTU probe_size Current probe_size is chosen by doubling mss_cache, the probing process will end shortly with a sub-optimal mss size, and the link mtu will not be taken full advantage of, in return, this will make user to tweak tcp_base_mss with care. Use binary search to choose probe_size in a fine granularity manner, an optimal mss will be found to boost performance as its maxmium. In addition, introduce a sysctl_tcp_probe_threshold to control when probing will stop in respect to the width of search range. Test env: Docker instance with vxlan encapuslation(82599EB) iperf -c 10.0.0.24 -t 60 before this patch: 1.26 Gbits/sec After this patch: increase 26% 1.59 Gbits/sec Signed-off-by: Fan Du <fan.du@intel.com> Acked-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:57:41 -05:00
Eric W. Biederman	aaa4e70404	DECnet: Only use neigh_ops for adding the link layer header Other users users of the neighbour table use neigh->output as the method to decided when and which link-layer header to place on a packet. DECnet has been using neigh->output to decide which DECnet headers to place on a packet depending which neighbour the packet is destined for. The DECnet usage isn't totally wrong but it can run into problems if the neighbour output function is run for a second time as the teql driver and the bridge netfilter code can do. Therefore to avoid pathologic problems later down the line and make the neighbour code easier to understand by refactoring the decnet output code to only use a neighbour method to add a link layer header to a packet. This is done by moving the neigbhour operations lookup from dn_to_neigh_output to dn_neigh_output_packet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:54:22 -05:00
Johan Hedberg	7a00ff445f	Bluetooth: Add mgmt_send_event() helper to send to any HCI channel Currently the mgmt_event() function is only capable of sending to HCI_CHANNEL_CONTROL. To void having to change all users of it, add a new mgmt_send_event() function that takes a channel parameter, and make the old mgmt_event() a wrapper that passes MGMT_CHANNEL_CONTROL to it. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-06 20:15:22 +01:00
Johan Hedberg	3b0602cd01	Bluetooth: Rename pending_cmd to mgmt_pending_cmd This patch renames the pending_cmd struct (used for tracking pending mgmt commands) to mgmt_pending_cmd, so that it can be moved to a more generic place and be used also by other modules using other HCI channels. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-06 20:15:21 +01:00
Johan Hedberg	2a1afb5ac8	Bluetooth: Rename cmd_complete() to mgmt_cmd_complete() This patch renames the cmd_complete() function to mgmt_cmd_complete() in preparation of making it a generic helper for other modules to use too. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-06 20:15:21 +01:00
Johan Hedberg	a69e8375a1	Bluetooth: Rename cmd_status() to mgmt_cmd_status() This patch renames the cmd_status() function to mgmt_cmd_status() in preparation of making it a generic helper for other modules to use too. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-06 20:15:21 +01:00
Johan Hedberg	b9a245fb12	Bluetooth: Move all mgmt command quirks to handler table In order to completely generalize the mgmt command handling we need to move away command-specific information from mgmt_control() into the actual command table. This patch adds a new 'flags' field to the handler entries which can now contain the following command specific information: - Command takes variable length parameters - Command doesn't target any specific HCI device - Command can be sent when the HCI device is unconfigured After this the mgmt_control() function is completely generic and can potentially be reused by new HCI channels. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-06 20:15:21 +01:00
Johan Hedberg	6d785aa345	Bluetooth: Convert mgmt to use HCI chan registration API This patch converts the existing mgmt code to use the newly introduced generic API for registering HCI channels with mgmt-like semantics. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-06 20:15:21 +01:00
Johan Hedberg	801c1e8da5	Bluetooth: Add mgmt HCI channel registration API This patch adds an API for registering HCI channels with mgmt-like semantics. For now the only user will be HCI_CHANNEL_CONTROL, but e.g. 6lowpan is intended to use this as well in the future. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-03-06 20:15:21 +01:00
Marcel Holtmann	93690c227a	Bluetooth: Introduce controller setting information for static address Currently it is not possible to determine if the static address is used by the controller. It is also not possible to determine if using a static on a dual-mode controller with disabled BR/EDR is possible or not. To address this issue, introduce a new setting called static-address. If support for this setting is signaled that means that the kernel supports using static addresses. And if used on dual-mode controllers with BR/EDR disabled it means that a configured static address can be used. In addition utilize the same setting for the list of current active settings that indicates if a static address is configured and if that address will be actually used. With this in mind the existing Set Static Address management command has been extended to return the current settings. That way the caller of that command can easily determine if the programmed address will be used or if extra steps are required. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-06 20:43:07 +02:00
Linus Torvalds	1b1bd56191	NFS client bugfixes for Linux 4.0 Highlights include: - Fix a regression in the NFSv4 open state recovery code - Fix a regression in the NFSv4 close code - Fix regressions and side-effects of the loop-back mounted NFS fixes in 3.18, that cause the NFS read() syscall to return EBUSY. - Fix regressions around the readdirplus code and how it interacts with the VFS lazy unmount changes that went into v3.18. - Fix issues with out-of-order RPC call replies replacing updated attributes with stale ones (particularly after a truncate()). - Fix an underflow checking issue with RPC/RDMA credits - Fix a number of issues with the NFSv4 delegation return/free code. - Fix issues around stale NFSv4.1 leases when doing a mount -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU+SJ6AAoJEGcL54qWCgDyVgQQAKNsF/O2O9ip2uAHZRZvM6TS Ev8c3Spj/FmRI1tlCcGi1zZ8uCSwvPQz3uAN0vOTMocUWjokT5sAgN5yBIxHasem 6YK7jxs9WHiP7MGyReaAFJwG/W6LZndnlNqPWPs9KiaWwKVXIsQ3uFm/Y0lr90Fi ew16DQm0DUd4Yvv42WJR9ay7UwUPvT7wmaGIVVK2hjQqr2lx02jspt5kfrC+vsMU OYU/0YDofb2ajs5krbah6tUHf3VDnSVrXP6if66IrukCM9S4AvowpnMJQ5QJALh+ cPlqHDm2ZzuIecpqZEgYLM73wQ2q+KBXTlDLcgYg6LjnqBEivwO9RDn6tfCwKTcS tCFohQc9iDOj9rTZ9EQlQME6u/FdxWncpxyTsMyBk7FlcLsOQRqio/FXZhbyJGuH gvIdIW3fPseijtejpYkxgabe6JL9NRvjv3SnOay7xHs9Vn4tRkFF2mkkQDZOG6HT atkxQp8kB3m9gMoeAmTLdTcJkcFdk6AKnNrcyJaa1GW4msmMuGZtq/6vayKJsBZb OIw788bDSOVpcVR+6SAC24/dutcl+WJHlSJShqvIrTKejBxPCc7IVSeg63FJsWTO sxfXdUr3wVJ1ooDFCspeBj+3zAimIq7qDmRRs85ekgEtxrgdUldhg/VwbDjvLjmb whXFpiCS7Ii0fVYtZvjz =qMB7 -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.0-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - Fix a regression in the NFSv4 open state recovery code - Fix a regression in the NFSv4 close code - Fix regressions and side-effects of the loop-back mounted NFS fixes in 3.18, that cause the NFS read() syscall to return EBUSY. - Fix regressions around the readdirplus code and how it interacts with the VFS lazy unmount changes that went into v3.18. - Fix issues with out-of-order RPC call replies replacing updated attributes with stale ones (particularly after a truncate()). - Fix an underflow checking issue with RPC/RDMA credits - Fix a number of issues with the NFSv4 delegation return/free code. - Fix issues around stale NFSv4.1 leases when doing a mount" * tag 'nfs-for-4.0-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits) NFSv4.1: Clear the old state by our client id before establishing a new lease NFSv4: Fix a race in NFSv4.1 server trunking discovery NFS: Don't write enable new pages while an invalidation is proceeding NFS: Fix a regression in the read() syscall NFSv4: Ensure we skip delegations that are already being returned NFSv4: Pin the superblock while we're returning the delegation NFSv4: Ensure we honour NFS_DELEGATION_RETURNING in nfs_inode_set_delegation() NFSv4: Ensure that we don't reap a delegation that is being returned NFS: Fix stateid used for NFS v4 closes NFSv4: Don't call put_rpccred() under the rcu_read_lock() NFS: Don't require a filehandle to refresh the inode in nfs_prime_dcache() NFSv3: Use the readdir fileid as the mounted-on-fileid NFS: Don't invalidate a submounted dentry in nfs_prime_dcache() NFSv4: Set a barrier in the update_changeattr() helper NFS: Fix nfs_post_op_update_inode() to set an attribute barrier NFS: Remove size hack in nfs_inode_attrs_need_update() NFSv4: Add attribute update barriers to delegreturn and pNFS layoutcommit NFS: Add attribute update barriers to NFS writebacks NFS: Set an attribute barrier on all updates NFS: Add attribute update barriers to nfs_setattr_update_inode() ...	2015-03-06 10:09:57 -08:00
Scott Feldman	e1315db17d	switchdev: fix CONFIG_IP_MULTIPLE_TABLES compile issue Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 12:43:54 -05:00
David S. Miller	23375a0fd5	ipv4: Fix unused variable warnings in fib_table_flush_external. net/ipv4/fib_trie.c: In function ‘fib_table_flush_external’: net/ipv4/fib_trie.c:1572:6: warning: unused variable ‘found’ [-Wunused-variable] int found = 0; ^ net/ipv4/fib_trie.c:1571:16: warning: unused variable ‘slen’ [-Wunused-variable] unsigned char slen; ^ Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:38:35 -05:00
Scott Feldman	8e05fd7166	fib: hook IPv4 fib for hardware offload Call into the switchdev driver any time an IPv4 fib entry is added/modified/deleted from the kernel's FIB. The switchdev driver may or may not install the route to the offload device. In the case where the driver tries to install the route and something goes wrong (device's routing table is full, etc), then all of the offloaded routes will be flushed from the device, route forwarding falls back to the kernel, and no more routes are offloading. We can refine this logic later. For now, use the simplist model of offloading routes up to the point of failure, and then on failure, undo everything and mark IPv4 offloading disabled. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	b5d6fbdeed	switchdev: implement IPv4 fib ndo wrappers Flesh out ndo wrappers to call into device driver. To call into device driver, the wrapper must interate over route's nexthops to ensure all nexthop devs belong to the same switch device. Currently, there is no support for route's nexthops spanning offloaded and non-offloaded devices, or spanning ports of multiple offload devices. Since switch device ports may be stacked under virtual interfaces (bonds and/or bridges), and the route's nexthop may be on the virtual interface, the wrapper will traverse the nexthop dev down to the base dev. It's the base dev that's passed to the switchdev driver's ndo ops. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	104616e74e	switchdev: don't support custom ip rules, for now Keep switchdev FIB offload model simple for now and don't allow custom ip rules. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	5e8d90497d	switchdev: add IPv4 fib ndo ops wrappers Add IPv4 fib ndo wrapper funcs and stub them out for now. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Florian Fainelli	c86e59b9e6	net: dsa: extract dsa switch tree setup and removal Extract the core logic that setups a 'struct dsa_switch_tree' and removes it, update dsa_probe() and dsa_remove() to use the two helper functions. This will be useful to allow for other callers to setup this structure differently. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Florian Fainelli	5929903103	net: dsa: let switches specify their tagging protocol In order to support the new DSA device driver model, a dsa_switch should be able to advertise the type of tagging protocol supported by the underlying switch device. This also removes constraints on how tagging can be stacked to each other. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Florian Fainelli	df197195a5	net: dsa: split dsa_switch_setup into two functions Split the part of dsa_switch_setup() which is responsible for allocating and initializing a 'struct dsa_switch' and the part which is doing a given switch device setup and slave network device creation. This is a preliminary change to allow a separate caller of dsa_switch_setup_one() which may have externally initialized the dsa_switch structure, outside of dsa_switch_setup(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Florian Fainelli	b324c07ac4	net: dsa: allow deferred probing In preparation for allowing a different model to register DSA switches, update dsa_of_probe() and dsa_probe() to return -EPROBE_DEFER where appropriate. Failure to find a phandle or Device Tree property is still fatal, but looking up the internal device structure associated with a Device Tree node is something that might need to be delayed based on driver probe ordering. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Florian Fainelli	f1a26a062f	net: dsa: update dsa_of_{probe, remove} to use a device pointer In preparation for allowing a different mechanism to register DSA switch devices and driver, update dsa_of_probe and dsa_of_remove to take a struct device pointer since neither of these two functions uses the struct platform_device pointer. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Eric Dumazet	496127290f	inet_diag: remove duplicate code from inet_twsk_diag_dump() timewait sockets now share a common base with established sockets. inet_twsk_diag_dump() can use inet_diag_bc_sk() instead of duplicating code, granted that inet_diag_bc_sk() does proper userlocks initialization. twsk_build_assert() will catch any future changes that could break the assumptions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:55:44 -05:00
Eric Dumazet	6c09fa09d4	tcp: align tcp_xmit_size_goal() on tcp_tso_autosize() With some mss values, it is possible tcp_xmit_size_goal() puts one segment more in TSO packet than tcp_tso_autosize(). We send then one TSO packet followed by one single MSS. It is not a serious bug, but we can do slightly better, especially for drivers using netif_set_gso_max_size() to lower gso_max_size. Using same formula avoids these corner cases and makes tcp_xmit_size_goal() a bit faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `605ad7f184` ("tcp: refine TSO autosizing") Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:31:12 -05:00
Erik Hugne	d0f91938be	tipc: add ip/udp media type The ip/udp bearer can be configured in a point-to-point mode by specifying both local and remote ip/hostname, or it can be enabled in multicast mode, where links are established to all tipc nodes that have joined the same multicast group. The multicast IP address is generated based on the TIPC network ID, but can be overridden by using another multicast address as remote ip. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:08:42 -05:00
Erik Hugne	948fa2d115	tipc: increase size of tipc discovery messages The payload area following the TIPC discovery message header is an opaque area defined by the media. INT_H_SIZE was enough for Ethernet/IB/IPv4 but needs to be expanded to carry IPv6 addressing information. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:08:42 -05:00
David S. Miller	9d73b42bbf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Don't truncate ethernet protocol type to u8 in nft_compat, from Arturo Borrero. 2) Fix several problems in the addition/deletion of elements in nf_tables. 3) Fix module refcount leak in ip_vs_sync, from Julian Anastasov. 4) Fix a race condition in the abort path in the nf_tables transaction infrastructure. Basically aborted rules can show up as active rules until changes are unrolled, oneliner from Patrick McHardy. 5) Check for overflows in the data area of the rule, also from Patrick. 6) Fix off-by-one in the per-rule user data size field. This introduces a new nft_userdata structure that is placed at the beginning of the user data area that contains the length to save some bits from the rule and we only need one bit to indicate its presence, from Patrick. 7) Fix rule replacement error path, the replaced rule is deleted on error instead of leaving it in place. This has been fixed by relying on the abort path to undo the incomplete replacement. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 21:51:07 -05:00
Alexander Drozdov	3e32e733d1	ipv4: ip_check_defrag should not assume that skb_network_offset is zero ip_check_defrag() may be used by af_packet to defragment outgoing packets. skb_network_offset() of af_packet's outgoing packets is not zero. Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 21:43:48 -05:00
WANG Cong	33f8b9ecdb	net_sched: move tp->root allocation into fw_init() Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 21:30:44 -05:00
WANG Cong	a05c2d112c	net_sched: move tp->root allocation into route4_init() Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 21:30:44 -05:00
Stephen Rothwell	4b5edb2f4a	mpls: using vzalloc requires including vmalloc.h Fixes this build error: net/mpls/af_mpls.c: In function 'resize_platform_label_table': net/mpls/af_mpls.c:767:4: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration] labels = vzalloc(size); ^ Fixes: `7720c01f3f` ("mpls: Add a sysctl to control the size of the mpls label table") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 21:01:33 -05:00
Pablo Neira Ayuso	1cae565e8b	netfilter: nf_tables: limit maximum table name length to 32 bytes Set the same as we use for chain names, it should be enough. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-06 01:21:21 +01:00
Pablo Neira Ayuso	f04e599e20	netfilter: nf_tables: consolidate Kconfig options Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-06 01:21:15 +01:00
Patrick McHardy	354bf5a0d7	netfilter: nf_tables: consolidate tracing invocations * JUMP and GOTO are equivalent except for JUMP pushing the current context to the stack * RETURN and implicit RETURN (CONTINUE) are equivalent except that the logged rule number differs Result: nft_do_chain \| -112 1 function changed, 112 bytes removed, diff: -112 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-06 01:21:12 +01:00
Patrick McHardy	01ef16c2dd	netfilter: nf_tables: minor tracing cleanups The tracing code is squeezed between multiple related parts of the evaluation code, move it out. Also add an inline wrapper for the reoccuring test for skb->nf_trace. Small code savings in nft_do_chain(): nft_trace_packet \| -137 nft_do_chain \| -8 2 functions changed, 145 bytes removed, diff: -145 net/netfilter/nf_tables_core.c: __nft_trace_packet \| +137 1 function changed, 137 bytes added, diff: +137 net/netfilter/nf_tables_core.o: 3 functions changed, 137 bytes added, 145 bytes removed, diff: -8 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-06 01:21:11 +01:00
Pablo Neira Ayuso	43270b1bc5	netfilter: ipt_CLUSTERIP: deprecate it in favour of xt_cluster xt_cluster supersedes ipt_CLUSTERIP since it can be also used in gateway configurations (not only from the backend side). ipt_CLUSTER is also known to leak the netdev that it uses on device removal, which requires a rather large fix to workaround the problem: http://patchwork.ozlabs.org/patch/358629/ So let's deprecate this so we can probably kill code this in the future. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-06 01:21:05 +01:00
Jouni Malinen	842a9ae08a	bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi This extends the design in commit `958501163d` ("bridge: Add support for IEEE 802.11 Proxy ARP") with optional set of rules that are needed to meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The previously added BR_PROXYARP behavior is left as-is and a new BR_PROXYARP_WIFI alternative is added so that this behavior can be configured from user space when required. In addition, this enables proxyarp functionality for unicast ARP requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible to use unicast as well as broadcast for these frames. The key differences in functionality: BR_PROXYARP: - uses the flag on the bridge port on which the request frame was received to determine whether to reply - block bridge port flooding completely on ports that enable proxy ARP BR_PROXYARP_WIFI: - uses the flag on the bridge port to which the target device of the request belongs - block bridge port flooding selectively based on whether the proxyarp functionality replied Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 14:52:23 -05:00
kbuild test robot	787fb2bd42	ax25: Fix the build when CONFIG_INET is disabled > > >> net/ax25/ax25_ip.c:225:26: error: unknown type name 'sturct' > netdev_tx_t ax25_ip_xmit(sturct sk_buff skb) > ^ > > vim +/sturct +225 net/ax25/ax25_ip.c > > 219 unsigned short type, const void daddr, > 220 const void saddr, unsigned int len) > 221 { > 222 return -AX25_HEADER_LEN; > 223 } > 224 > > 225 netdev_tx_t ax25_ip_xmit(sturct sk_buff skb) > 226 { > 227 kfree_skb(skb); > 228 return NETDEV_TX_OK; Ooops I misspelled struct... Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 13:17:39 -05:00
Jakub Pawlowski	82f8b651a9	Bluetooth: fix service discovery behaviour for empty uuids filter This patch fixes service discovery behaviour, when provided uuid filter is empty and HCI_QUIRK_STRICT_DUPLICATE_FILTER is set. Before this patch, empty uuid filter was unable to trigger scan restart, and that caused inconsistent behaviour in applications. Example: two DBus clients call BlueZ, one to find all devices with service abcd, second to find all devices with rssi smaller than -90. Sum of those filters, that is passed to mgmt_service_scan is empty filter, with no rssi or uuids set. That caused kernel not to restart scan when quirk was set. That was inconsistent with what happen when there's only one of those two filters set (scan is restarted and reports devices). To fix that, new variable hdev->discovery.result_filtering was introduced. It can indicate that filtered scan is running, no matter what uuid or rssi filter is set. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-05 09:50:50 +02:00
Jakub Pawlowski	2976cdeb27	Bluetooth: Refactor service discovery filter logic This patch refactor code responsible for filtering when service discovery method is used. Previously this code was mixed with mgmt_device found logic. Now when it's in one place whole logic can be greatly simplified. That includes removing no longer necessary length field and merging checks for eir and scan_rsp. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-05 09:50:50 +02:00
Jakub Pawlowski	48f86b7f26	Bluetooth: Move Service Discovery logic before refactoring This patch moves whole packet filering logic of service discovery into new function is_filter_match. It's done because logic inside mgmt_device_found is very complicated and needs some simplification. Also having whole logic in one place will allow to simplify it in the future. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-03-05 09:50:50 +02:00
Alexander Duyck	1de3d87bcd	fib_trie: Prevent allocating tnode if bits is too big for size_t This patch adds code to prevent us from attempting to allocate a tnode with a size larger than what can be represented by size_t. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	71e8b67d0f	fib_trie: Update last spot w/ idx >> n->bits code and explanation This change updates the fib_table_lookup function so that it is in sync with the fib_find_node function in terms of the explanation for the index check based on the bits value. I have also updated it from doing a mask to just doing a compare as I have found that seems to provide more options to the compiler as I have seen it turn this into a shift of the value and test under some circumstances. In addition I addressed one minor issue in which we kept computing the key ^ n->key when checking the fib aliases. I pulled the xor out of the loop in order to reduce the number of memory reads in the lookup. As a result we should save a couple cycles since the xor is only done once much earlier in the lookup. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	a7e5353123	fib_trie: Make fib_table rcu safe The fib_table was wrapped in several places with an rcu_read_lock/rcu_read_unlock however after looking over the code I found several spots where the tables were being accessed as just standard pointers without any protections. This change fixes that so that all of the proper protections are in place when accessing the table to take RCU replacement or removal of the table into account. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	41b489fd6c	fib_trie: move leaf and tnode to occupy the same spot in the key vector If we are going to compact the leaf and tnode we first need to make sure the fields are all in the same place. In that regard I am moving the leaf pointer which represents the fib_alias hash list to occupy what is currently the first key_vector pointer. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	d5d6487cb8	fib_trie: Update insert and delete to make use of tp from find_node This change makes it so that the insert and delete functions make use of the tnode pointer returned in the fib_find_node call. By doing this we will not have to rely on the parent pointer in the leaf which will be going away soon. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	d4a975e83f	fib_trie: Fib find node should return parent This change makes it so that the parent pointer is returned by reference in fib_find_node. By doing this I can use it to find the parent node when I am performing an insertion and I don't have to look for it again in fib_insert_node. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:17 -05:00
Alexander Duyck	8be33e955c	fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf This change makes it so that leaf_walk_rcu takes a tnode and a key instead of the trie and a leaf. The main idea behind this is to avoid using the leaf parent pointer as that can have additional overhead in the future as I am trying to reduce the size of a leaf down to 16 bytes on 64b systems and 12b on 32b systems. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:17 -05:00
Alexander Duyck	7289e6ddb6	fib_trie: Only resize tnodes once instead of on each leaf removal in fib_table_flush This change makes it so that we only call resize on the tnodes, instead of from each of the leaves. By doing this we can significantly reduce the amount of time spent resizing as we can update all of the leaves in the tnode first before we make any determinations about resizing. As a result we can simply free the tnode in the case that all of the leaves from a given tnode are flushed instead of resizing with each leaf removed. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:17 -05:00
Wu Fengguang	f0126539c7	mpls: rtm_mpls_policy[] can be static Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 16:39:45 -05:00
Lorenzo Colitti	9145736d48	net: ping: Return EAFNOSUPPORT when appropriate. 1. For an IPv4 ping socket, ping_check_bind_addr does not check the family of the socket address that's passed in. Instead, make it behave like inet_bind, which enforces either that the address family is AF_INET, or that the family is AF_UNSPEC and the address is 0.0.0.0. 2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL if the socket family is not AF_INET6. Return EAFNOSUPPORT instead, for consistency with inet6_bind. 3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT instead of EINVAL if an incorrect socket address structure is passed in. 4. Make IPv6 ping sockets be IPv6-only. The code does not support IPv4, and it cannot easily be made to support IPv4 because the protocol numbers for ICMP and ICMPv6 are different. This makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead of making the socket unusable. Among other things, this fixes an oops that can be triggered by: int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP); struct sockaddr_in6 sin6 = { .sin6_family = AF_INET6, .sin6_addr = in6addr_any, }; bind(s, (struct sockaddr *) &sin6, sizeof(sin6)); Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:46:51 -05:00
Pablo Neira Ayuso	59900e0a01	netfilter: nf_tables: fix error handling of rule replacement In general, if a transaction object is added to the list successfully, we can rely on the abort path to undo what we've done. This allows us to simplify the error handling of the rule replacement path in nf_tables_newrule(). This implicitly fixes an unnecessary removal of the old rule, which needs to be left in place if we fail to replace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-04 18:46:08 +01:00
Patrick McHardy	86f1ec3231	netfilter: nf_tables: fix userdata length overflow The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8 to store its size. Introduce a struct nft_userdata which contains a length field and indicate its presence using a single bit in the rule. The length field of struct nft_userdata is also a u8, however we don't store zero sized data, so the actual length is udata->len + 1. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-04 18:46:06 +01:00
Patrick McHardy	9889840f59	netfilter: nf_tables: check for overflow of rule dlen field Check that the space required for the expressions doesn't exceed the size of the dlen field, which would lead to the iterators crashing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-04 18:46:05 +01:00
Patrick McHardy	8670c3a55e	netfilter: nf_tables: fix transaction race condition A race condition exists in the rule transaction code for rules that get added and removed within the same transaction. The new rule starts out as inactive in the current and active in the next generation and is inserted into the ruleset. When it is deleted, it is additionally set to inactive in the next generation as well. On commit the next generation is begun, then the actions are finalized. For the new rule this would mean clearing out the inactive bit for the previously current, now next generation. However nft_rule_clear() clears out the bits for both generations, activating the rule in the current generation, where it should be deactivated due to being deleted. The rule will thus be active until the deletion is finalized, removing the rule from the ruleset. Similarly, when aborting a transaction for the same case, the undo of insertion will remove it from the RCU protected rule list, the deletion will clear out all bits. However until the next RCU synchronization after all operations have been undone, the rule is active on CPUs which can still see the rule on the list. Generally, there may never be any modifications of the current generations' inactive bit since this defeats the entire purpose of atomicity. Change nft_rule_clear() to only touch the next generations bit to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-04 18:46:04 +01:00
Eric W. Biederman	8de147dc8e	mpls: Multicast route table change notifications Unlike IPv4 this code notifies on all cases where mpls routes are added or removed and it never automatically removes routes. Avoiding both the userspace confusion that is caused by omitting route updates and the possibility of a flood of netlink traffic when an interface goes doew. For now reserved labels are handled automatically and userspace is not notified. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	03c0566542	mpls: Netlink commands to add, remove, and dump routes This change adds two new netlink routing attributes: RTA_VIA and RTA_NEWDST. RTA_VIA specifies the specifies the next machine to send a packet to like RTA_GATEWAY. RTA_VIA differs from RTA_GATEWAY in that it includes the address family of the address of the next machine to send a packet to. Currently the MPLS code supports addresses in AF_INET, AF_INET6 and AF_PACKET. For AF_INET and AF_INET6 the destination mac address is acquired from the neighbour table. For AF_PACKET the destination mac_address is specified in the netlink configuration. I think raw destination mac address support with the family AF_PACKET will prove useful. There is MPLS-TP which is defined to operate on machines that do not support internet packets of any flavor. Further seem to be corner cases where it can be useful. At this point I don't care much either way. RTA_NEWDST specifies the destination address to forward the packet with. MPLS typically changes it's destination address at every hop. For a swap operation RTA_NEWDST is specified with a length of one label. For a push operation RTA_NEWDST is specified with two or more labels. For a pop operation RTA_NEWDST is not specified or equivalently an emtpy RTAN_NEWDST is specified. Those new netlink attributes are used to implement handling of rt-netlink RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the MPLS label table. rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message, verify no unhandled attributes or unhandled values are present and sets up the data structures for mpls_route_add and mpls_route_del. I did my best to match up with the existing conventions with the caveats that MPLS addresses are all destination-specific-addresses, and so don't properly have a scope. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	966bae3349	mpls: Functions for reading and wrinting mpls labels over netlink Reading and writing addresses in network byte order in netlink is traditional and I see no reason to change that. MPLS is interesting as effectively it has variabely length addresses (the MPLS label stack). To represent these variable length addresses in netlink I use a valid MPLS label stack (complete with stop bit). This achieves two things: a well defined existing format is used, and the data can be interpreted without looking at it's length. Not needed to look at the length to decode the variable length network representation allows existing userspace functions such as inet_ntop to be used without needed to change their prototype. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	a2519929ab	mpls: Basic support for adding and removing routes mpls_route_add and mpls_route_del implement the basic logic for adding and removing Next Hop Label Forwarding Entries from the MPLS input label map. The addition and subtraction is done in a way that is consistent with how the existing routing table in Linux are maintained. Thus all of the work to deal with NLM_F_APPEND, NLM_F_EXCL, NLM_F_REPLACE, and NLM_F_CREATE. Cases that are not clearly defined such as changing the interpretation of the mpls reserved labels is not allowed. Because it seems like the right thing to do adding an MPLS route without specifying an input label and allowing the kernel to pick a free label table entry is supported. The implementation is currently less than optimal but that can be changed. As I don't have anything else to test with only ethernet and the loopback device are the only two device types currently supported for forwarding MPLS over. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	7720c01f3f	mpls: Add a sysctl to control the size of the mpls label table This sysctl gives two benefits. By defaulting the table size to 0 mpls even when compiled in and enabled defaults to not forwarding any packets. This prevents unpleasant surprises for users. The other benefit is that as mpls labels are allocated locally a dense table a small dense label table may be used which saves memory and is extremely simple and efficient to implement. This sysctl allows userspace to choose the restrictions on the label table size userspace applications need to cope with. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	0189197f44	mpls: Basic routing support This change adds a new Kconfig option MPLS_ROUTING. The core of this change is the code to look at an mpls packet received from another machine. Look that packet up in a routing table and forward the packet on. Support of MPLS over ATM is not considered or attempted here. This implemntation follows RFC3032 and implements the MPLS shim header that can pass over essentially any network. What RFC3021 refers to as the as the Incoming Label Map (ILM) I call net->mpls.platform_label[]. What RFC3031 refers to as the Next Label Hop Forwarding Entry (NHLFE) I call mpls_route. Though calling it the label fordwarding information base (lfib) might also be valid. Further the implemntation forwards packets as described in RFC3032. There is no need and given the original motivation for MPLS a strong discincentive to have a flexible label forwarding path. In essence the logic is the topmost label is read, looked up, removed, and replaced by 0 or more new lables and the sent out the specified interface to it's next hop. Quite a few optional features are not implemented here. Among them are generation of ICMP errors when the TTL is exceeded or the packet is larger than the next hop MTU (those conditions are detected and the packets are dropped instead of generating an icmp error). The traffic class field is always set to 0. The implementation focuses on IP over MPLS and does not handle egress of other kinds of protocols. Instead of implementing coordination with the neighbour table and sorting out how to input next hops in a different address family (for which there is value). I was lazy and implemented a next hop mac address instead. The code is simpler and there are flavor of MPLS such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is appropriate so a next hop by mac address would need to be implemented at some point. Two new definitions AF_MPLS and PF_MPLS are exposed to userspace. Decoding the mpls header must be done by first byeswapping a 32bit bit endian word into the local cpu endian and then bit shifting to extract the pieces. There is no C bit-field that can represent a wire format mpls header on a little endian machine as the low bits of the 20bit label wind up in the wrong half of third byte. Therefore internally everything is deal with in cpu native byte order except when writing to and reading from a packet. For management simplicity if a label is configured to forward out an interface that is down the packet is dropped early. Similarly if an network interface is removed rt_dev is updated to NULL (so no reference is preserved) and any packets for that label are dropped. Keeping the label entries in the kernel allows the kernel label table to function as the definitive source of which labels are allocated and which are not. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	cec9166ca4	mpls: Refactor how the mpls module is built This refactoring is needed to allow more than just mpls gso support to be built into the mpls moddule. Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	4fd3d7d9e8	neigh: Add helper function neigh_xmit For MPLS I am building the code so that either the neighbour mac address can be specified or we can have a next hop in ipv4 or ipv6. The kind of next hop we have is indicated by the neighbour table pointer. A neighbour table pointer of NULL is a link layer address. A non-NULL neighbour table pointer indicates which neighbour table and thus which address family the next hop address is in that we need to look up. The code either sends a packet directly or looks up the appropriate neighbour table entry and sends the packet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:23:23 -05:00
Eric W. Biederman	60395a20ff	neigh: Factor out ___neigh_lookup_noref While looking at the mpls code I found myself writing yet another version of neigh_lookup_noref. We currently have __ipv4_lookup_noref and __ipv6_lookup_noref. So to make my work a little easier and to make it a smidge easier to verify/maintain the mpls code in the future I stopped and wrote ___neigh_lookup_noref. Then I rewote __ipv4_lookup_noref and __ipv6_lookup_noref in terms of this new function. I tested my new version by verifying that the same code is generated in ip_finish_output2 and ip6_finish_output2 where these functions are inlined. To get to ___neigh_lookup_noref I added a new neighbour cache table function key_eq. So that the static size of the key would be available. I also added __neigh_lookup_noref for people who want to to lookup a neighbour table entry quickly but don't know which neibhgour table they are going to look up. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:23:23 -05:00
Johannes Berg	2f56f6be47	bridge: fix bridge netlink RCU usage When the STP timer fires, it can call br_ifinfo_notify(), which in turn ends up in the new br_get_link_af_size(). This function is annotated to be using RTNL locking, which clearly isn't the case here, and thus lockdep warns: =============================== [ INFO: suspicious RCU usage. ] 3.19.0+ #569 Not tainted ------------------------------- net/bridge/br_private.h:204 suspicious rcu_dereference_protected() usage! Fix this by doing RCU locking here. Fixes: `b7853d73e3` ("bridge: add vlan info to bridge setlink and dellink notification messages") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:20:22 -05:00
David S. Miller	71a83a6db6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/rocker/rocker.c The rocker commit was two overlapping changes, one to rename the ->vport member to ->pport, and another making the bitmask expression use '1ULL' instead of plain '1'. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 21:16:48 -05:00
Linus Torvalds	a6c5170d1e	Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux Pull nfsd fixes from Bruce Fields: "Three miscellaneous bugfixes, most importantly the clp->cl_revoked bug, which we've seen several reports of people hitting" * 'for-4.0' of git://linux-nfs.org/~bfields/linux: sunrpc: integer underflow in rsc_parse() nfsd: fix clp->cl_revoked list deletion causing softlock in nfsd svcrpc: fix memory leak in gssp_accept_sec_context_upcall	2015-03-03 15:52:50 -08:00
Linus Torvalds	789d7f60cd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) If an IPVS tunnel is created with a mixed-family destination address, it cannot be removed. Fix from Alexey Andriyanov. 2) Fix module refcount underflow in netfilter's nft_compat, from Pablo Neira Ayuso. 3) Generic statistics infrastructure can reference variables sitting on a released function stack, therefore use dynamic allocation always. Fix from Ignacy Gawędzki. 4) skb_copy_bits() return value test is inverted in ip_check_defrag(). 5) Fix network namespace exit in openvswitch, we have to release all of the per-net vports. From Pravin B Shelar. 6) Fix signedness bug in CAIF's cfpkt_iterate(), from Dan Carpenter. 7) Fix rhashtable grow/shrink behavior, only expand during inserts and shrink during deletes. From Daniel Borkmann. 8) Netdevice names with semicolons should never be allowed, because they serve as a separator. From Matthew Thode. 9) Use {,__}set_current_state() where appropriate, from Fabian Frederick. 10) Revert byte queue limits support in r8169 driver, it's causing regressions we can't figure out. 11) tcp_should_expand_sndbuf() erroneously uses tp->packets_out to measure packets in flight, properly use tcp_packets_in_flight() instead. From Neal Cardwell. 12) Fix accidental removal of support for bluetooth in CSR based Intel wireless cards. From Marcel Holtmann. 13) We accidently added a behavioral change between native and compat tasks, wrt testing the MSG_CMSG_COMPAT bit. Just ignore it if the user happened to set it in a native binary as that was always the behavior we had. From Catalin Marinas. 14) Check genlmsg_unicast() return valud in hwsim netlink tx frame handling, from Bob Copeland. 15) Fix stale ->radar_required setting in mac80211 that can prevent starting new scans, from Eliad Peller. 16) Fix memory leak in nl80211 monitor, from Johannes Berg. 17) Fix race in TX index handling in xen-netback, from David Vrabel. 18) Don't enable interrupts in amx-xgbe driver until all software et al. state is ready for the interrupt handler to run. From Thomas Lendacky. 19) Add missing netlink_ns_capable() checks to rtnl_newlink(), from Eric W Biederman. 20) The amount of header space needed in macvtap was not calculated properly, fix it otherwise we splat past the beginning of the packet. From Eric Dumazet. 21) Fix bcmgenet TCP TX perf regression, from Jaedon Shin. 22) Don't raw initialize or mod timers, use setup_timer() and mod_timer() instead. From Vaishali Thakkar. 23) Fix software maintained statistics in bcmgenet and systemport drivers, from Florian Fainelli. 24) DMA descriptor updates in sh_eth need proper memory barriers, from Ben Hutchings. 25) Don't do UDP Fragmentation Offload on RAW sockets, from Michal Kubecek. 26) Openvswitch's non-masked set actions aren't constructed properly into netlink messages, fix from Joe Stringer. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) openvswitch: Fix serialization of non-masked set actions. gianfar: Reduce logging noise seen due to phy polling if link is down ibmveth: Add function to enable live MAC address changes net: bridge: add compile-time assert for cb struct size udp: only allow UFO for packets from SOCK_DGRAM sockets sh_eth: Really fix padding of short frames on TX Revert "sh_eth: Enable Rx descriptor word 0 shift for r8a7790" sh_eth: Fix RX recovery on R-Car in case of RX ring underrun sh_eth: Ensure proper ordering of descriptor active bit write/read net/mlx4_en: Disbale GRO for incoming loopback/selftest packets net/mlx4_core: Fix wrong mask and error flow for the update-qp command net: systemport: fix software maintained statistics net: bcmgenet: fix software maintained statistics rxrpc: don't multiply with HZ twice rxrpc: terminate retrans loop when sending of skb fails net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface. net: pasemi: Use setup_timer and mod_timer net: stmmac: Use setup_timer and mod_timer net: 8390: axnet_cs: Use setup_timer and mod_timer net: 8390: pcnet_cs: Use setup_timer and mod_timer ...	2015-03-03 15:30:07 -08:00
Joe Perches	1cea7e2c9f	l2tp: Use eth_<foo>_addr instead of memset Use the built-in function instead of memset. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 17:01:38 -05:00
Joe Perches	d2beae1078	wireless: Use eth_<foo>_addr instead of memset Use the built-in function instead of memset. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 17:01:38 -05:00
Joe Perches	c84a67a2fc	mac80211: Use eth_<foo>_addr instead of memset Use the built-in function instead of memset. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 17:01:38 -05:00
Joe Perches	afc130dd39	ethernet: Use eth_<foo>_addr instead of memset Use the built-in function instead of memset. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 17:01:38 -05:00
Joe Perches	211b85349c	bluetooth: Use eth_<foo>_addr instead of memset Use the built-in function instead of memset. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 17:01:37 -05:00
Joe Perches	19ffa562ec	atm: Use eth_<foo>_addr instead of memset Use the built-in function instead of memset. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 17:01:37 -05:00
Joe Perches	1a73de0719	appletalk: Use eth_<foo>_addr instead of memset Use the built-in function instead of memset. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 17:01:37 -05:00
Joe Perches	423049ab1e	8021q: Use eth_<foo>_addr instead of memset Use the built-in function instead of memset. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 17:01:37 -05:00
Eric W. Biederman	1d5da757da	ax25: Stop using magic neighbour cache operations. Before the ax25 stack calls dev_queue_xmit it always calls ax25_type_trans which sets skb->protocol to ETH_P_AX25. Which means that by looking at the protocol type it is possible to detect IP packets that have not been munged by the ax25 stack in ndo_start_xmit and call a function to munge them. Rename ax25_neigh_xmit to ax25_ip_xmit and tweak the return type and value to be appropriate for an ndo_start_xmit function. Update all of the ax25 devices to test the protocol type for ETH_P_IP and return ax25_ip_xmit as the first thing they do. This preserves the existing semantics of IP packet processing, but the timing will be a little different as the IP packets now pass through the qdisc layer before reaching the ax25 ip packet processing. Remove the now unnecessary ax25 neighbour table operations. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 14:44:41 -05:00
Joe Stringer	f4f8e73850	openvswitch: Fix serialization of non-masked set actions. Set actions consist of a regular OVS_KEY_ATTR_* attribute nested inside of a OVS_ACTION_ATTR_SET action attribute. When converting masked actions back to regular set actions, the inner attribute length was not changed, ie, double the length being serialized. This patch fixes the bug. Fixes: `83d2b9b` ("net: openvswitch: Support masked set actions.") Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 14:38:57 -05:00
Fabian Frederick	bcc90e3fb1	net/atm/signaling.c: remove WAIT_FOR_DEMON code WAIT_FOR_DEMON code is directly undefined at the beginning of signaling.c since initial git version and thus never compiled. This also removes buggy current->state direct access. Suggested-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 14:22:11 -05:00
Florian Westphal	71e168b151	net: bridge: add compile-time assert for cb struct size make build fail if structure no longer fits into ->cb storage. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 14:07:04 -05:00
Michal Kazior	aa75ebc275	mac80211: disable u-APSD queues by default Some APs experience problems when working with U-APSD. Decreasing the probability of that happening by using legacy mode for all ACs but VO isn't enough. Cisco 4410N originally forced us to enable VO by default only because it treated non-VO ACs as legacy. However some APs (notably Netgear R7000) silently reclassify packets to different ACs. Since u-APSD ACs require trigger frames for frame retrieval clients would never see some frames (e.g. ARP responses) or would fetch them accidentally after a long time. It makes little sense to enable u-APSD queues by default because it needs userspace applications to be aware of it to actually take advantage of the possible additional powersavings. Implicitly depending on driver autotrigger frame support doesn't make much sense. Cc: stable@vger.kernel.org Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-03-03 10:14:47 +01:00
Bob Copeland	d0c22119f5	mac80211: drop unencrypted frames in mesh fwding The mesh forwarding path was not checking that data frames were protected when running an encrypted network; add the necessary check. Cc: stable@vger.kernel.org Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-03-03 09:27:28 +01:00
Michal Kubeček	acf8dd0a9d	udp: only allow UFO for packets from SOCK_DGRAM sockets If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer checksum is to be computed on segmentation. However, in this case, skb->csum_start and skb->csum_offset are never set as raw socket transmit path bypasses udp_send_skb() where they are usually set. As a result, driver may access invalid memory when trying to calculate the checksum and store the result (as observed in virtio_net driver). Moreover, the very idea of modifying the userspace provided UDP header is IMHO against raw socket semantics (I wasn't able to find a document clearly stating this or the opposite, though). And while allowing CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit too intrusive change just to handle a corner case like this. Therefore disallowing UFO for packets from SOCK_DGRAM seems to be the best option. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 22:19:29 -05:00
Florian Westphal	72500bc11e	netfilter: bridge: rework reject handling bridge reject handling is not straightforward, there are many subtle differences depending on configuration. skb->dev is either the bridge port (PRE_ROUTING) or the bridge itself (INPUT), so we need to use indev instead. Also, checksum validation will only work reliably if we trim skb according to the l3 header size. While at it, add csum validation for ipv6 and skip existing tests if skb was already checked e.g. by GRO. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-03 02:10:51 +01:00
Florian Westphal	ee586bbc28	netfilter: reject: don't send icmp error if csum is invalid tcp resets are never emitted if the packet that triggers the reject/reset has an invalid checksum. For icmp error responses there was no such check. It allows to distinguish icmp response generated via iptables -I INPUT -p udp --dport 42 -j REJECT and those emitted by network stack (won't respond if csum is invalid, REJECT does). Arguably its possible to avoid this by using conntrack and only using REJECT with -m conntrack NEW/RELATED. However, this doesn't work when connection tracking is not in use or when using nf_conntrack_checksum=0. Furthermore, sending errors in response to invalid csums doesn't make much sense so just add similar test as in nf_send_reset. Validate csum if needed and only send the response if it is ok. Reference: http://bugzilla.redhat.com/show_bug.cgi?id=1169829 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-03 02:10:35 +01:00
Eric W. Biederman	435e8eb27e	neigh: Don't require a dst in neigh_resolve_output Having a dst helps a little bit for teql but is fundamentally unnecessary and there are code paths where a dst is not available that it would be nice to use the neighbour cache. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:41 -05:00
Eric W. Biederman	bdf53c5849	neigh: Don't require dst in neigh_hh_init - Add protocol to neigh_tbl so that dst->ops->protocol is not needed - Acquire the device from neigh->dev This results in a neigh_hh_init that will cache the samve values regardless of the packets flowing through it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:41 -05:00
Eric W. Biederman	59b2af26b9	arp: Kill arp_find There are no more callers so kill this function. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:41 -05:00
Eric W. Biederman	d476059e77	net: Kill dev_rebuild_header Now that there are no more users kill dev_rebuild_header and all of it's implementations. This is long overdue. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:41 -05:00
Eric W. Biederman	945db424bf	ax25: Stop depending on arp_find Have ax25_neigh_output perform ordinary arp resolution before calling ax25_neigh_xmit. Call dev_hard_header in ax25_neigh_output with a destination address so it will not fail, and the destination mac address will not need to be set in ax25_neigh_xmit. Remove arp_find from ax25_neigh_xmit (the ordinary arp resolution added to ax25_neigh_output removes the need for calling arp_find). Document how close ax25_neigh_output is to neigh_resolve_output. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:41 -05:00
Eric W. Biederman	abb7b755d9	ax25: Stop calling/abusing dev_rebuild_header - Rename ax25_rebuild_header to ax25_neigh_xmit and call it from ax25_neigh_output directly. The rename is to make it clear that this is not a rebuild_header operation. - Remove ax25_rebuild_header from ax25_header_ops. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:40 -05:00
Eric W. Biederman	def6775369	neigh: Move neigh_compat_output into ax25_ip.c The only caller is now is ax25_neigh_construct so move neigh_compat_output into ax25_ip.c make it static and rename it ax25_neigh_output. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:40 -05:00
Eric W. Biederman	21bfb8e933	arp: Remove special case to give AX25 it's open arp operations. The special case has been pushed out into ax25_neigh_construct so there is no need to keep this code in arp.c Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:40 -05:00
Eric W. Biederman	3b6a94bed0	ax25: Refactor to use private neighbour operations. AX25 already has it's own private arp cache operations to isolate it's abuse of dev_rebuild_header to transmit packets. Add a function ax25_neigh_construct that will allow all of the ax25 devices to force using these operations, so that the generic arp code does not need to. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:40 -05:00
Eric W. Biederman	46d4e47abe	ax25: Make ax25_header and ax25_rebuild_header static The only user is in ax25_ip.c so stop exporting these functions. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:40 -05:00
Eric W. Biederman	03ec2ac097	rose: Transmit packets in rose_xmit not rose_rebuild_header Patterned after the similar code in net/rom this turns out to be a trivial obviously correct transmformation. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:39 -05:00
Eric W. Biederman	b753af31ab	rose: Set the destination address in rose_header Not setting the destination address is a bug that I suspect causes no problems today, as only the arp code seems to call dev_hard_header and the description I have of rose is that it is expected to be used with a static neigbour table. I have derived the offset and the length of the rose destination address from rose_rebuild_header where arp_find calls neigh_ha_snapshot to set the destination address. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:39 -05:00
Eric W. Biederman	e18dbd0593	ax25: In ax25_rebuild_header add missing kfree_skb In the unlikely (impossible?) event that we attempt to transmit an ax25 packet over a non-ax25 device free the skb so we don't leak it. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 16:43:39 -05:00
David S. Miller	77f0379fa8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next A small batch with accumulated updates in nf-next, mostly IPVS updates, they are: 1) Add 64-bits stats counters to IPVS, from Julian Anastasov. 2) Move NETFILTER_XT_MATCH_ADDRTYPE out of NETFILTER_ADVANCED as docker seem to require this, from Anton Blanchard. 3) Use boolean instead of numeric value in set_match_v*(), from coccinelle via Fengguang Wu. 4) Allows rescheduling of new connections in IPVS when port reuse is detected, from Marcelo Ricardo Leitner. 5) Add missing bits to support arptables extensions from nft_compat, from Arturo Borrero. Patrick is preparing a large batch to enhance the set infrastructure, named expressions among other things, that should follow up soon after this batch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 14:55:05 -05:00
Daniel Borkmann	49b31e576a	filter: refactor common filter attach code into __sk_attach_prog Both sk_attach_filter() and sk_attach_bpf() are setting up sk_filter, charging skmem and attaching it to the socket after we got the eBPF prog up and ready. Lets refactor that into a common helper. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 14:51:38 -05:00
David S. Miller	70c836a4d1	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-03-02 Here's the first bluetooth-next pull request targeting the 4.1 kernel: - ieee802154/6lowpan cleanups - SCO routing to host interface support for the btmrvl driver - AMP code cleanups - Fixes to AMP HCI init sequence - Refactoring of the HCI callback mechanism - Added shutdown routine for Intel controllers in the btusb driver - New config option to enable/disable Bluetooth debugfs information - Fix for early data reception on L2CAP fixed channels Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 14:47:12 -05:00
Ying Xue	1b78414047	net: Remove iocb argument from sendmsg and recvmsg After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 13:06:31 -05:00
Ying Xue	39a0295f90	tipc: Don't use iocb argument in socket layer Currently the iocb argument is used to idenfiy whether or not socket lock is hold before tipc_sendmsg()/tipc_send_stream() is called. But this usage prevents iocb argument from being dropped through sendmsg() at socket common layer. Therefore, in the commit we introduce two new functions called __tipc_sendmsg() and __tipc_send_stream(). When they are invoked, it assumes that their callers have taken socket lock, thereby avoiding the weird usage of iocb argument. Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 13:06:31 -05:00
Arturo Borrero	5f15893943	netfilter: nft_compat: add support for arptables extensions This patch adds support to arptables extensions from nft_compat. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-03-02 12:28:13 +01:00
Eyal Birger	744d5a3e9f	net: move skb->dropcount to skb->cb[] Commit `977750076d` ("af_packet: add interframe drop cmsg (v6)") unionized skb->mark and skb->dropcount in order to allow recording of the socket drop count while maintaining struct sk_buff size. skb->dropcount was introduced since there was no available room in skb->cb[] in packet sockets. However, its introduction led to the inability to export skb->mark, or any other aliased field to userspace if so desired. Moving the dropcount metric to skb->cb[] eliminates this problem at the expense of 4 bytes less in skb->cb[] for protocol families using it. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 00:19:30 -05:00
Eyal Birger	3bc3b96f3b	net: add common accessor for setting dropcount on packets As part of an effort to move skb->dropcount to skb->cb[], use a common function in order to set dropcount in struct sk_buff. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 00:19:30 -05:00
Eyal Birger	b4772ef879	net: use common macro for assering skb->cb[] available size in protocol families As part of an effort to move skb->dropcount to skb->cb[] use a common macro in protocol families using skb->cb[] for ancillary data to validate available room in skb->cb[]. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 00:19:30 -05:00
Eyal Birger	2472d7613b	net: packet: use sockaddr_ll fields as storage for skb original length in recvmsg path As part of an effort to move skb->dropcount to skb->cb[], 4 bytes of additional room are needed in skb->cb[] in packet sockets. Store the skb original length in the first two fields of sockaddr_ll (sll_family and sll_protocol) as they can be derived from the skb when needed. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 00:19:29 -05:00
Eyal Birger	2cfdf9fcb8	net: rxrpc: change call to sock_recv_ts_and_drops() on rxrpc recvmsg to sock_recv_timestamp() Commit `3b885787ea` ("net: Generalize socket rx gap / receive queue overflow cmsg") allowed receiving packet dropcount information as a socket level option. RXRPC sockets recvmsg function was changed to support this by calling sock_recv_ts_and_drops() instead of sock_recv_timestamp(). However, protocol families wishing to receive dropcount should call sock_queue_rcv_skb() or set the dropcount specifically (as done in packet_rcv()). This was not done for rxrpc and thus this feature never worked on these sockets. Formalizing this by not calling sock_recv_ts_and_drops() in rxrpc as part of an effort to move skb->dropcount into skb->cb[] Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 00:19:29 -05:00
Eyal Birger	6368c23577	net: bluetooth: compact struct bt_skb_cb by converting boolean fields to bit fields Convert boolean fields incoming and req_start to bit fields and move force_active in order save space in bt_skb_cb in an effort to use a portion of skb->cb[] for storing skb->dropcount. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 00:19:29 -05:00
Eyal Birger	49a6fe0557	net: bluetooth: compact struct bt_skb_cb by inlining struct hci_req_ctrl struct hci_req_ctrl is never used outside of struct bt_skb_cb; Inlining it frees 8 bytes on a 64 bit system in skb->cb[] allowing the addition of more ancillary data. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 00:19:29 -05:00
Daniel Borkmann	e2e9b6541d	cls_bpf: add initial eBPF support for programmable classifiers This work extends the "classic" BPF programmable tc classifier by extending its scope also to native eBPF code! This allows for user space to implement own custom, 'safe' C like classifiers (or whatever other frontend language LLVM et al may provide in future), that can then be compiled with the LLVM eBPF backend to an eBPF elf file. The result of this can be loaded into the kernel via iproute2's tc. In the kernel, they can be JITed on major archs and thus run in native performance. Simple, minimal toy example to demonstrate the workflow: #include <linux/ip.h> #include <linux/if_ether.h> #include <linux/bpf.h> #include "tc_bpf_api.h" __section("classify") int cls_main(struct sk_buff *skb) { return (0x800 << 16) \| load_byte(skb, ETH_HLEN + __builtin_offsetof(struct iphdr, tos)); } char __license[] __section("license") = "GPL"; The classifier can then be compiled into eBPF opcodes and loaded via tc, for example: clang -O2 -emit-llvm -c cls.c -o - \| llc -march=bpf -filetype=obj -o cls.o tc filter add dev em1 parent 1: bpf cls.o [...] As it has been demonstrated, the scope can even reach up to a fully fledged flow dissector (similarly as in samples/bpf/sockex2_kern.c). For tc, maps are allowed to be used, but from kernel context only, in other words, eBPF code can keep state across filter invocations. In future, we perhaps may reattach from a different application to those maps e.g., to read out collected statistics/state. Similarly as in socket filters, we may extend functionality for eBPF classifiers over time depending on the use cases. For that purpose, cls_bpf programs are using BPF_PROG_TYPE_SCHED_CLS program type, so we can allow additional functions/accessors (e.g. an ABI compatible offset translation to skb fields/metadata). For an initial cls_bpf support, we allow the same set of helper functions as eBPF socket filters, but we could diverge at some point in time w/o problem. I was wondering whether cls_bpf and act_bpf could share C programs, I can imagine that at some point, we introduce i) further common handlers for both (or even beyond their scope), and/or if truly needed ii) some restricted function space for each of them. Both can be abstracted easily through struct bpf_verifier_ops in future. The context of cls_bpf versus act_bpf is slightly different though: a cls_bpf program will return a specific classid whereas act_bpf a drop/non-drop return code, latter may also in future mangle skbs. That said, we can surely have a "classify" and "action" section in a single object file, or considered mentioned constraint add a possibility of a shared section. The workflow for getting native eBPF running from tc [1] is as follows: for f_bpf, I've added a slightly modified ELF parser code from Alexei's kernel sample, which reads out the LLVM compiled object, sets up maps (and dynamically fixes up map fds) if any, and loads the eBPF instructions all centrally through the bpf syscall. The resulting fd from the loaded program itself is being passed down to cls_bpf, which looks up struct bpf_prog from the fd store, and holds reference, so that it stays available also after tc program lifetime. On tc filter destruction, it will then drop its reference. Moreover, I've also added the optional possibility to annotate an eBPF filter with a name (e.g. path to object file, or something else if preferred) so that when tc dumps currently installed filters, some more context can be given to an admin for a given instance (as opposed to just the file descriptor number). Last but not least, bpf_prog_get() and bpf_prog_put() needed to be exported, so that eBPF can be used from cls_bpf built as a module. Thanks to `60a3b2253c` ("net: bpf: make eBPF interpreter images read-only") I think this is of no concern since anything wanting to alter eBPF opcode after verification stage would crash the kernel. [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-01 14:05:19 -05:00
Daniel Borkmann	24701ecea7	ebpf: move read-only fields to bpf_prog and shrink bpf_prog_aux is_gpl_compatible and prog_type should be moved directly into bpf_prog as they stay immutable during bpf_prog's lifetime, are core attributes and they can be locked as read-only later on via bpf_prog_select_runtime(). With a bit of rearranging, this also allows us to shrink bpf_prog_aux to exactly 1 cacheline. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-01 14:05:19 -05:00
Daniel Borkmann	96be4325f4	ebpf: add sched_cls_type and map it to sk_filter's verifier ops As discussed recently and at netconf/netdev01, we want to prevent making bpf_verifier_ops registration available for modules, but have them at a controlled place inside the kernel instead. The reason for this is, that out-of-tree modules can go crazy and define and register any verfifier ops they want, doing all sorts of crap, even bypassing available GPLed eBPF helper functions. We don't want to offer such a shiny playground, of course, but keep strict control to ourselves inside the core kernel. This also encourages us to design eBPF user helpers carefully and generically, so they can be shared among various subsystems using eBPF. For the eBPF traffic classifier (cls_bpf), it's a good start to share the same helper facilities as we currently do in eBPF for socket filters. That way, we have BPF_PROG_TYPE_SCHED_CLS look like it's own type, thus one day if there's a good reason to diverge the set of helper functions from the set available to socket filters, we keep ABI compatibility. In future, we could place all bpf_prog_type_list at a central place, perhaps. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-01 14:05:19 -05:00
Daniel Borkmann	d4052c4aea	ebpf: remove CONFIG_BPF_SYSCALL ifdefs in socket filter code This gets rid of CONFIG_BPF_SYSCALL ifdefs in the socket filter code, now that the BPF internal header can deal with it. While going over it, I also changed eBPF related functions to a sk_filter prefix to be more consistent with the rest of the file. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-01 14:05:19 -05:00
Daniel Borkmann	a2c83fff58	ebpf: constify various function pointer structs We can move bpf_map_ops and bpf_verifier_ops and other structs into ro section, bpf_map_type_list and bpf_prog_type_list into read mostly. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-01 14:05:18 -05:00
Florian Westphal	765dd3bb44	rxrpc: don't multiply with HZ twice rxrpc_resend_timeout has an initial value of 4 * HZ; use it as-is. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-01 13:40:23 -05:00
Florian Westphal	c03ae533a9	rxrpc: terminate retrans loop when sending of skb fails Typo, 'stop' is never set to true. Seems intent is to not attempt to retransmit more packets after sendmsg returns an error. This change is based on code inspection only. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-01 13:40:23 -05:00
Arvid Brodin	56b08fdcf6	net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface. To repeat: $ sudo ip link del hsr0 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff8187f495>] hsr_del_port+0x15/0xa0 etc... Bug description: As part of the hsr master device destruction, hsr_del_port() is called for each of the hsr ports. At each such call, the master device is updated regarding features and mtu. When the master device is freed before the slave interfaces, master will be NULL in hsr_del_port(), which led to a NULL pointer dereference. Additionally, dev_put() was called on the master device itself in hsr_del_port(), causing a refcnt error. A third bug in the same code path was that the rtnl lock was not taken before hsr_del_port() was called as part of hsr_dev_destroy(). The reporter (Nicolas Dichtel) also said: "hsr_netdev_notify() supposes that the port will always be available when the notification is for an hsr interface. It's wrong. For example, netdev_wait_allrefs() may resend NETDEV_UNREGISTER.". As a precaution against this, a check for port == NULL was added in hsr_dev_notify(). Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Fixes: `51f3c60531` ("net/hsr: Move slave init to hsr_slave.c.") Signed-off-by: Arvid Brodin <arvid.brodin@alten.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-01 13:40:23 -05:00
Eric Dumazet	cac5e65e8a	net: do not use rcu in rtnl_dump_ifinfo() We did a failed attempt in the past to only use rcu in rtnl dump operations (commit `e67f88dd12` "net: dont hold rtnl mutex during netlink dump callbacks") Now that dumps are holding RTNL anyway, there is no need to also use rcu locking, as it forbids any scheduling ability, like GFP_KERNEL allocations that controlling path should use instead of GFP_ATOMIC whenever possible. This should fix following splat Cong Wang reported : [ INFO: suspicious RCU usage. ] 3.19.0+ #805 Tainted: G W include/linux/rcupdate.h:538 Illegal context switch in RCU read-side critical section! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by ip/771: #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff8182b8f4>] netlink_dump+0x21/0x26c #1: (rcu_read_lock){......}, at: [<ffffffff817d785b>] rcu_read_lock+0x0/0x6e stack backtrace: CPU: 3 PID: 771 Comm: ip Tainted: G W 3.19.0+ #805 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000001 ffff8800d51e7718 ffffffff81a27457 0000000029e729e6 ffff8800d6108000 ffff8800d51e7748 ffffffff810b539b ffffffff820013dd 00000000000001c8 0000000000000000 ffff8800d7448088 ffff8800d51e7758 Call Trace: [<ffffffff81a27457>] dump_stack+0x4c/0x65 [<ffffffff810b539b>] lockdep_rcu_suspicious+0x107/0x110 [<ffffffff8109796f>] rcu_preempt_sleep_check+0x45/0x47 [<ffffffff8109e457>] ___might_sleep+0x1d/0x1cb [<ffffffff8109e67d>] __might_sleep+0x78/0x80 [<ffffffff814b9b1f>] idr_alloc+0x45/0xd1 [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d [<ffffffff814b9f9d>] ? idr_for_each+0x53/0x101 [<ffffffff817c1383>] alloc_netid+0x61/0x69 [<ffffffff817c14c3>] __peernet2id+0x79/0x8d [<ffffffff817c1ab7>] peernet2id+0x13/0x1f [<ffffffff817d8673>] rtnl_fill_ifinfo+0xa8d/0xc20 [<ffffffff810b17d9>] ? __lock_is_held+0x39/0x52 [<ffffffff817d894f>] rtnl_dump_ifinfo+0x149/0x213 [<ffffffff8182b9c2>] netlink_dump+0xef/0x26c [<ffffffff8182bcba>] netlink_recvmsg+0x17b/0x2c5 [<ffffffff817b0adc>] __sock_recvmsg+0x4e/0x59 [<ffffffff817b1b40>] sock_recvmsg+0x3f/0x51 [<ffffffff817b1f9a>] ___sys_recvmsg+0xf6/0x1d9 [<ffffffff8115dc67>] ? handle_pte_fault+0x6e1/0xd3d [<ffffffff8100a3a0>] ? native_sched_clock+0x35/0x37 [<ffffffff8109f45b>] ? sched_clock_local+0x12/0x72 [<ffffffff8109f6ac>] ? sched_clock_cpu+0x9e/0xb7 [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d [<ffffffff811abde8>] ? __fcheck_files+0x4c/0x58 [<ffffffff811ac556>] ? __fget_light+0x2d/0x52 [<ffffffff817b376f>] __sys_recvmsg+0x42/0x60 [<ffffffff817b379f>] SyS_recvmsg+0x12/0x1c Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `0c7aecd4bd` ("netns: add rtnl cmd to add and get peer netns ids") Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-01 00:07:00 -05:00
David S. Miller	32034e0580	A few patches have accumulated, among them the fix for Linus's four-way-handshake problem. The others are various small fixes for problems all over, nothing really stands out. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJU8HQ5AAoJEDBSmw7B7bqrGeUQAK7O/UliGPxPNSMs1Mmb+Fj/ GT51sGhZLYXUj2nMMFTTPuhR5qXdgSsBaonitNQaAmJtMtmO6R21rcd74NDB29i7 1xHgtdxJDyDYh6zWUX6+IETV/eRHuS7vlfiiBY/PHAymAQZVa1doC5nN0yCfPHkd XanD4HRXEZppiw+OVh3VkgvxQZkhqmExQDdYZYWGVm8b3cspqa0l/e9WpLrU/Z19 15liupuhINcDAl5KAsR0oaBy/vUsETA6+H9K4fzZG6bemlaD5T+nkNqjlEPbw806 sMxiWZ7jrI1n5v2KiNHkKTJakzHsz08zb9nooA9swusSImflr8BUm3lNz+IVsqtL zBvtMEiGwNlGkSLdHieICBktXw7/Lw2VUSKgWlRj1LZSXfd3kISXrwdR46NOr93e cBzaGn7H8YrVM2Sl/EraRimYL9womICPr6+flE+YAeRnZzsAFrBd9qegc2N3LTt+ NtTqS/JbGAKyOS84JCge+1omGOKPIuQGBOtEIy1U0zlxibZzEzCulrKSeGjQ8o3Q dfrMYEwpicK4ADh7wxeslD6QlgkcQ/Ft7agkP+ps4m5ngFXo3swup+hxNMDoyyi3 BYiOGDum6SBzrOK8bN4hv3jEHK7e3fo68NeLhJ34Ap4K7jY9kOi8sk7WY8NerZBI 19oVt0+0yyzKVlaI7IQE =H3+7 -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2015-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A few patches have accumulated, among them the fix for Linus's four-way-handshake problem. The others are various small fixes for problems all over, nothing really stands out. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-28 23:33:53 -05:00
Eric Dumazet	74abc20ced	tcp: cleanup static functions tcp_fastopen_create_child() is static and should not be exported. tcp4_gso_segment() and tcp6_gso_segment() should be static. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-28 16:56:51 -05:00
Eric W. Biederman	06615bed60	net: Verify permission to link_net in newlink When applicable verify that the caller has permisson to the underlying network namespace for a newly created network device. Similary checks exist for the network namespace a network device will be created in. Fixes: `317f4810e4` ("rtnl: allow to create device with IFLA_LINK_NETNSID set") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-28 15:14:44 -05:00
Eric W. Biederman	505ce4154a	net: Verify permission to dest_net in newlink When applicable verify that the caller has permision to create a network device in another network namespace. This check is already present when moving a network device between network namespaces in setlink so all that is needed is to duplicate that check in newlink. This change almost backports cleanly, but there are context conflicts as the code that follows was added in v4.0-rc1 Fixes: `b51642f6d7` net: Enable a userns root rtnl calls that are safe for unprivilged users Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-28 15:14:44 -05:00
Eric Dumazet	a0ea700e40	tcp: tso: allow CA_CWR state in tcp_tso_should_defer() Another TCP issue is triggered by ECN. Under pressure, receiver gets ECN marks, and send back ACK packets with ECE TCP flag. Senders enter CA_CWR state. In this state, tcp_tso_should_defer() is short cut : if (icsk->icsk_ca_state != TCP_CA_Open) goto send_now; This means that about all ACK packets we receive are triggering a partial send, and because cwnd is kept small, we can only send a small amount of data for each incoming ACK, which in return generate more ACK packets. Allowing CA_Open and CA_CWR states to enable TSO defer in tcp_tso_should_defer() brings performance back : TSO autodefer has more chance to defer under pressure. This patch increases TSO and LRO/GRO efficiency back to normal levels, and does not impact overall ECN behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-28 15:10:39 -05:00
Eric Dumazet	50c8339e92	tcp: tso: restore IW10 after TSO autosizing With sysctl_tcp_min_tso_segs being 4, it is very possible that tcp_tso_should_defer() decides not sending last 2 MSS of initial window of 10 packets. This also applies if autosizing decides to send X MSS per GSO packet, and cwnd is not a multiple of X. This patch implements an heuristic based on age of first skb in write queue : If it was sent very recently (less than half srtt), we can predict that no ACK packet will come in less than half rtt, so deferring might cause an under utilization of our window. This is visible on initial send (IW10) on web servers, but more generally on some RPC, as the last part of the message might need an extra RTT to get delivered. Tested: Ran following packetdrill test // A simple server-side test that sends exactly an initial window (IW10) // worth of packets. `sysctl -e -q net.ipv4.tcp_min_tso_segs=4` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.1 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.1 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 write(4, ..., 14600) = 14600 +0 > . 1:5841(5840) ack 1 win 457 +0 > . 5841:11681(5840) ack 1 win 457 // Following packet should be sent right now. +0 > P. 11681:14601(2920) ack 1 win 457 +.1 < . 1:1(0) ack 14601 win 257 +0 close(4) = 0 +0 > F. 14601:14601(0) ack 1 +.1 < F. 1:1(0) ack 14602 win 257 +0 > . 14602:14602(0) ack 2 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-28 15:10:39 -05:00
Eric Dumazet	5f852eb536	tcp: tso: remove tp->tso_deferred TSO relies on ability to defer sending a small amount of packets. Heuristic is to wait for future ACKS in hope to send more packets at once. Current algorithm uses a per socket tso_deferred field as a pseudo timer. This pseudo timer relies on future ACK, but there is no guarantee we receive them in time. Fix would be to use a real timer, but cost of such timer is probably too expensive for typical cases. This patch changes the logic to test the time of last transmit, because we should not add bursts of more than 1ms for any given flow. We've used this patch for about two years at Google, before FQ/pacing as it would reduce a fair amount of bursts. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-28 15:10:39 -05:00
Erik Hugne	d76a436d50	tipc: make media address offset a common define With the exception of infiniband media which does not use media offsets, the media address is always located at offset 4 in the media info field as defined by the protocol, so we move the definition to the generic bearer.h Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:48 -05:00
Erik Hugne	91e2eb5684	tipc: rename media/msg related definitions The TIPC_MEDIA_ADDR_SIZE and TIPC_MEDIA_ADDR_OFFSET names are misleading, as they actually define the size and offset of the whole media info field and not the address part. This patch does not have any functional changes. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:48 -05:00
Erik Hugne	afaa3f65f6	tipc: purge links when bearer is disabled If a bearer is disabled by manual intervention, all links over that bearer should be purged, indicated with the 'shutting_down' flag. Otherwise tipc will get confused if a new bearer is enabled using a different media type. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:47 -05:00
Erik Hugne	7fe8097cef	tipc: fix nullpointer bug when subscribing to events If a subscription request is sent to a topology server connection, and any error occurs (malformed request, oom or limit reached) while processing this request, TIPC should terminate the subscriber connection. While doing so, it tries to access fields in an already freed (or never allocated) subscription element leading to a nullpointer exception. We fix this by removing the subscr_terminate function and terminate the connection immediately upon any subscription failure. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:47 -05:00
Erik Hugne	3622c36f37	tipc: only create header copy for name distr messages The TIPC name distributor pushes topology updates to the cluster neighbors. Currently this is done in a unicast manner, and the skb holding the update is cloned for each cluster member. This is unnecessary, as we only modify the destnode field in the header so we change it to do pskb_copy instead. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:47 -05:00
Alexander Duyck	79e5ad2ceb	fib_trie: Remove leaf_info At this point the leaf_info hash is redundant. By adding the suffix length to the fib_alias hash list we no longer have need of leaf_info as we can determine the prefix length from fa_slen. So we can compress things by dropping the leaf_info structure from fib_trie and instead directly connect the leaves to the fib_alias hash list. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 16:37:07 -05:00
Alexander Duyck	9b6ebad5c3	fib_trie: Add slen to fib alias Make use of an empty spot in the alias to store the suffix length so that we don't need to pull that information from the leaf_info structure. This patch also makes a slight change to the user statistics. Instead of incrementing semantic_match_miss once per leaf_info miss we now just increment it once per leaf if a match was not found. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 16:37:07 -05:00
Alexander Duyck	5786ec6054	fib_trie: Replace plen with slen in leaf_info This replaces the prefix length variable in the leaf_info structure with a suffix length value, or host identifier length in bits. By doing this it makes it easier to sort out since the tnodes and leaf are carrying this value as well since it is compatible with the ->pos field in tnodes. I also cleaned up one spot that had some list manipulation that could be simplified. I basically updated it so that we just use hlist_add_head_rcu instead of calling hlist_add_before_rcu on the first node in the list. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 16:37:06 -05:00
Alexander Duyck	56315f9e6e	fib_trie: Convert fib_alias to hlist from list There isn't any advantage to having it as a list and by making it an hlist we make the fib_alias more compatible with the list_info in terms of the type of list used. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 16:37:06 -05:00
Madhu Challa	93a714d6b5	multicast: Extend ip address command to enable multicast group join/leave on Joining multicast group on ethernet level via "ip maddr" command would not work if we have an Ethernet switch that does igmp snooping since the switch would not replicate multicast packets on ports that did not have IGMP reports for the multicast addresses. Linux vxlan interfaces created via "ip link add vxlan" have the group option that enables then to do the required join. By extending ip address command with option "autojoin" we can get similar functionality for openvswitch vxlan interfaces as well as other tunneling mechanisms that need to receive multicast traffic. The kernel code is structured similar to how the vxlan driver does a group join / leave. example: ip address add 224.1.1.10/24 dev eth5 autojoin ip address del 224.1.1.10/24 dev eth5 Signed-off-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 16:25:25 -05:00
Madhu Challa	46a4dee074	igmp v6: add __ipv6_sock_mc_join and __ipv6_sock_mc_drop Based on the igmp v4 changes from Eric Dumazet. 959d10f6bbf6("igmp: add __ip_mc_{join\|leave}_group()") These changes are needed to perform igmp v6 join/leave while RTNL is held. Make ipv6_sock_mc_join and ipv6_sock_mc_drop wrappers around __ipv6_sock_mc_join and __ipv6_sock_mc_drop to avoid proliferation of work queues. Signed-off-by: Madhu Challa <challa@noironetworks.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 16:25:24 -05:00
Daniel Borkmann	4c4b52d9b2	rhashtable: remove indirection for grow/shrink decision functions Currently, all real users of rhashtable default their grow and shrink decision functions to rht_grow_above_75() and rht_shrink_below_30(), so that there's currently no need to have this explicitly selectable. It can/should be generic and private inside rhashtable until a real use case pops up. Since we can make this private, we'll save us this additional indirection layer and can improve insertion/deletion time as well. Reference: http://patchwork.ozlabs.org/patch/443040/ Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 16:06:02 -05:00

... 3 4 5 6 7 ...

36939 Commits (d95797252a9d967d462d9581fb72546d6a92e14b)