redonkable/alistair23-linux

Author	SHA1	Message	Date
Martin KaFai Lau	5973fb1e24	ipv6: Check expire on DST_NOCACHE route Since the expires of the DST_NOCACHE rt can be set during the ip6_rt_update_pmtu(), we also need to consider the expires value when doing ip6_dst_check(). This patches creates __rt6_check_expired() to only check the expire value (if one exists) of the current rt. In rt6_dst_from_check(), it adds __rt6_check_expired() as one of the condition check. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-15 17:12:37 -05:00
Martin KaFai Lau	0d3f6d297b	ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree The original bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1272571 The setup has a IPv4 GRE tunnel running in a IPSec. The bug happens when ndisc starts sending router solicitation at the gre interface. The simplified oops stack is like: __lock_acquire+0x1b2/0x1c30 lock_acquire+0xb9/0x140 _raw_write_lock_bh+0x3f/0x50 __ip6_ins_rt+0x2e/0x60 ip6_ins_rt+0x49/0x50 ~~~~~~~~ __ip6_rt_update_pmtu.part.54+0x145/0x250 ip6_rt_update_pmtu+0x2e/0x40 ~~~~~~~~ ip_tunnel_xmit+0x1f1/0xf40 __gre_xmit+0x7a/0x90 ipgre_xmit+0x15a/0x220 dev_hard_start_xmit+0x2bd/0x480 __dev_queue_xmit+0x696/0x730 dev_queue_xmit+0x10/0x20 neigh_direct_output+0x11/0x20 ip6_finish_output2+0x21f/0x770 ip6_finish_output+0xa7/0x1d0 ip6_output+0x56/0x190 ~~~~~~~~ ndisc_send_skb+0x1d9/0x400 ndisc_send_rs+0x88/0xc0 ~~~~~~~~ The rt passed to ip6_rt_update_pmtu() is created by icmp6_dst_alloc() and it is not managed by the fib6 tree, so its rt6i_table == NULL. When __ip6_rt_update_pmtu() creates a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL and it causes the ip6_ins_rt() oops. During pmtu update, we only want to create a RTF_CACHE clone from a rt which is currently managed (or owned) by the fib6 tree. It means either rt->rt6i_node != NULL or rt is a RTF_PCPU clone. It is worth to note that rt6i_table may not be NULL even it is not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()). Hence, rt6i_node is a better check instead of rt6i_table. Fixes: `45e4fd2668` ("ipv6: Only create RTF_CACHE routes after encountering pmtu") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu> Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-15 17:12:37 -05:00
David S. Miller	73186df8d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were fixing the "BH-ness" of the counter bumps whilst in 'net-next' the functions were modified to take an explicit 'net' parameter. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-03 13:41:45 -05:00
Matthias Schiffer	ec13ad1d70	ipv6: fix crash on ICMPv6 redirects with prohibited/blackholed source There are other error values besides ip6_null_entry that can be returned by ip6_route_redirect(): fib6_rule_action() can also result in ip6_blk_hole_entry and ip6_prohibit_entry if such ip rules are installed. Only checking for ip6_null_entry in rt6_do_redirect() causes ip6_ins_rt() to be called with rt->rt6i_table == NULL in these cases, making the kernel crash. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 16:30:15 -05:00
David S. Miller	ba3e2084f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:54:12 -07:00
David Ahern	d46a9d678e	net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set `741a11d9e4` ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if oif is given. Hajime reported that this change breaks the Mobile IPv6 use case that wants to force the message through one interface yet use the source address from another interface. Handle this case by only adding the flag if oif is set and saddr is not set. Fixes: `741a11d9e4` ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") Cc: Hajime Tazaki <thehajime@gmail.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:36:19 -07:00
David Ahern	f1900fb5ec	net: Really fix vti6 with oif in dst lookups `6e28b00082` ("net: Fix vti use case with oif in dst lookups for IPv6") is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them. Fixes: `42a7b32b73` ("xfrm: Add oif to dst lookups") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 19:04:54 -07:00
David S. Miller	26440c835f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/asix_common.c net/ipv4/inet_connection_sock.c net/switchdev/switchdev.c In the inet_connection_sock.c case the request socket hashing scheme is completely different in net-next. The other two conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-20 06:08:27 -07:00
Martin KaFai Lau	0a1f596200	ipv6: Initialize rt6_info properly in ip6_blackhole_route() ip6_blackhole_route() does not initialize the newly allocated rt6_info properly. This patch: 1. Call rt6_info_init() to initialize rt6i_siblings and rt6i_uncached 2. The current rt->dst._metrics init code is incorrect: - 'rt->dst._metrics = ort->dst._metris' is not always safe - Not sure what dst_copy_metrics() is trying to do here considering ip6_rt_blackhole_cow_metrics() always returns NULL Fix: - Always do dst_copy_metrics() - Replace ip6_rt_blackhole_cow_metrics() with dst_cow_metrics_generic() 3. Mask out the RTF_PCPU bit from the newly allocated blackhole route. This bug triggers an oops (reported by Phil Sutter) in rt6_get_cookie(). It is because RTF_PCPU is set while rt->dst.from is NULL. Fixes: `d52d3997f8` ("ipv6: Create percpu rt6_info") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Phil Sutter <phil@nwl.cc> Tested-by: Phil Sutter <phil@nwl.cc> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: Phil Sutter <phil@nwl.cc> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:39:16 -07:00
Martin KaFai Lau	ebfa45f0d9	ipv6: Move common init code for rt6_info to a new function rt6_info_init() Introduce rt6_info_init() to do the common init work for 'struct rt6_info' (after calling dst_alloc). It is a prep work to fix the rt6_info init logic in the ip6_blackhole_route(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: Phil Sutter <phil@nwl.cc> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:39:14 -07:00
David Ahern	ca254490c8	net: Add VRF support to IPv6 stack As with IPv4 support for VRFs added to IPv6 stack by replacing hardcoded table ids with possibly device specific ones and manipulating the oif in the flowi6. The flow flags are used to skip oif compare in nexthop lookups if the device is enslaved to a VRF via the L3 master device. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:55:08 -07:00
Eric W. Biederman	e332bc67cf	ipv6: Don't call with rt6_uncached_list_flush_dev As originally written rt6_uncached_list_flush_dev makes no sense when called with dev == NULL as it attempts to flush all uncached routes regardless of network namespace when dev == NULL. Which is simply incorrect behavior. Furthermore at the point rt6_ifdown is called with dev == NULL no more network devices exist in the network namespace so even if the code in rt6_uncached_list_flush_dev were to attempt something sensible it would be meaningless. Therefore remove support in rt6_uncached_list_flush_dev for handling network devices where dev == NULL, and only call rt6_uncached_list_flush_dev when rt6_ifdown is called with a network device. Fixes: `8d0b94afdc` ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:52:40 -07:00
Roopa Prabhu	8c5b83f0f2	ipv6 route: use err pointers instead of returning pointer by reference This patch makes ip6_route_info_create return err pointer instead of returning the rt pointer by reference as suggested by Dave Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:47:34 -07:00
Eric W. Biederman	ede2059dba	dst: Pass net into dst->output The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:03 -07:00
Eric W. Biederman	9f8955cc46	ipv6: Merge __ip6_local_out and __ip6_local_out_sk Only __ip6_local_out_sk has callers so rename __ip6_local_out_sk __ip6_local_out and remove the previous __ip6_local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:58 -07:00
Eric W. Biederman	4ebdfba73c	dst: Pass a sk into .local_out For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:55 -07:00
David S. Miller	f6d3125fa3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/dsa/slave.c net/dsa/slave.c simply had overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-02 07:21:25 -07:00
David Ahern	741a11d9e4	net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set Wolfgang reported that IPv6 stack is ignoring oif in output route lookups: With ipv6, ip -6 route get always returns the specific route. $ ip -6 r 2001:db8:e2::1 dev enp2s0 proto kernel metric 256 2001:db8:e2::/64 dev enp2s0 metric 1024 2001:db8:e3::1 dev enp3s0 proto kernel metric 256 2001:db8:e3::/64 dev enp3s0 metric 1024 fe80::/64 dev enp3s0 proto kernel metric 256 default via 2001:db8:e3::255 dev enp3s0 metric 1024 $ ip -6 r get 2001:db8:e2::100 2001:db8:e2::100 from :: dev enp2s0 src 2001:db8:e3::1 metric 0 cache $ ip -6 r get 2001:db8:e2::100 oif enp3s0 2001:db8:e2::100 from :: dev enp2s0 src 2001:db8:e3::1 metric 0 cache The stack does consider the oif but a mismatch in rt6_device_match is not considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags. Cc: Wolfgang Nothdurft <netdev@linux-dude.de> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 15:01:10 -07:00
David Ahern	17fb0b2b90	net: Remove redundant oif checks in rt6_device_match The oif has already been checked that it is non-zero; the 2 additional checks on oif within that if (oif) {...} block are redundant. CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:30:24 -07:00
David S. Miller	4963ed48f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/arp.c The net/ipv4/arp.c conflict was one commit adding a new local variable while another commit was deleting one. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 16:08:27 -07:00
Jiri Benc	38cf595b19	ipv6: remove unused neigh parameter from ndisc functions Since commit `12fd84f438` ("ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers."), the neigh parameter of ndisc_send_na and ndisc_send_ns is unused. CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 12:26:08 -07:00
Tom Herbert	644d0e6569	ipv6 Use get_hash_from_flowi6 for rt6 hash In rt6_info_hash_nhsfn replace the custom hashing over flowi6 that is using xor with a call to common function get_hash_from_flowi6. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-23 14:21:09 -07:00
Nikola Forró	0315e38270	net: Fix behaviour of unreachable, blackhole and prohibit routes Man page of ip-route(8) says following about route types: unreachable - these destinations are unreachable. Packets are dis‐ carded and the ICMP message host unreachable is generated. The local senders get an EHOSTUNREACH error. blackhole - these destinations are unreachable. Packets are dis‐ carded silently. The local senders get an EINVAL error. prohibit - these destinations are unreachable. Packets are discarded and the ICMP message communication administratively prohibited is generated. The local senders get an EACCES error. In the inet6 address family, this was correct, except the local senders got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route. In the inet address family, all three route types generated ICMP message net unreachable, and the local senders got ENETUNREACH error. In both address families all three route types now behave consistently with documentation. Signed-off-by: Nikola Forró <nforro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:45:08 -07:00
Roopa Prabhu	37a1d3611c	ipv6: include NLM_F_REPLACE in route replace notifications This patch adds NLM_F_REPLACE flag to ipv6 route replace notifications. This makes nlm_flags in ipv6 replace notifications consistent with ipv4. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 15:00:27 -07:00
Martin KaFai Lau	8e3d5be736	ipv6: Avoid double dst_free It is a prep work to get dst freeing from fib tree undergo a rcu grace period. The following is a common paradigm: if (ip6_del_rt(rt)) dst_free(rt) which means, if rt cannot be deleted from the fib tree, dst_free(rt) now. 1. We don't know the ip6_del_rt(rt) failure is because it was not managed by fib tree (e.g. DST_NOCACHE) or it had already been removed from the fib tree. 2. If rt had been managed by the fib tree, ip6_del_rt(rt) failure means dst_free(rt) has been called already. A second dst_free(rt) is not always obviously safe. The rt may have been destroyed already. 3. If rt is a DST_NOCACHE, dst_free(rt) should not be called. 4. It is a stopper to make dst freeing from fib tree undergo a rcu grace period. This patch is to use a DST_NOCACHE flag to indicate a rt is not managed by the fib tree. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 14:53:05 -07:00
Wu Fengguang	52fe51f852	ipv6: fix ifnullfree.cocci warnings net/ipv6/route.c:2946:3-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. NULL check before some freeing functions is not needed. Based on checkpatch warning "kfree(NULL) is safe this check is probably not required" and kfreeaddr.cocci by Julia Lawall. Generated by: scripts/coccinelle/free/ifnullfree.cocci CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-09 17:21:01 -07:00
Roopa Prabhu	6b9ea5a64e	ipv6: fix multipath route replace error recovery Problem: The ecmp route replace support for ipv6 in the kernel, deletes the existing ecmp route too early, ie when it installs the first nexthop. If there is an error in installing the subsequent nexthops, its too late to recover the already deleted existing route leaving the fib in an inconsistent state. This patch reduces the possibility of this by doing the following: a) Changes the existing multipath route add code to a two stage process: build rt6_infos + insert them ip6_route_add rt6_info creation code is moved into ip6_route_info_create. b) This ensures that most errors are caught during building rt6_infos and we fail early c) Separates multipath add and del code. Because add needs the special two stage mode in a) and delete essentially does not care. d) In any event if the code fails during inserting a route again, a warning is printed (This should be unlikely) Before the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 /* Try replacing the route with a duplicate nexthop / $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show / previously added ecmp route 3000:1000:1000:1000::2 dissappears from * kernel / After the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 / Try replacing the route with a duplicate nexthop */ $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 Fixes: `2759647247` ("ipv6: fix ECMP route replacement") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-09 14:07:50 -07:00
Daniel Borkmann	c3a8d94746	tcp: use dctcp if enabled on the route to the initiator Currently, the following case doesn't use DCTCP, even if it should: A responder has f.e. Cubic as system wide default, but for a specific route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The initiating host then uses DCTCP as congestion control, but since the initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok, and we have to fall back to Reno after 3WHS completes. We were thinking on how to solve this in a minimal, non-intrusive way without bloating tcp_ecn_create_request() needlessly: lets cache the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0) is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows to only do a single metric feature lookup inside tcp_ecn_create_request(). Joint work with Florian Westphal. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 12:34:00 -07:00
Daniel Borkmann	b8d3e4163a	fib, fib6: reject invalid feature bits Feature bits that are invalid should not be accepted by the kernel, only the lower 4 bits may be configured, but not the remaining ones. Even from these 4, 2 of them are unused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 12:34:00 -07:00
Daniel Borkmann	1bb14807bc	net: fib6: reduce identation in ip6_convert_metrics Reduce the identation a bit, there's no need to artificically have it increased. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 12:34:00 -07:00
Jiri Benc	46fa062ad6	ip_tunnels: convert the mode field of ip_tunnel_info to flags The mode field holds a single bit of information only (whether the ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags. This allows more mode flags to be added. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-29 13:07:54 -07:00
Tom Herbert	127eb7cd3c	lwt: Add cfg argument to build_state Add cfg and family arguments to lwt build state functions. cfg is a void pointer and will either be a pointer to a fib_config or fib6_config structure. The family parameter indicates which one (either AF_INET or AF_INET6). LWT encpasulation implementation may use the fib configuration to build the LWT state. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-24 10:34:40 -07:00
David S. Miller	dc25b25897	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c Overlapping additions of new device IDs to qmi_wwan.c Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-21 11:44:04 -07:00
Jiri Benc	904af04d30	ipv6: route: extend flow representation with tunnel key Use flowi_tunnel in flowi6 similarly to what is done with IPv4. This complements commit `1b7179d3ad` ("route: Extend flow representation with tunnel key"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-20 15:42:37 -07:00
Jiri Benc	ab450605b3	ipv6: ndisc: inherit metadata dst when creating ndisc requests If output device wants to see the dst, inherit the dst of the original skb in the ndisc request. This is an IPv6 counterpart of commit `0accfc268f` ("arp: Inherit metadata dst when creating ARP requests"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-20 15:42:37 -07:00
Jiri Benc	06e9d040ba	ipv6: drop metadata dst in ip6_route_input The fix in commit `48fb6b5545` is incomplete, as now ip6_route_input can be called with non-NULL dst if it's a metadata dst and the reference is leaked. Drop the reference. Fixes: `48fb6b5545` ("ipv6: fix crash over flow-based vxlan device") Fixes: `ee122c79d4` ("vxlan: Flow based tunneling") CC: Wei-Chun Chao <weichunc@plumgrid.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-20 15:42:36 -07:00
Jiri Benc	61adedf3e3	route: move lwtunnel state to dst_entry Currently, the lwtunnel state resides in per-protocol data. This is a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa). The xmit function of the tunnel does not know whether the packet has been routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the lwtstate data to dst_entry makes such inter-protocol tunneling possible. As a bonus, this brings a nice diffstat. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-20 15:42:36 -07:00
Tom Herbert	2536862311	lwt: Add support to redirect dst.input This patch adds the capability to redirect dst input in the same way that dst output is redirected by LWT. Also, save the original dst.input and and dst.out when setting up lwtunnel redirection. These can be called by the client as a pass- through. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 21:33:05 -07:00
Martin KaFai Lau	9c7370a166	ipv6: Fix a potential deadlock when creating pcpu rt rt6_make_pcpu_route() is called under read_lock(&table->tb6_lock). rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt) which then calls dst_alloc(). dst_alloc() _may_ call ip6_dst_gc() which takes the write_lock(&tabl->tb6_lock). A visualized version: read_lock(&table->tb6_lock); rt6_make_pcpu_route(); => ip6_rt_pcpu_alloc(); => dst_alloc(); => ip6_dst_gc(); => write_lock(&table->tb6_lock); /* oops */ The fix is to do a read_unlock first before calling ip6_rt_pcpu_alloc(). A reported stack: [141625.537638] INFO: rcu_sched self-detected stall on CPU { 27} (t=60000 jiffies g=4159086 c=4159085 q=2139) [141625.547469] Task dump for CPU 27: [141625.550881] mtr R running task 0 22121 22081 0x00000008 [141625.558069] 0000000000000000 ffff88103f363d98 ffffffff8106e488 000000000000001b [141625.565641] ffffffff81684900 ffff88103f363db8 ffffffff810702b0 0000000008000000 [141625.573220] ffffffff81684900 ffff88103f363de8 ffffffff8108df9f ffff88103f375a00 [141625.580803] Call Trace: [141625.583345] <IRQ> [<ffffffff8106e488>] sched_show_task+0xc1/0xc6 [141625.589650] [<ffffffff810702b0>] dump_cpu_task+0x35/0x39 [141625.595144] [<ffffffff8108df9f>] rcu_dump_cpu_stacks+0x6a/0x8c [141625.601320] [<ffffffff81090606>] rcu_check_callbacks+0x1f6/0x5d4 [141625.607669] [<ffffffff810940c8>] update_process_times+0x2a/0x4f [141625.613925] [<ffffffff8109fbee>] tick_sched_handle+0x32/0x3e [141625.619923] [<ffffffff8109fc2f>] tick_sched_timer+0x35/0x5c [141625.625830] [<ffffffff81094a1f>] __hrtimer_run_queues+0x8f/0x18d [141625.632171] [<ffffffff81094c9e>] hrtimer_interrupt+0xa0/0x166 [141625.638258] [<ffffffff8102bf2a>] local_apic_timer_interrupt+0x4e/0x52 [141625.645036] [<ffffffff8102c36f>] smp_apic_timer_interrupt+0x39/0x4a [141625.651643] [<ffffffff8140b9e8>] apic_timer_interrupt+0x68/0x70 [141625.657895] <EOI> [<ffffffff81346ee8>] ? dst_destroy+0x7c/0xb5 [141625.664188] [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20 [141625.670272] [<ffffffff81082b45>] ? queue_write_lock_slowpath+0x60/0x6f [141625.677140] [<ffffffff8140aa33>] _raw_write_lock_bh+0x23/0x25 [141625.683218] [<ffffffff813d4553>] __fib6_clean_all+0x40/0x82 [141625.689124] [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20 [141625.695207] [<ffffffff813d6058>] fib6_clean_all+0xe/0x10 [141625.700854] [<ffffffff813d60d3>] fib6_run_gc+0x79/0xc8 [141625.706329] [<ffffffff813d0510>] ip6_dst_gc+0x85/0xf9 [141625.711718] [<ffffffff81346d68>] dst_alloc+0x55/0x159 [141625.717105] [<ffffffff813d09b5>] __ip6_dst_alloc.isra.32+0x19/0x63 [141625.723620] [<ffffffff813d1830>] ip6_pol_route+0x36a/0x3e8 [141625.729441] [<ffffffff813d18d6>] ip6_pol_route_output+0x11/0x13 [141625.735700] [<ffffffff813f02c8>] fib6_rule_action+0xa7/0x1bf [141625.741698] [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17 [141625.748043] [<ffffffff81357c48>] fib_rules_lookup+0xb5/0x12a [141625.754050] [<ffffffff81141628>] ? poll_select_copy_remaining+0xf9/0xf9 [141625.761002] [<ffffffff813f0535>] fib6_rule_lookup+0x37/0x5c [141625.766914] [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17 [141625.773260] [<ffffffff813d008c>] ip6_route_output+0x7a/0x82 [141625.779177] [<ffffffff813c44c8>] ip6_dst_lookup_tail+0x53/0x112 [141625.785437] [<ffffffff813c45c3>] ip6_dst_lookup_flow+0x2a/0x6b [141625.791604] [<ffffffff813ddaab>] rawv6_sendmsg+0x407/0x9b6 [141625.797423] [<ffffffff813d7914>] ? do_ipv6_setsockopt.isra.8+0xd87/0xde2 [141625.804464] [<ffffffff8139d4b4>] inet_sendmsg+0x57/0x8e [141625.810028] [<ffffffff81329ba3>] sock_sendmsg+0x2e/0x3c [141625.815588] [<ffffffff8132be57>] SyS_sendto+0xfe/0x143 [141625.821063] [<ffffffff813dd551>] ? rawv6_setsockopt+0x5e/0x67 [141625.827146] [<ffffffff8132c9f8>] ? sock_common_setsockopt+0xf/0x11 [141625.833660] [<ffffffff8132c08c>] ? SyS_setsockopt+0x81/0xa2 [141625.839565] [<ffffffff8140ac17>] entry_SYSCALL_64_fastpath+0x12/0x6a Fixes: `d52d3997f8` ("pv6: Create percpu rt6_info") Signed-off-by: Martin KaFai Lau <kafai@fb.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:28:03 -07:00
Martin KaFai Lau	a73e419563	ipv6: Add rt6_make_pcpu_route() It is a prep work for fixing a potential deadlock when creating a pcpu rt. The current rt6_get_pcpu_route() will also create a pcpu rt if one does not exist. This patch moves the pcpu rt creation logic into another function, rt6_make_pcpu_route(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:28:03 -07:00
Martin KaFai Lau	ad70686289	ipv6: Remove un-used argument from ip6_dst_alloc() After `4b32b5ad31` ("ipv6: Stop rt6_info from using inet_peer's metrics"), ip6_dst_alloc() does not need the 'table' argument. This patch cleans it up. Signed-off-by: Martin KaFai Lau <kafai@fb.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:28:03 -07:00
Andy Gospodarek	35103d1117	net: ipv6 sysctl option to ignore routes when nexthop link is down Like the ipv4 patch with a similar title, this adds a sysctl to allow the user to change routing behavior based on whether or not the interface associated with the nexthop was an up or down link. The default setting preserves the current behavior, but anyone that enables it will notice that nexthops on down interfaces will no longer be selected: net.ipv6.conf.all.ignore_routes_with_linkdown = 0 net.ipv6.conf.default.ignore_routes_with_linkdown = 0 net.ipv6.conf.lo.ignore_routes_with_linkdown = 0 ... When the above sysctls are set, not only will link status be reported to userspace, but an indication that a nexthop is dead and will not be used is also reported. 1000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium 1000::/8 via 8000::2 dev p8p1 metric 1024 pref medium 7000::/8 dev p7p1 proto kernel metric 256 dead linkdown pref medium 8000::/8 dev p8p1 proto kernel metric 256 pref medium 9000::/8 via 8000::2 dev p8p1 metric 2048 pref medium 9000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium fe80::/64 dev p7p1 proto kernel metric 256 dead linkdown pref medium fe80::/64 dev p8p1 proto kernel metric 256 pref medium This also adds devconf support and notification when sysctl values change. v2: drop use of rt6i_nhflags since it is not needed right now Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 21:27:19 -07:00
Andy Gospodarek	cea45e208d	net: track link status of ipv6 nexthops Add support to track current link status of ipv6 nexthops to match recent changes that added support for ipv4 nexthops. This takes a simple approach to track linkdown status for next-hops and simply checks the dev for the dst entry and sets proper flags that to be used in the netlink message. v2: drop use of rt6i_nhflags since it is not needed right now Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 21:27:18 -07:00
David S. Miller	182ad468e7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/cavium/Kconfig The cavium conflict was overlapping dependency changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 16:23:11 -07:00
Florian Westphal	330567b71d	ipv6: don't reject link-local nexthop on other interface `48ed7b26fa` ("ipv6: reject locally assigned nexthop addresses") is too strict; it rejects following corner-case: ip -6 route add default via fe80::1:2:3 dev eth1 [ where fe80::1:2:3 is assigned to a local interface, but not eth1 ] Fix this by restricting search to given device if nh is linklocal. Joint work with Hannes Frederic Sowa. Fixes: `48ed7b26fa` ("ipv6: reject locally assigned nexthop addresses") Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:29:22 -07:00
Martin KaFai Lau	8d6c31bf57	ipv6: Avoid rt6_probe() taking writer lock in the fast path The patch checks neigh->nud_state before acquiring the writer lock. Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF. 40 udpflood processes and a /64 gateway route are used. The gateway has NUD_PERMANENT. Each of them is run for 30s. At the end, the total number of finished sendto(): Before: 55M After: 95M Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Julian Anastasov <ja@ssi.bg> CC: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:08:25 -07:00
Martin KaFai Lau	990edb428c	ipv6: Re-arrange code in rt6_probe() It is a prep work for the next patch to remove write_lock from rt6_probe(). 1. Reduce the number of if(neigh) check. From 4 to 1. 2. Bring the write_(un)lock() closer to the operations that the lock is protecting. Hopefully, the above make rt6_probe() more readable. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:08:25 -07:00
Nicolas Dichtel	5a6228a0b4	lwtunnel: change prototype of lwtunnel_state_get() It saves some lines and simplify a bit the code when the state is returning by this function. It's also useful to handle a NULL entry. To avoid too long lines, I've also renamed lwtunnel_state_get() and lwtunnel_state_put() to lwtstate_get() and lwtstate_put(). CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:02:49 -07:00
Nicolas Dichtel	d943659508	ipv6: copy lwtstate in ip6_rt_copy_init() We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc() use ip6_rt_copy_init() to build a dst). CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: `19e42e4515` ("ipv6: support for fib route lwtunnel encap attributes") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:02:49 -07:00
Nicolas Dichtel	6673a9f4e3	ipv6: use lwtunnel_output6() only if flag redirect is set This function make sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set. The check is already done in IPv4. CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: `74a0f2fe8e` ("ipv6: rt6_info output redirect to tunnel output") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:00:51 -07:00
Roopa Prabhu	74a0f2fe8e	ipv6: rt6_info output redirect to tunnel output This is similar to ipv4 redirect of dst output to lwtunnel output function for encapsulation and xmit. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:04 -07:00
Roopa Prabhu	19e42e4515	ipv6: support for fib route lwtunnel encap attributes This patch adds support in ipv6 fib functions to parse Netlink RTA encap attributes and attach encap state data to rt6_info. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:04 -07:00
Markus Elfring	87775312a8	net-ipv6: Delete an unnecessary check before the function call "free_percpu" The free_percpu() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-03 09:27:42 -07:00
Martin KaFai Lau	d52d3997f8	ipv6: Create percpu rt6_info After the patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception', we need to compensate the performance hit (bouncing dst->__refcnt). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:35 -04:00
Martin KaFai Lau	83a09abd1a	ipv6: Break up ip6_rt_copy() This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and ip6_rt_cache_alloc(). In the later patch, we need to create a percpu rt6_info copy. Hence, refactor the common rt6_info init codes to ip6_rt_copy_init(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:34 -04:00
Martin KaFai Lau	8d0b94afdc	ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister This patch keeps track of the DST_NOCACHE routes in a list and replaces its dev with loopback during the iface down/unregister event. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:34 -04:00
Martin KaFai Lau	3da59bd945	ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set This patch always creates RTF_CACHE clone with DST_NOCACHE when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to the fl6->daddr. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Tested-by: Julian Anastasov <ja@ssi.bg> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:34 -04:00
Martin KaFai Lau	45e4fd2668	ipv6: Only create RTF_CACHE routes after encountering pmtu exception This patch creates a RTF_CACHE routes only after encountering a pmtu exception. After ip6_rt_update_pmtu() has inserted the RTF_CACHE route to the fib6 tree, the rt->rt6i_node->fn_sernum is bumped which will fail the ip6_dst_check() and trigger a relookup. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:33 -04:00
Martin KaFai Lau	8b9df26577	ipv6: Combine rt6_alloc_cow and rt6_alloc_clone A prep work for creating RTF_CACHE on exception only. After this patch, the same condition (rt->rt6i_flags & (RTF_NONEXTHOP \| RTF_GATEWAY)) is checked twice. This redundancy will be removed in the later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:33 -04:00
Martin KaFai Lau	2647a9b070	ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst. Also, rt6i_gateway is always set to the nexthop while the nexthop could be a gateway or the rt6i_dst.addr. After removing the rt6i_dst and rt6i_src dependency in the last patch, we also need to stop the caller from depending on rt6i_gateway and RTF_ANYCAST. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:33 -04:00
David S. Miller	36583eb54d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/cadence/macb.c drivers/net/phy/phy.c include/linux/skbuff.h net/ipv4/tcp.c net/switchdev/switchdev.c Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD} renaming overlapping with net-next changes of various sorts. phy.c was a case of two changes, one adding a local variable to a function whilst the second was removing one. tcp.c overlapped a deadlock fix with the addition of new tcp_info statistic values. macb.c involved the addition of two zyncq device entries. skbuff.h involved adding back ipv4_daddr to nf_bridge_info whilst net-next changes put two other existing members of that struct into a union. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-23 01:22:35 -04:00
Florian Westphal	48ed7b26fa	ipv6: reject locally assigned nexthop addresses ip -6 addr add dead::1/128 dev eth0 sleep 5 ip -6 route add default via dead::1/128 -> fails ip -6 addr add dead::1/128 dev eth0 ip -6 route add default via dead::1/128 -> succeeds reason is that if (nonsensensical) route above is added, dead::1 is still subject to DAD, so the route lookup will pick eth0 as outdev due to the prefix route that is added before DAD work is started. Add explicit test that checks if nexthop gateway is a local address. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1167969 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-21 23:23:38 -04:00
Michal Kubeček	2759647247	ipv6: fix ECMP route replacement When replacing an IPv6 multipath route with "ip route replace", i.e. NLM_F_CREATE \| NLM_F_REPLACE, fib6_add_rt2node() replaces only first matching route without fixing its siblings, resulting in corrupted siblings linked list; removing one of the siblings can then end in an infinite loop. IPv6 ECMP implementation is a bit different from IPv4 so that route replacement cannot work in exactly the same way. This should be a reasonable approximation: 1. If the new route is ECMP-able and there is a matching ECMP-able one already, replace it and all its siblings (if any). 2. If the new route is ECMP-able and no matching ECMP-able route exists, replace first matching non-ECMP-able (if any) or just add the new one. 3. If the new route is not ECMP-able, replace first matching non-ECMP-able route (if any) or add the new route. We also need to remove the NLM_F_REPLACE flag after replacing old route(s) by first nexthop of an ECMP route so that each subsequent nexthop does not replace previous one. Fixes: `51ebd31815` ("ipv6: add support of equal cost multipath (ECMP)") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-20 12:02:26 -04:00
Michal Kubeček	35f1b4e96b	ipv6: do not delete previously existing ECMP routes if add fails If adding a nexthop of an IPv6 multipath route fails, comment in ip6_route_multipath() says we are going to delete all nexthops already added. However, current implementation deletes even the routes it hasn't even tried to add yet. For example, running ip route add 1234:5678::/64 \ nexthop via fe80::aa dev dummy1 \ nexthop via fe80::bb dev dummy1 \ nexthop via fe80::cc dev dummy1 twice results in removing all routes first command added. Limit the second (delete) run to nexthops that succeeded in the first (add) run. Fixes: `51ebd31815` ("ipv6: add support of equal cost multipath (ECMP)") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-20 12:02:25 -04:00
David S. Miller	b04096ff33	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Four minor merge conflicts: 1) qca_spi.c renamed the local variable used for the SPI device from spi_device to spi, meanwhile the spi_set_drvdata() call got moved further up in the probe function. 2) Two changes were both adding new members to codel params structure, and thus we had overlapping changes to the initializer function. 3) 'net' was making a fix to sk_release_kernel() which is completely removed in 'net-next'. 4) In net_namespace.c, the rtnl_net_fill() call for GET operations had the command value fixed, meanwhile 'net-next' adjusted the argument signature a bit. This also matches example merge resolutions posted by Stephen Rothwell over the past two days. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 14:31:43 -04:00
Markus Stenberg	e16e888b52	ipv6: Fixed source specific default route handling. If there are only IPv6 source specific default routes present, the host gets -ENETUNREACH on e.g. connect() because ip6_dst_lookup_tail calls ip6_route_output first, and given source address any, it fails, and ip6_route_get_saddr is never called. The change is to use the ip6_route_get_saddr, even if the initial ip6_route_output fails, and then doing ip6_route_output _again_ after we have appropriate source address available. Note that this is '99% fix' to the problem; a correct fix would be to do route lookups only within addrconf.c when picking a source address, and never call ip6_route_output before source address has been populated. Signed-off-by: Markus Stenberg <markus.stenberg@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 15:58:41 -04:00
Martin KaFai Lau	7035870d12	ipv6: Check RTF_LOCAL on rt->rt6i_flags instead of rt->dst.flags In my earlier commit: `653437d02f` ("ipv6: Stop /128 route from disappearing after pmtu update"), there was a horrible typo. Instead of checking RTF_LOCAL on rt->rt6i_flags, it was checked on rt->dst.flags. This patch fixes it. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hajime Tazaki <tazaki@sfc.wide.ad.jp> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-03 21:49:27 -04:00
Martin KaFai Lau	afc4eef80c	ipv6: Remove DST_METRICS_FORCE_OVERWRITE and _rt6i_peer _rt6i_peer is no longer needed after the last patch, 'ipv6: Stop rt6_info from using inet_peer's metrics'. DST_METRICS_FORCE_OVERWRITE is added by commit `e5fd387ad5` ("ipv6: do not overwrite inetpeer metrics prematurely"). Since inetpeer is no longer used for metrics, this bit is also not needed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-01 20:57:06 -04:00
Martin KaFai Lau	4b32b5ad31	ipv6: Stop rt6_info from using inet_peer's metrics inet_peer is indexed by the dst address alone. However, the fib6 tree could have multiple routing entries (rt6_info) for the same dst. For example, 1. A /128 dst via multiple gateways. 2. A RTF_CACHE route cloned from a /128 route. In the above cases, all of them will share the same metrics and step on each other. This patch will steer away from inet_peer's metrics and use dst_cow_metrics_generic() for everything. Change Highlights: 1. Remove rt6_cow_metrics() which currently acquires metrics from inet_peer for DST_HOST route (i.e. /128 route). 2. Add rt6i_pmtu to take care of the pmtu update to avoid creating a full size metrics just to override the RTAX_MTU. 3. After (2), the RTF_CACHE route can also share the metrics with its dst.from route, by: dst_init_metrics(&cache_rt->dst, dst_metrics_ptr(cache_rt->dst.from), true); 4. Stop creating RTF_CACHE route by cloning another RTF_CACHE route. Instead, directly clone from rt->dst. [ Currently, cloning from another RTF_CACHE is only possible during rt6_do_redirect(). Also, the old clone is removed from the tree immediately after the new clone is added. ] In case of cloning from an older redirect RTF_CACHE, it should work as before. In case of cloning from an older pmtu RTF_CACHE, this patch will forget the pmtu and re-learn it (if there is any) from the redirected route. The _rt6i_peer and DST_METRICS_FORCE_OVERWRITE will be removed in the next cleanup patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-01 20:57:06 -04:00
Martin KaFai Lau	653437d02f	ipv6: Stop /128 route from disappearing after pmtu update This patch is mostly from Steffen Klassert <steffen.klassert@secunet.com>. I only removed the (rt6->rt6i_dst.plen == 128) check from ip6_rt_update_pmtu() because the (rt6->rt6i_flags & RTF_CACHE) test has already implied it. This patch: 1. Create RTF_CACHE route for /128 non local route 2. After (1), all routes that allow pmtu update should have a RTF_CACHE clone. Hence, stop updating MTU for any non RTF_CACHE route. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-01 20:57:06 -04:00
Steffen Klassert	9fbdcfaf97	ipv6: Extend the route lookups to low priority metrics. We search only for routes with highest priority metric in find_rr_leaf(). However if one of these routes is marked as invalid, we may fail to find a route even if there is a appropriate route with lower priority. Then we loose connectivity until the garbage collector deletes the invalid route. This typically happens if a host route expires afer a pmtu event. Fix this by searching also for routes with a lower priority metric. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-01 20:57:06 -04:00
Martin KaFai Lau	1f56a01f4e	ipv6: Consider RTF_CACHE when searching the fib6 tree It is a prep work for the later bug-fix patch which will stop /128 route from disappearing after pmtu update. The later bug-fix patch will allow a /128 route and its RTF_CACHE clone both exist at the same fib6_node. To do this, we need to prepare the existing fib6 tree search to expect RTF_CACHE for /128 route. Note that the fn->leaf is sorted by rt6i_metric. Hence, RTF_CACHE (if there is any) is always at the front. This property leads to the following: 1. When doing ip6_route_del(), it should honor the RTF_CACHE flag which the caller is used to ask for deleting clone or non-clone. The rtm_to_fib6_config() should also check the RTM_F_CLONED and then set RTF_CACHE accordingly so that: - 'ip -6 r del...' will make ip6_route_del() to delete a route and all its clones. Note that its clones is flushed by fib6_del() - 'ip -6 r flush table cache' will make ip6_route_del() to only delete clone(s). 2. Exclude RTF_CACHE from addrconf_get_prefix_route() which should not configure on a cloned route. 3. No change is need for rt6_device_match() since it currently could return a RTF_CACHE clone route, so the later bug-fix patch will not affect it. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-01 20:57:06 -04:00
Jiri Benc	67b61f6c13	netlink: implement nla_get_in_addr and nla_get_in6_addr Those are counterparts to nla_put_in_addr and nla_put_in6_addr. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-31 13:58:35 -04:00
Jiri Benc	930345ea63	netlink: implement nla_put_in_addr and nla_put_in6_addr IP addresses are often stored in netlink attributes. Add generic functions to do that. For nla_put_in_addr, it would be nicer to pass struct in_addr but this is not used universally throughout the kernel, in way too many places __be32 is used to store IPv4 address. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-31 13:58:35 -04:00
Ian Morris	63159f29be	ipv6: coding style: comparison for equality with NULL The ipv6 code uses a mixture of coding styles. In some instances check for NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-31 13:51:54 -04:00
Lubomir Rintel	c78ba6d64c	ipv6: expose RFC4191 route preference via rtnetlink This makes it possible to retain the route preference when RAs are handled in userspace. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-11 23:28:09 -04:00
Eric W. Biederman	ddb3b6033c	net: Remove protocol from struct dst_ops After my change to neigh_hh_init to obtain the protocol from the neigh_table there are no more users of protocol in struct dst_ops. Remove the protocol field from dst_ops and all of it's initializers. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 16:06:10 -04:00
Martin KaFai Lau	3b4711757d	ipv6: fix ipv6_cow_metrics for non DST_HOST case ipv6_cow_metrics() currently assumes only DST_HOST routes require dynamic metrics allocation from inetpeer. The assumption breaks when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric. Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set() is called after the route is created. This patch creates the metrics array (by calling dst_cow_metrics_generic) in ipv6_cow_metrics(). Test: radvd.conf: interface qemubr0 { AdvLinkMTU 1300; AdvCurHopLimit 30; prefix fd00:face:face:face::/64 { AdvOnLink on; AdvAutonomous on; AdvRouterAddr off; }; }; Before: [root@qemu1 ~]# ip -6 r show \| egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec fe80::/64 dev eth0 proto kernel metric 256 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec After: [root@qemu1 ~]# ip -6 r show \| egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300 fe80::/64 dev eth0 proto kernel metric 256 mtu 1300 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30 Fixes: `8e2ec63917` (ipv6: don't use inetpeer to store metrics for routes.) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-14 20:26:16 -08:00
Michael Büsch	662f5533c4	rt6_probe_deferred: Do not depend on struct ordering rt6_probe allocates a struct __rt6_probe_work and schedules a work handler rt6_probe_deferred. But rt6_probe_deferred kfree's the struct work_struct instead of struct __rt6_probe_work. This works, because struct work_struct is the first element of struct __rt6_probe_work. Change it to kfree struct __rt6_probe_work to not implicitly depend on struct work_struct being the first element. This does not affect the generated code. Signed-off-by: Michael Buesch <m@bues.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 13:00:43 -08:00
David S. Miller	95f873f2ff	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: arch/arm/boot/dts/imx6sx-sdb.dts net/sched/cls_bpf.c Two simple sets of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 16:59:56 -08:00
Martin KaFai Lau	b0a1ba5992	ipv6: Fix __ip6_route_redirect In my last commit (`a3c00e4`: ipv6: Remove BACKTRACK macro), the changes in __ip6_route_redirect is incorrect. The following case is missed: 1. The for loop tries to find a valid gateway rt. If it fails to find one, rt will be NULL. 2. When rt is NULL, it is set to the ip6_null_entry. 3. The newly added 'else if', from `a3c00e4`, will stop the backtrack from happening. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 22:09:51 -08:00
Hagen Paul Pfeifer	9d289715eb	ipv6: stop sending PTB packets for MTU < 1280 Reduce the attack vector and stop generating IPv6 Fragment Header for paths with an MTU smaller than the minimum required IPv6 MTU size (1280 byte) - called atomic fragments. See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1] for more information and how this "feature" can be misused. [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00 Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-19 14:52:07 -05:00
Johannes Berg	053c095a82	netlink: make nlmsg_end() and genlmsg_end() void Contrary to common expectations for an "int" return, these functions return only a positive value -- if used correctly they cannot even return 0 because the message header will necessarily be in the skb. This makes the very common pattern of if (genlmsg_end(...) < 0) { ... } be a whole bunch of dead code. Many places also simply do return nlmsg_end(...); and the caller is expected to deal with it. This also commonly (at least for me) causes errors, because it is very common to write if (my_function(...)) /* error condition */ and if my_function() does "return nlmsg_end()" this is of course wrong. Additionally, there's not a single place in the kernel that actually needs the message length returned, and if anyone needs it later then it'll be very easy to just use skb->len there. Remove this, and make the functions void. This removes a bunch of dead code as described above. The patch adds lines because I did - return nlmsg_end(...); + nlmsg_end(...); + return 0; I could have preserved all the function's return values by returning skb->len, but instead I've audited all the places calling the affected functions and found that none cared. A few places actually compared the return value with <= 0 in dump functionality, but that could just be changed to < 0 with no change in behaviour, so I opted for the more efficient version. One instance of the error I've made numerous times now is also present in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't check for <0 or <=0 and thus broke out of the loop every single time. I've preserved this since it will (I think) have caused the messages to userspace to be formatted differently with just a single message for every SKB returned to userspace. It's possible that this isn't needed for the tools that actually use this, but I don't even know what they are so couldn't test that changing this behaviour would be acceptable. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-18 01:03:45 -05:00
Daniel Borkmann	ea69763999	net: tcp: add RTAX_CC_ALGO fib handling This patch adds the minimum necessary for the RTAX_CC_ALGO congestion control metric to be set up and dumped back to user space. While the internal representation of RTAX_CC_ALGO is handled as a u32 key, we avoided to expose this implementation detail to user space, thus instead, we chose the netlink attribute that is being exchanged between user space to be the actual congestion control algorithm name, similarly as in the setsockopt(2) API in order to allow for maximum flexibility, even for 3rd party modules. It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as it should have been stored in RTAX_FEATURES instead, we first thought about reusing it for the congestion control key, but it brings more complications and/or confusion than worth it. Joint work with Florian Westphal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Florian Westphal	e715b6d3a5	net: fib6: convert cfg metric to u32 outside of table write lock Do the nla validation earlier, outside the write lock. This is needed by followup patch which needs to be able to call request_module (which can sleep) if needed. Joint work with Daniel Borkmann. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Martin KaFai Lau	367efcb932	ipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fn This patch save the fn before doing rt6_backtrack. Hence, without redo-ing the fib6_lookup(), saved_fn can be used to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off. Some minor changes I think make sense to review as a single patch: * Remove the 'out:' goto label. * Remove the 'reachable' variable. Only use the 'strict' variable instead. After this patch, "failing ip6_ins_rt()" should be the only case that requires a redo of fib6_lookup(). Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 00:14:39 -04:00
Martin KaFai Lau	94c77bb41d	ipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit case When there is a RTF_CACHE hit, no need to redo fib6_lookup() with reachable=0. Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 00:14:39 -04:00
Martin KaFai Lau	a3c00e46ef	ipv6: Remove BACKTRACK macro It is the prep work to reduce the number of calls to fib6_lookup(). The BACKTRACK macro could be hard-to-read and error-prone due to its side effects (mainly goto). This patch is to: 1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following return values: * If it is backtrack-able, returns next fn for retry. * If it reaches the root, returns NULL. 2. The caller needs to decide if a backtrack is needed (by testing rt == net->ipv6.ip6_null_entry). 3. Rename the goto labels in ip6_pol_route() to make the next few patches easier to read. Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 00:14:39 -04:00
David S. Miller	739e4a758e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/r8152.c net/netfilter/nfnetlink.c Both r8152 and nfnetlink conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-02 11:25:43 -07:00
Hannes Frederic Sowa	705f1c869d	ipv6: remove rt6i_genid Eric Dumazet noticed that all no-nonexthop or no-gateway routes which are already marked DST_HOST (e.g. input routes routes) will always be invalidated during sk_dst_check. Thus per-socket dst caching absolutely had no effect and early demuxing had no effect. Thus this patch removes rt6i_genid: fn_sernum already gets modified during add operations, so we only must ensure we mutate fn_sernum during ipv6 address remove operations. This is a fairly cost extensive operations, but address removal should not happen that often. Also our mtu update functions do the same and we heard no complains so far. xfrm policy changes also cause a call into fib6_flush_trees. Also plug a hole in rt6_info (no cacheline changes). I verified via tracing that this change has effect. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 14:00:48 -04:00
Ian Morris	4c83acbc56	ipv6: White-space cleansing : gaps between function and symbol export This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. This patch removes some blank lines between the end of a function definition and the EXPORT_SYMBOL_GPL macro in order to prevent checkpatch warning that EXPORT_SYMBOL must immediately follow a function. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 22:37:52 -07:00
Ian Morris	67ba4152e8	ipv6: White-space cleansing : Line Layouts This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. A number of items are addressed in this patch: * Multiple spaces converted to tabs * Spaces before tabs removed. * Spaces in pointer typing cleansed (char )foo etc. Remove space after sizeof * Ensure spacing around comparators such as if statements. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 22:37:52 -07:00
David S. Miller	54e5c4def0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-24 00:32:30 -04:00
Li RongQing	1495664355	ipv6: slight optimization in ip6_dst_gc entries is always greater than rt_max_size here, since if entries is less than rt_max_size, the fib6_run_gc function will be skipped Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-21 15:52:23 -04:00
Lorenzo Colitti	2e47b29195	net: ipv6: make "ip -6 route get mark xyz" work. Currently, "ip -6 route get mark xyz" ignores the mark passed in by userspace. Make it honour the mark, just like IPv4 does. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 16:50:30 -04:00
Duan Jiong	be7a010d6f	ipv6: update Destination Cache entries when gateway turn into host RFC 4861 states in 7.2.5: The IsRouter flag in the cache entry MUST be set based on the Router flag in the received advertisement. In those cases where the IsRouter flag changes from TRUE to FALSE as a result of this update, the node MUST remove that router from the Default Router List and update the Destination Cache entries for all destinations using that neighbor as a router as specified in Section 7.3.3. This is needed to detect when a node that is used as a router stops forwarding packets due to being configured as a host. Currently, when dealing with NA Message which IsRouter flag changes from TRUE to FALSE, the kernel only removes router from the Default Router List, and don't update the Destination Cache entries. Now in order to update those Destination Cache entries, i introduce function rt6_clean_tohost(). Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 23:26:27 -04:00
Lorenzo Colitti	1b3c61dc1a	net: Use fwmark reflection in PMTU discovery. Currently, routing lookups used for Path PMTU Discovery in absence of a socket or on unmarked sockets use a mark of 0. This causes PMTUD not to work when using routing based on netfilter fwmark mangling and fwmark ip rules, such as: iptables -j MARK --set-mark 17 ip rule add fwmark 17 lookup 100 This patch causes these route lookups to use the fwmark from the received ICMP error when the fwmark_reflect sysctl is enabled. This allows the administrator to make PMTUD work by configuring appropriate fwmark rules to mark the inbound ICMP packets. Black-box tested using user-mode linux by pointing different fwmarks at routing tables egressing on different interfaces, and using iptables mangling to mark packets inbound on each interface with the interface's fwmark. ICMPv4 and ICMPv6 PMTU discovery work as expected when mark reflection is enabled and fail when it is disabled. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:35:09 -04:00
Julian Anastasov	e374c618b1	net: ipv6: more places need LOOPBACK_IFINDEX for flowi6_iif To properly match iif in ip rules we have to provide LOOPBACK_IFINDEX in flowi6_iif, not 0. Some ip6mr_fib_lookup and fib6_rule_lookup callers need such fix. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 14:47:03 -04:00
Eric Dumazet	aad88724c9	ipv4: add a sock pointer to dst->output() path. In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: `8f646c922d` ("vxlan: keep original skb ownership") Reported-by: lucien xin <lucien.xin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-15 13:47:15 -04:00
Eric Dumazet	30f78d8ebf	ipv6: Limit mtu to 65575 bytes Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 12:39:59 -04:00
Wang Yufen	60ea37f7a5	ipv6: reuse rt6_need_strict Move the whole rt6_need_strict as static inline into ip6_route.h, so that it can be reused Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-31 16:16:16 -04:00
Michal Kubeček	e5fd387ad5	ipv6: do not overwrite inetpeer metrics prematurely If an IPv6 host route with metrics exists, an attempt to add a new route for the same target with different metrics fails but rewrites the metrics anyway: 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000 12sp0:~ # ip -6 route show fe80::/64 dev eth0 proto kernel metric 256 fec0::1 dev eth0 metric 1024 rto_min lock 1s 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500 RTNETLINK answers: File exists 12sp0:~ # ip -6 route show fe80::/64 dev eth0 proto kernel metric 256 fec0::1 dev eth0 metric 1024 rto_min lock 1.5s This is caused by all IPv6 host routes using the metrics in their inetpeer (or the shared default). This also holds for the new route created in ip6_route_add() which shares the metrics with the already existing route and thus ip6_route_add() rewrites the metrics even if the new route ends up not being used at all. Another problem is that old metrics in inetpeer can reappear unexpectedly for a new route, e.g. 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000 12sp0:~ # ip route del fec0::1 12sp0:~ # ip route add fec0::1 dev eth0 12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10 12sp0:~ # ip -6 route show fe80::/64 dev eth0 proto kernel metric 256 fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s Resolve the first problem by moving the setting of metrics down into fib6_add_rt2node() to the point we are sure we are inserting the new route into the tree. Second problem is addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE which is set for a new host route in ip6_route_add() and makes ipv6_cow_metrics() always overwrite the metrics in inetpeer (even if they are not "new"); it is reset after that. v5: use a flag in _metrics member rather than one in flags v4: fix a typo making a condition always true (thanks to Hannes Frederic Sowa) v3: rewritten based on David Miller's idea to move setting the metrics (and allocation in non-host case) down to the point we already know the route is to be inserted. Also rebased to net-next as it is quite late in the cycle. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 15:09:07 -04:00
Sabrina Dubroca	c88507fbad	ipv6: don't set DST_NOCOUNT for remotely added routes DST_NOCOUNT should only be used if an authorized user adds routes locally. In case of routes which are added on behalf of router advertisments this flag must not get used as it allows an unlimited number of routes getting added remotely. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 17:29:27 -05:00
David S. Miller	56a4342dfe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 17:37:45 -05:00
Li RongQing	0c3584d589	ipv6: remove prune parameter for fib6_clean_all since the prune parameter for fib6_clean_all always is 0, remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 03:30:35 -05:00
stephen hemminger	e82435341f	ipv6: namespace cleanups Running 'make namespacecheck' shows: net/ipv6/route.o ipv6_route_table_template rt6_bind_peer net/ipv6/icmp.o icmpv6_route_lookup ipv6_icmp_table_template This addresses some of those warnings by: * make icmpv6_route_lookup static * move inline's out of ip6_route.h since only used into route.c * move rt6_bind_peer into route.c Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-01 23:46:09 -05:00
Li RongQing	24f5b855e1	ipv6: always set the new created dst's from in ip6_rt_copy ip6_rt_copy only sets dst.from if ort has flag RTF_ADDRCONF and RTF_DEFAULT. but the prefix routes which did get installed by hand locally can have an expiration, and no any flag combination which can ensure a potential from does never expire, so we should always set the new created dst's from. This also fixes the new created dst is always expired since the ort, which is created by RA, maybe has RTF_EXPIRES and RTF_ADDRCONF, but no RTF_DEFAULT. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-19 18:35:21 -05:00
David S. Miller	143c905494	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-18 16:42:06 -05:00
Jiri Benc	7e98056964	ipv6: router reachability probing RFC 4191 states in 3.5: When a host avoids using any non-reachable router X and instead sends a data packet to another router Y, and the host would have used router X if router X were reachable, then the host SHOULD probe each such router X's reachability by sending a single Neighbor Solicitation to that router's address. A host MUST NOT probe a router's reachability in the absence of useful traffic that the host would have sent to the router if it were reachable. In any case, these probes MUST be rate-limited to no more than one per minute per router. Currently, when the neighbour corresponding to a router falls into NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but should be probed with a single NS. The probe is ratelimited by the existing code. To better distinguish meanings of the failure values, rename RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-11 16:02:58 -05:00
Hannes Frederic Sowa	a3300ef4bb	ipv6: don't count addrconf generated routes against gc limit Brett Ciphery reported that new ipv6 addresses failed to get installed because the addrconf generated dsts where counted against the dst gc limit. We don't need to count those routes like we currently don't count administratively added routes. Because the max_addresses check enforces a limit on unbounded address generation first in case someone plays with router advertisments, we are still safe here. Reported-by: Brett Ciphery <brett.ciphery@windriver.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 21:00:39 -05:00
Kamala R	7150aede5d	IPv6: Fixed support for blackhole and prohibit routes The behaviour of blackhole and prohibit routes has been corrected by setting the input and output pointers of the dst variable appropriately. For blackhole routes, they are set to dst_discard and to ip6_pkt_discard and ip6_pkt_discard_out respectively for prohibit routes. ipv6: ip6_pkt_prohibit(_out) should not depend on CONFIG_IPV6_MULTIPLE_TABLES We need ip6_pkt_prohibit(_out) available without CONFIG_IPV6_MULTIPLE_TABLES Signed-off-by: Kamala R <kamala@aristanetworks.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-02 17:14:27 -05:00
Duan Jiong	f104a567e6	ipv6: use rt6_get_dflt_router to get default router in rt6_route_rcv As the rfc 4191 said, the Router Preference and Lifetime values in a ::/0 Route Information Option should override the preference and lifetime values in the Router Advertisement header. But when the kernel deals with a ::/0 Route Information Option, the rt6_get_route_info() always return NULL, that means that overriding will not happen, because those default routers were added without flag RTF_ROUTEINFO in rt6_add_dflt_router(). In order to deal with that condition, we should call rt6_get_dflt_router when the prefix length is 0. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-08 15:16:04 -05:00
Duan Jiong	249a3630c4	ipv6: drop the judgement in rt6_alloc_cow() Now rt6_alloc_cow() is only called by ip6_pol_route() when rt->rt6i_flags doesn't contain both RTF_NONEXTHOP and RTF_GATEWAY, and rt->rt6i_flags hasn't been changed in ip6_rt_copy(). So there is no neccessary to judge whether rt->rt6i_flags contains RTF_GATEWAY or not. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-05 22:17:05 -05:00
David S. Miller	394efd19d5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 13:48:30 -05:00
Duan Jiong	ba4865027c	ipv6: remove the unnecessary statement in find_match() After reading the function rt6_check_neigh(), we can know that the RT6_NUD_FAIL_SOFT can be returned only when the IS_ENABLE(CONFIG_IPV6_ROUTER_PREF) is false. so in function find_match(), there is no need to execute the statement !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF). Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-30 17:06:51 -04:00
Hannes Frederic Sowa	e3bc10bd95	ipv6: ip6_dst_check needs to check for expired dst_entries On receiving a packet too big icmp error we check if our current cached dst_entry in the socket is still valid. This validation check did not care about the expiration of the (cached) route. The error path I traced down: The socket receives a packet too big mtu notification. It still has a valid dst_entry and thus issues the ip6_rt_pmtu_update on this dst_entry, setting RTF_EXPIRE and updates the dst.expiration value (which could fail because of not up-to-date expiration values, see previous patch). In some seldom cases we race with a) the ip6_fib gc or b) another routing lookup which would result in a recreation of the cached rt6_info from its parent non-cached rt6_info. While copying the rt6_info we reinitialize the metrics store by copying it over from the parent thus invalidating the just installed pmtu update (both dsts use the same key to the inetpeer storage). The dst_entry with the just invalidated metrics data would just get its RTF_EXPIRES flag cleared and would continue to stay valid for the socket. We should have not issued the pmtu update on the already expired dst_entry in the first placed. By checking the expiration on the dst entry and doing a relookup in case it is out of date we close the race because we would install a new rt6_info into the fib before we issue the pmtu update, thus closing this race. Not reliably updating the dst.expire value was fixed by the patch "ipv6: reset dst.expires value when clearing expire flag". Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Reported-by: Valentijn Sessink <valentyn@blub.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Valentijn Sessink <valentyn@blub.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-25 19:26:59 -04:00
David S. Miller	c3fa32b976	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-23 16:49:34 -04:00
Hannes Frederic Sowa	c2f17e827b	ipv6: probe routes asynchronous in rt6_probe Routes need to be probed asynchronous otherwise the call stack gets exhausted when the kernel attemps to deliver another skb inline, like e.g. xt_TEE does, and we probe at the same time. We update neigh->updated still at once, otherwise we would send to many probes. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-21 18:56:22 -04:00
Julian Anastasov	550bab42f8	ipv6: fill rt6i_gateway with nexthop address Make sure rt6i_gateway contains nexthop information in all routes returned from lookup or when routes are directly attached to skb for generated ICMP packets. The effect of this patch should be a faster version of rt6_nexthop() and the consideration of local addresses as nexthop. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-21 18:37:01 -04:00
Li RongQing	fbadadd90c	ipv6: Not need to set fl6.flowi6_flags as zero setting fl6.flowi6_flags as zero after memset is redundant, Remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 19:14:11 -04:00
Hannes Frederic Sowa	8d2ca1d7b5	ipv6: avoid high order memory allocations for /proc/net/ipv6_route Dumping routes on a system with lots rt6_infos in the fibs causes up to 11-order allocations in seq_file (which fail). While we could switch there to vmalloc we could just implement the streaming interface for /proc/net/ipv6_route. This patch switches /proc/net/ipv6_route from single_open_net to seq_open_net. loff_t *pos tracks dst entries. Also kill never used struct rt6_proc_arg and now unused function fib6_clean_all_ro. Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-27 17:32:16 -04:00
Duan Jiong	b55b76b221	ipv6:introduce function to find route for redirect RFC 4861 says that the IP source address of the Redirect is the same as the current first-hop router for the specified ICMP Destination Address, so the gateway should be taken into consideration when we find the route for redirect. There was once a check in commit `a6279458c5` ("NDISC: Search over all possible rules on receipt of redirect.") and the check went away in commit `b94f1c0904` ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect()"). The bug is only "exploitable" on layer-2 because the source address of the redirect is checked to be a valid link-local address but it makes spoofing a lot easier in the same L2 domain nonetheless. Thanks very much for Hannes's help. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 12:44:31 -04:00
Cong Wang	3ce9b35ff6	ipv6: move ip6_dst_hoplimit() into core kernel It will be used by vxlan, and may not be inlined. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:29:59 -04:00
David S. Miller	b05930f5d1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-26 16:37:08 -04:00
Duan Jiong	c92a59eca8	ipv6: handle Redirect ICMP Message with no Redirected Header option rfc 4861 says the Redirected Header option is optional, so the kernel should not drop the Redirect Message that has no Redirected Header option. In this patch, the function ip6_redirect_no_header() is introduced to deal with that condition. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>	2013-08-22 20:08:21 -07:00
David S. Miller	0e76a3a587	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-03 21:36:46 -07:00
Michal Kubeček	49a18d86f6	ipv6: update ip6_rt_last_gc every time GC is run As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should hold the last time garbage collector was run so that we should update it whenever fib6_run_gc() calls fib6_clean_all(), not only if we got there from ip6_dst_gc(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 14:16:20 -07:00
Michal Kubeček	2ac3ac8f86	ipv6: prevent fib6_run_gc() contention On a high-traffic router with many processors and many IPv6 dst entries, soft lockup in fib6_run_gc() can occur when number of entries reaches gc_thresh. This happens because fib6_run_gc() uses fib6_gc_lock to allow only one thread to run the garbage collector but ip6_dst_gc() doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc() returns. On a system with many entries, this can take some time so that in the meantime, other threads pass the tests in ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for the lock. They then have to run the garbage collector one after another which blocks them for quite long. Resolve this by replacing special value ~0UL of expire parameter to fib6_run_gc() by explicit "force" parameter to choose between spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with force=false if gc_thresh is reached but not max_size. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 14:16:20 -07:00
fan.du	ca4c3fc24e	net: split rt_genid for ipv4 and ipv6 Current net name space has only one genid for both IPv4 and IPv6, it has below drawbacks: - Add/delete an IPv4 address will invalidate all IPv6 routing table entries. - Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table entries even when the policy is only applied for one address family. Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6 separately in a fine granularity. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-31 14:56:36 -07:00
fan.du	86a37def2b	net ipv6: Remove rebundant rt6i_nsiblings initialization Seting rt->rt6i_nsiblings to zero is rebundant, because above memset zeroed the rest of rt excluding the first dst memember. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-24 14:24:56 -07:00
Hannes Frederic Sowa	afc154e978	ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF This is a follow-up patch to `3630d40067` ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available"). Since the removal of rt->n in rt6_info we can end up with a dst == NULL in rt6_check_neigh. In case the kernel is not compiled with CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown NUD state but we must not avoid doing round robin selection on routes with the same target. So introduce and pass down a boolean ``do_rr'' to indicate when we should update rt->rr_ptr. As soon as no route is valid we do backtracking and do a lookup on a higher level in the fib trie. v2: a) Improved rt6_check_neigh logic (no need to create neighbour there) and documented return values. v3: a) Introduce enum rt6_nud_state to get rid of the magic numbers (thanks to David Miller). b) Update and shorten commit message a bit to actualy reflect the source. Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-11 11:51:10 -07:00
Hannes Frederic Sowa	1eb4f75828	ipv6: in case of link failure remove route directly instead of letting it expire We could end up expiring a route which is part of an ecmp route set. Doing so would invalidate the rt->rt6i_nsiblings calculations and could provoke the following panic: [ 80.144667] ------------[ cut here ]------------ [ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733! [ 80.145172] invalid opcode: 0000 [#1] SMP [ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables +snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk [ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118 [ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000 [ 80.145172] RIP: 0010:[<ffffffff815f3b5d>] [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202 [ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480 [ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738 [ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001 [ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680 [ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90 [ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 [ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0 [ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 80.145172] Stack: [ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280 [ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa [ 80.145172] 0000000000000000 0000000000000000 0000000000000000 ffff88011350f680 [ 80.145172] Call Trace: [ 80.145172] [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90 [ 80.145172] [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70 [ 80.145172] [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40 [ 80.145172] [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0 [ 80.145172] [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30 [ 80.145172] [<ffffffff81616077>] fib6_rule_action+0xd7/0x210 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140 [ 80.145172] [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff815edea3>] ip6_route_output+0x73/0xb0 [ 80.145172] [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0 [ 80.145172] [<ffffffff813007b1>] ? list_del+0x11/0x40 [ 80.145172] [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50 [ 80.145172] [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0 [ 80.145172] [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20 [ 80.145172] [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0 [ 80.145172] [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30 [ 80.145172] [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0 [ 80.145172] [<ffffffff81524a68>] SYSC_sendto+0x128/0x180 [ 80.145172] [<ffffffff8109825c>] ? update_curr+0xec/0x170 [ 80.145172] [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10 [ 80.145172] [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0 [ 80.145172] [<ffffffff8152509e>] SyS_sendto+0xe/0x10 [ 80.145172] [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b [ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53 [ 80.145172] RIP [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP <ffff880118771798> [ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]--- [ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-10 19:45:39 -07:00
Hannes Frederic Sowa	3630d40067	ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available After the removal of rt->n we do not create a neighbour entry at route insertion time (rt6_bind_neighbour is gone). As long as no neighbour is created because of "useful traffic" we skip this routing entry because rt6_check_neigh cannot pick up a valid neighbour (neigh == NULL) and thus returns false. This change was introduced by commit `887c95cc1d` ("ipv6: Complete neighbour entry removal from dst_entry.") To quote RFC4191: "If the host has no information about the router's reachability, then the host assumes the router is reachable." and also: "A host MUST NOT probe a router's reachability in the absence of useful traffic that the host would have sent to the router if it were reachable." So, just assume the router is reachable and let's rt6_probe do the rest. We don't need to create a neighbour on route insertion time. If we don't compile with CONFIG_IPV6_ROUTER_PREF (RFC4191 support) a neighbour is only valid if its nud_state is NUD_VALID. I did not find any references that we should probe the router on route insertion time via the other RFCs. So skip this route in that case. v2: a) use IS_ENABLED instead of #ifdefs (thanks to Sergei Shtylyov) Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-03 17:50:51 -07:00
Nicolas Dichtel	52bd4c0c15	ipv6: fix ecmp lookup when oif is specified There is no reason to skip ECMP lookup when oif is specified, but this implies to check oif given by user when selecting another route. When the new route does not match oif requirement, we simply keep the initial one. Spotted-by: dingzhi <zhi.ding@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-01 13:27:38 -07:00
Joe Perches	fe2c6338fd	net: Convert uses of typedef ctl_table to struct ctl_table Reduce the uses of this unnecessary typedef. Done via perl script: $ git grep --name-only -w ctl_table net \| \ xargs perl -p -i -e '\ sub trim { my ($local) = @_; $local =~ s/(^\s+\|\s+$)//g; return $local; } \ s/\b(?<!struct\s)ctl_table\b(\s\\s*\|\s+\w+)/"struct ctl_table " . trim($1)/ge' Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-13 02:36:09 -07:00
Simon Horman	29a3cad5c6	ipv6: Correct comparisons and calculations using skb->tail and skb-transport_header This corrects an regression introduced by "net: Use 16bits for *_headers fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In that case skb->tail will be a pointer whereas skb->transport_header will be an offset from head. This is corrected by using wrappers that ensure that comparisons and calculations are always made using pointers. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-28 23:49:07 -07:00
Jiri Pirko	351638e7de	net: pass info struct via netdevice notifier So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-28 13:11:01 -07:00
Thomas Graf	661d2967b3	rtnetlink: Remove passing of attributes into rtnl_doit functions With decnet converted, we can finally get rid of rta_buf and its computations around it. It also gets rid of the minimal header length verification since all message handlers do that explicitly anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-22 10:31:16 -04:00
Lorenzo Colitti	3e8b0ac3e4	net: ipv6: Don't purge default router if accept_ra=2 Setting net.ipv6.conf.<interface>.accept_ra=2 causes the kernel to accept RAs even when forwarding is enabled. However, enabling forwarding purges all default routes on the system, breaking connectivity until the next RA is received. Fix this by not purging default routes on interfaces that have accept_ra=2. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-04 14:12:07 -05:00
YOSHIFUJI Hideaki / 吉藤英明	ecd9883724	ipv6: fix race condition regarding dst->expires and dst->from. Eric Dumazet wrote: \| Some strange crashes happen in rt6_check_expired(), with access \| to random addresses. \| \| At first glance, it looks like the RTF_EXPIRES and \| stuff added in commit `1716a96101` \| (ipv6: fix problem with expired dst cache) \| are racy : same dst could be manipulated at the same time \| on different cpus. \| \| At some point, our stack believes rt->dst.from contains a dst pointer, \| while its really a jiffie value (as rt->dst.expires shares the same area \| of memory) \| \| rt6_update_expires() should be fixed, or am I missing something ? \| \| CC Neil because of https://bugzilla.redhat.com/show_bug.cgi?id=892060 Because we do not have any locks for dst_entry, we cannot change essential structure in the entry; e.g., we cannot change reference to other entity. To fix this issue, split 'from' and 'expires' field in dst_entry out of union. Once it is 'from' is assigned in the constructor, keep the reference until the very last stage of the life time of the object. Of course, it is unsafe to change 'from', so make rt6_set_from simple just for fresh entries. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Neil Horman <nhorman@tuxdriver.com> CC: Gao Feng <gaofeng@cn.fujitsu.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reported-by: Steinar H. Gunderson <sesse@google.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-20 15:11:45 -05:00
Gao feng	ece31ffd53	net: proc: change proc_net_remove to remove_proc_entry proc_net_remove is only used to remove proc entries that under /proc/net,it's not a general function for removing proc entries of netns. if we want to remove some proc entries which under /proc/net/stat/, we still need to call remove_proc_entry. this patch use remove_proc_entry to replace proc_net_remove. we can remove proc_net_remove after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-18 14:53:08 -05:00
Gao feng	d4beaa66ad	net: proc: change proc_net_fops_create to proc_create Right now, some modules such as bonding use proc_create to create proc entries under /proc/net/, and other modules such as ipv4 use proc_net_fops_create. It looks a little chaos.this patch changes all of proc_net_fops_create to proc_create. we can remove proc_net_fops_create after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-18 14:53:08 -05:00
YOSHIFUJI Hideaki / 吉藤英明	b820bb6b99	ndisc: Do not try to update "updated" time if neighbour has already gone. Commit `2152caea` ("ipv6: Do not depend on rt->n in rt6_probe().") introduce a bug to try to update "updated" time in neighbour structure. Update the "updated" time only if neighbour is available. Bug was found by Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-21 15:41:41 -05:00
YOSHIFUJI Hideaki / 吉藤英明	12fd84f438	ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers. Because of rt->n removal, we do not need neigh argument any more. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-18 14:41:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明	887c95cc1d	ipv6: Complete neighbour entry removal from dst_entry. CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	2152caea71	ipv6: Do not depend on rt->n in rt6_probe(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	145a36217a	ipv6: Do not depend on rt->n in rt6_check_neigh(). CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	c440f1609b	ipv6: Do not depend on rt->n in ip6_pol_route(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	dd0cbf29b1	ipv6 route: Dump gateway based on RTF_GATEWAY flag and rt->rt6i_gateway. Do not depend on rt->n. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明	8e022ee63f	ndisc: Remove tbl argument for __ipv6_neigh_lookup(). We can refer to nd_tbl directly. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明	7ff74a596b	ndisc: Update neigh->updated with write lock. neigh->nud_state and neigh->updated are under protection of neigh->lock. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明	6059283378	ipv6 netevent: Remove old_neigh from netevent_redirect. The only user is cxgb3 driver. old_neigh is used to check device change, but it must not happen on redirect. In this sense, we can remove old_neigh argument. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-14 15:04:59 -05:00
YOSHIFUJI Hideaki / 吉藤英明	c08977bb2b	ipv6 route: Use ipv6_addr_hash() in rt6_info_hash_nhsfn(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明	6502ca527f	ipv6: Introduce ip6_flowinfo() to extract flowinfo (tclass + flowlabel). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-13 20:17:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明	71bcdba06d	ndisc: Use struct rd_msg for redirect message. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:08:38 -08:00
Paul Marks	a5a81f0b90	ipv6: Fix default route failover when CONFIG_IPV6_ROUTER_PREF=n I believe this commit from 2008 was incorrect: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=398bcbebb6f721ac308df1e3d658c0029bb74503 When CONFIG_IPV6_ROUTER_PREF is disabled, the kernel should follow RFC4861 section 6.3.6: if no route is NUD_VALID, then traffic should be sprayed across all routers (indirectly triggering NUD) until one of them becomes NUD_VALID. However, the following experiment demonstrates that this does not work: 1) Connect to an IPv6 network. 2) Change the router's MAC (and link-local) address. The kernel will lock onto the first router and never try the new one, even if the first becomes unreachable. This patch fixes the problem by allowing rt6_check_neigh() to return 0; if all routers return 0, then rt6_select() will fall back to round-robin behavior. This patch should have no effect when CONFIG_IPV6_ROUTER_PREF=y. Note that rt6_check_neigh() is only used in a boolean context, so I've changed its return type accordingly. Signed-off-by: Paul Marks <pmarks@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-03 15:34:47 -05:00
Eric W. Biederman	b51642f6d7	net: Enable a userns root rtnl calls that are safe for unprivilged users - Only allow moving network devices to network namespaces you have CAP_NET_ADMIN privileges over. - Enable creating/deleting/modifying interfaces - Enable adding/deleting addresses - Enable adding/setting/deleting neighbour entries - Enable adding/removing routes - Enable adding/removing fib rules - Enable setting the forwarding state - Enable adding/removing ipv6 address labels - Enable setting bridge parameter Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:33:36 -05:00
Eric W. Biederman	af31f412c7	net: Allow userns root to control ipv6 Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow the SIOCSIFADDR ioctl to add ipv6 addresses. Allow the SIOCDIFADDR ioctl to delete ipv6 addresses. Allow the SIOCADDRT ioctl to add ipv6 routes. Allow the SIOCDELRT ioctl to delete ipv6 routes. Allow creation of ipv6 raw sockets. Allow setting the IPV6_JOIN_ANYCAST socket option. Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR socket option. Allow setting the IPV6_TRANSPARENT socket option. Allow setting the IPV6_HOPOPTS socket option. Allow setting the IPV6_RTHDRDSTOPTS socket option. Allow setting the IPV6_DSTOPTS socket option. Allow setting the IPV6_IPSEC_POLICY socket option. Allow setting the IPV6_XFRM_POLICY socket option. Allow sending packets with the IPV6_2292HOPOPTS control message. Allow sending packets with the IPV6_2292DSTOPTS control message. Allow sending packets with the IPV6_RTHDRDSTOPTS control message. Allow setting the multicast routing socket options on non multicast routing sockets. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for setting up, changing and deleting tunnels over ipv6. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for setting up, changing and deleting ipv6 over ipv4 tunnels. Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding, deleting, and changing the potential router list for ISATAP tunnels. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:32:45 -05:00
Eric W. Biederman	dfc47ef863	net: Push capable(CAP_NET_ADMIN) into the rtnl methods - In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged users to make netlink calls to modify their local network namespace. - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so that calls that are not safe for unprivileged users are still protected. Later patches will remove the extra capable calls from methods that are safe for unprivilged users. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:32:44 -05:00
Eric W. Biederman	464dc801c7	net: Don't export sysctls to unprivileged users In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:30:55 -05:00
Li RongQing	c75ea26040	ipv6: remove obsolete comments in route.c Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 18:51:45 -05:00
David S. Miller	d4185bbf62	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c Minor conflict between the BCM_CNIC define removal in net-next and a bug fix added to net. Based upon a conflict resolution patch posted by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-10 18:32:51 -05:00
Li RongQing	a4477c4ddb	ipv6: remove rt6i_peer_genid from rt6_info and its handler 6431cbc25f(Create a mechanism for upward inetpeer propagation into routes) introduces these codes, but this mechanism is never enabled since rt6i_peer_genid always is zero whether it is not assigned or assigned by rt6_peer_genid(). After `5943634fc5` (ipv4: Maintain redirect and PMTU info in struct rtable again), the ipv4 related codes of this mechanism has been removed, I think we maybe able to remove them now. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-08 21:16:08 -05:00
Amerigo Wang	94e187c015	ipv6: introduce ip6_rt_put() As suggested by Eric, we could introduce a helper function for ipv6 too, to avoid checking if rt is NULL before dst_release(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-03 14:59:05 -04:00
Nicolas Dichtel	1a72418bd7	ipv6/multipath: remove flag NLM_F_EXCL after the first nexthop fib6_add_rt2node() will reject the nexthop if this flag is set, so we perform the check only for the first nexthop. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:38:19 -04:00
Li RongQing	14edd87dc6	ipv6: Set default hoplimit as zero. Commit a02e4b7dae4551(Demark default hoplimit as zero) only changes the hoplimit checking condition and default value in ip6_dst_hoplimit, not zeros all hoplimit default value. Keep the zeroing ip6_template_metrics[RTAX_HOPLIMIT - 1] to force it as const, cause as a37e6e344910(net: force dst_default_metrics to const section) Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-24 23:14:17 -04:00
Nicolas Dichtel	b3ce5ae1fb	ipv6: fix sparse warnings in rt6_info_hash_nhsfn() Adding by commit `51ebd31815` which adds the support of ECMP for IPv6. Spotted-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-23 13:03:45 -04:00
Nicolas Dichtel	51ebd31815	ipv6: add support of equal cost multipath (ECMP) Each nexthop is added like a single route in the routing table. All routes that have the same metric/weight and destination but not the same gateway are considering as ECMP routes. They are linked together, through a list called rt6i_siblings. ECMP routes can be added in one shot, with RTA_MULTIPATH attribute or one after the other (in both case, the flag NLM_F_EXCL should not be set). The patch is based on a previous work from Luc Saillard <luc.saillard@6wind.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-23 02:38:32 -04:00
Gao feng	6825a26c2d	ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt as we hold dst_entry before we call __ip6_del_rt, so we should alse call dst_release not only return -ENOENT when the rt6_info is ip6_null_entry. and we already hold the dst entry, so I think it's safe to call dst_release out of the write-read lock. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-04 16:00:07 -04:00
David S. Miller	6a06e5e1bb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-28 14:40:49 -04:00
Li RongQing	3fd91fb358	ipv6: recursive check rt->dst.from when call rt6_check_expired If dst cache dst_a copies from dst_b, and dst_b copies from dst_c, check if dst_a is expired or not, we should not end with dst_a->dst.from, dst_b, we should check dst_c. CC: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-19 15:35:33 -04:00
Nicolas Dichtel	2c20cbd7e3	ipv6: use DST_* macro to set obselete field Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-18 15:57:04 -04:00
Nicolas Dichtel	6f3118b571	ipv6: use net->rt_genid to check dst validity IPv6 dst should take care of rt_genid too. When a xfrm policy is inserted or deleted, all dst should be invalidated. To force the validation, dst entries should be created with ->obsolete set to DST_OBSOLETE_FORCE_CHK. This was already the case for all functions calling ip6_dst_alloc(), except for ip6_rt_copy(). As a consequence, we can remove the specific code in inet6_connection_sock. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-18 15:57:03 -04:00
Li RongQing	5744dd9b71	ipv6: replace write lock with read lock when get route info geting route info does not write rt->rt6i_table, so replace write lock with read lock Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-13 16:53:46 -04:00
Eric Dumazet	fb0af4c74f	ipv6: route templates can be const We kmemdup() templates, so they can be const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-13 16:52:55 -04:00
Amerigo Wang	fdd6681d92	ipv6: remove some useless RCU read lock After this commit: commit `97cac0821a` Author: David S. Miller <davem@davemloft.net> Date: Mon Jul 2 22:43:47 2012 -0700 ipv6: Store route neighbour in rt6_info struct. we no longer use RCU to protect route neighbour. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-10 16:31:18 -04:00
Eric W. Biederman	15e473046c	netlink: Rename pid to portid to avoid confusion It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-10 15:30:41 -04:00
Nicolas Dichtel	b4949ab269	ipv6: fix handling of throw routes It's the same problem that previous fix about blackhole and prohibit routes. When adding a throw route, it was handled like a classic route. Moreover, it was only possible to add this kind of routes by specifying an interface. Before the patch: $ ip route add throw 2001::2/128 RTNETLINK answers: No such device $ ip route add throw 2001::2/128 dev eth0 $ ip -6 route \| grep 2001::2 2001::2 dev eth0 metric 1024 After: $ ip route add throw 2001::2/128 $ ip -6 route \| grep 2001::2 throw 2001::2 dev lo metric 1024 error -11 Reported-by: Markus Stenberg <markus.stenberg@iki.fi> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-07 14:17:10 -04:00
Nicolas Dichtel	ef2c7d7b59	ipv6: fix handling of blackhole and prohibit routes When adding a blackhole or a prohibit route, they were handling like classic routes. Moreover, it was only possible to add this kind of routes by specifying an interface. Bug already reported here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498 Before the patch: $ ip route add blackhole 2001::1/128 RTNETLINK answers: No such device $ ip route add blackhole 2001::1/128 dev eth0 $ ip -6 route \| grep 2001 2001::1 dev eth0 metric 1024 After: $ ip route add blackhole 2001::1/128 $ ip -6 route \| grep 2001 blackhole 2001::1 dev lo metric 1024 error -22 v2: wrong patch v3: add a field fc_type in struct fib6_config to store RTN_* type Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-05 17:49:28 -04:00
Pavel Emelyanov	1fb9489bf1	net: Loopback ifindex is constant now As pointed out, there are places, that access net->loopback_dev->ifindex and after ifindex generation is made per-net this value becomes constant equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use it where appropriate. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 16:18:07 -07:00
Li Wei	8253947e2c	ipv6: fix incorrect route 'expires' value passed to userspace When userspace use RTM_GETROUTE to dump route table, with an already expired route entry, we always got an 'expires' value(2147157) calculated base on INT_MAX. The reason of this problem is in the following satement: rt->dst.expires - jiffies < INT_MAX gcc promoted the type of both sides of '<' to unsigned long, thus a small negative value would be considered greater than INT_MAX. With the help of Eric Dumazet, do the out of bound checks in rtnl_put_cacheinfo(), _after_ conversion to clock_t. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-29 23:18:31 -07:00
David S. Miller	f5b0a87436	net: Document dst->obsolete better. Add a big comment explaining how the field works, and use defines instead of magic constants for the values assigned to it. Suggested by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 13:31:21 -07:00
David S. Miller	a6ff1a2f1e	Merge branch 'nexthop_exceptions' These patches implement the final mechanism necessary to really allow us to go without the route cache in ipv4. We need a place to have long-term storage of PMTU/redirect information which is independent of the routes themselves, yet does not get us back into a situation where we have to write to metrics or anything like that. For this we use an "next-hop exception" table in the FIB nexthops. The one thing I desperately want to avoid is having to create clone routes in the FIB trie for this purpose, because that is very expensive. However, I'm willing to entertain such an idea later if this current scheme proves to have downsides that the FIB trie variant would not have. In order to accomodate this any such scheme, we need to be able to produce a full flow key at PMTU/redirect time. That required an adjustment of the interface call-sites used to propagate these events. For a PMTU/redirect with a fully specified socket, we pass that socket and use it to produce the flow key. Otherwise we use a passed in SKB to formulate the key. There are two cases that need to be distinguished, ICMP message processing (in which case the IP header is at skb->data) and output packet processing (mostly tunnels, and in all such cases the IP header is at ip_hdr(skb)). We also have to make the code able to handle the case where the dst itself passed into the dst_ops->{update_pmtu,redirect} method is invalidated. This matters for calls from sockets that have cached that route. We provide a inet{,6} helper function for this purpose, and edit SCTP specially since it caches routes at the transport rather than socket level. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-17 10:48:26 -07:00
David S. Miller	6700c2709c	net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-17 03:29:28 -07:00
Denis Ovsienko	f0396f60d7	ipv6: fix RTPROT_RA markup of RA routes w/nexthops Userspace implementations of network routing protocols sometimes need to tell RA-originated IPv6 routes from other kernel routes to make proper routing decisions. This makes most sense for RA routes with nexthops, namely, default routes and Route Information routes. The intended mean of preserving RA route origin in a netlink message is through indicating RTPROT_RA as protocol code. Function rt6_fill_node() tried to do that for default routes, but its test condition was taken wrong. This change is modeled after the original mailing list posting by Jeff Haran. It fixes the test condition for default route case and sets the same behaviour for Route Information case (both types use nexthops). Handling of the 3rd RA route type, Prefix Information, is left unchanged, as it stands for interface connected routes (without nexthops). Signed-off-by: Denis Ovsienko <infrastation@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-16 22:55:54 -07:00
Steffen Klassert	8104891b86	ipv6: Initialize the struct rt6_info behind the dst_enty field We start initializing the struct rt6_info at the first field behind the struct dst_enty. This is error prone because it might leave a new field uninitialized. So start initializing the struct rt6_info right behind the dst_entry. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-14 00:29:12 -07:00
David S. Miller	b587ee3ba2	net: Add dummy dst_ops->redirect method where needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-12 00:39:24 -07:00
David S. Miller	b94f1c0904	ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect(). And delete rt6_redirect(), since it is no longer used. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-12 00:33:37 -07:00
David S. Miller	3a5ad2ee5e	ipv6: Add ip6_redirect() and ip6_sk_redirect() helper functions. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-12 00:08:07 -07:00
David S. Miller	6e157b6ac6	ipv6: Pull main logic of rt6_redirect() into rt6_do_redirect(). Hook it into dst_ops->redirect as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-12 00:05:02 -07:00
David S. Miller	e8599ff4b1	ipv6: Move bulk of redirect handling into rt6_redirect(). This sets things up so that we can have the protocol error handlers call down into the ipv6 route code for redirects just as ipv4 already does. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-11 23:43:53 -07:00
David S. Miller	87a50699cb	rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo(). Nobody provides non-zero values any longer. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-10 22:40:13 -07:00
David S. Miller	3e12939a2a	inet: Kill FLOWI_FLAG_PRECOW_METRICS. No longer needed. TCP writes metrics, but now in it's own special cache that does not dirty the route metrics. Therefore there is no longer any reason to pre-cow metrics in this way. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-10 22:40:12 -07:00
David S. Miller	81166dd6fa	tcp: Move timestamps from inetpeer to metrics cache. With help from Lin Ming. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-10 22:40:08 -07:00
Steffen Klassert	a2de86f63c	ipv6: Initialize the neighbour pointer of rt6_info on allocation git commit `97cac082` (ipv6: Store route neighbour in rt6_info struct) added a neighbour pointer to rt6_info. Currently we don't initialize this pointer at allocation time. We assume this pointer to be valid if it is not a null pointer, so initialize it on allocation. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-05 14:20:07 -07:00
David S. Miller	97cac0821a	ipv6: Store route neighbour in rt6_info struct. This makes for a simplified conversion away from dst_get_neighbour(). All code outside of ipv6 will use neigh lookups via dst_neigh_lookup(). Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-05 02:41:58 -07:00
David S. Miller	1d248b1cf4	net: Pass neighbours and dest address into NETEVENT_REDIRECT events. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-05 02:21:55 -07:00
David S. Miller	f894cbf847	net: Add optional SKB arg to dst_ops->neigh_lookup(). Causes the handler to use the daddr in the ipv4/ipv6 header when the route gateway is unspecified (local subnet). Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-05 01:04:01 -07:00
David McCullough	4dc27d1cf3	net/ipv6/route.c: packets originating on device match lo Fix to allow IPv6 packets originating locally to match rules with the "iff" set to "lo". This allows IPv6 rule matching work the same as it does for IPv4. From the iproute2 man page: iif NAME select the incoming device to match. If the interface is loop‐ back, the rule only matches packets originating from this host. This means that you may create separate routing tables for for‐ warded and local packets and, hence, completely segregate them. Signed-off-by: David McCullough <david_mccullough@mcafee.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-25 23:54:32 -07:00
David S. Miller	e486463e82	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c net/batman-adv/translation-table.c net/ipv6/route.c qmi_wwan.c resolution provided by Bjørn Mork. batman-adv conflict is dealing merely with the changes of global function names to have a proper subsystem prefix. ipv6's route.c conflict is merely two side-by-side additions of network namespace methods. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-25 15:50:32 -07:00
Thomas Graf	d189634eca	ipv6: Move ipv6 proc file registration to end of init order /proc/net/ipv6_route reflects the contents of fib_table_hash. The proc handler is installed in ip6_route_net_init() whereas fib_table_hash is allocated in fib6_net_init() _after_ the proc handler has been installed. This opens up a short time frame to access fib_table_hash with its pants down. Move the registration of the proc files to a later point in the init order to avoid the race. Tested :-) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-18 18:38:50 -07:00
David S. Miller	aee289baaa	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv6/route.c Pull in 'net' again to get the revert of Thomas's change which introduced regressions. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-16 01:23:04 -07:00
David S. Miller	e8803b6c38	Revert "ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route" This reverts commit `2a0c451ade`. It causes crashes, because now ip6_null_entry is used before it is initialized. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-16 01:12:19 -07:00
David S. Miller	42ae66c80d	ipv6: Fix types of ip6_update_pmtu(). The mtu should be a __be32, not the mark. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-15 20:01:57 -07:00
David S. Miller	7e52b33bd5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv6/route.c This deals with a merge conflict between the net-next addition of the inetpeer network namespace ops, and Thomas Graf's bug fix in `2a0c451ade` which makes sure we don't register /proc/net/ipv6_route before it is actually safe to do so. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-15 15:51:55 -07:00
Thomas Graf	2a0c451ade	ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route /proc/net/ipv6_route reflects the contents of fib_table_hash. The proc handler is installed in ip6_route_net_init() whereas fib_table_hash is allocated in fib6_net_init() _after_ the proc handler has been installed. This opens up a short time frame to access fib_table_hash with its pants down. fib6_init() as a whole can't be moved to an earlier position as it also registers the rtnetlink message handlers which should be registered at the end. Therefore split it into fib6_init() which is run early and fib6_init_late() to register the rtnetlink message handlers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-15 15:30:15 -07:00
David S. Miller	81aded2467	ipv6: Handle PMTU in ICMP error handlers. One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts to handle the error pass the 32-bit info cookie in network byte order whereas ipv4 passes it around in host byte order. Like the ipv4 side, we have two helper functions. One for when we have a socket context and one for when we do not. ip6ip6 tunnels are not handled here, because they handle PMTU events by essentially relaying another ICMP packet-too-big message back to the original sender. This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all kinds of situations that simply cannot happen when we do the PMTU update directly using a fully resolved route. In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very likely be removed or changed into a BUG_ON() check. We should never have a prefixed ipv6 route when we get there. Another piece of strange history here is that TCP and DCCP, unlike in ipv4, never invoke the update_pmtu() method from their ICMP error handlers. This is incredibly astonishing since this is the context where we have the most accurate context in which to make a PMTU update, namely we have a fully connected socket and associated cached socket route. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-15 14:54:11 -07:00
David S. Miller	7b34ca2ac7	inet: Avoid potential NULL peer dereference. We handle NULL in rt{,6}_set_peer but then our caller will try to pass that NULL pointer into inet_putpeer() which isn't ready for it. Fix this by moving the NULL check one level up, and then remove the now unnecessary NULL check from inetpeer_ptr_set_peer(). Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-11 04:13:57 -07:00
David S. Miller	8b96d22d7a	inet: Use FIB table peer roots in routes. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-11 02:10:54 -07:00
David S. Miller	97bab73f98	inet: Hide route peer accesses behind helpers. We encode the pointer(s) into an unsigned long with one state bit. The state bit is used so we can store the inetpeer tree root to use when resolving the peer later. Later the peer roots will be per-FIB table, and this change works to facilitate that. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-11 02:08:47 -07:00
David S. Miller	c0efc887dc	inet: Pass inetpeer root into inet_getpeer*() interfaces. Otherwise we reference potentially non-existing members when ipv6 is disabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-09 19:12:36 -07:00
David S. Miller	2b823f7258	ipv6: Do not mark ipv6_inetpeer_ops as __net_initdata. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-09 19:00:16 -07:00
David S. Miller	56a6b248eb	inet: Consolidate inetpeer_invalidate_tree() interfaces. We only need one interface for this operation, since we always know which inetpeer root we want to flush. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-09 16:32:41 -07:00
David S. Miller	c3426b4719	inet: Initialize per-netns inetpeer roots in net/ipv{4,6}/route.c Instead of net/ipv4/inetpeer.c Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-09 16:27:05 -07:00
David S. Miller	fbfe95a42e	inet: Create and use rt{,6}_get_peer_create(). There's a lot of places that open-code rt{,6}_get_peer() only because they want to set 'create' to one. So add an rt{,6}_get_peer_create() for their sake. There were also a few spots open-coding plain rt{,6}_get_peer() and those are transformed here as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-08 23:24:18 -07:00
Gao feng	54db0cc2ba	inetpeer: add parameter net for inet_getpeer_v4,v6 add struct net as a parameter of inet_getpeer_v[4,6], use net to replace &init_net. and modify some places to provide net for inet_getpeer_v[4,6] Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-08 14:27:23 -07:00
Eric Dumazet	a50feda546	ipv6: bool/const conversions phase2 Mostly bool conversions, some inline removals and const additions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-19 01:08:16 -04:00
Joe Perches	f32138319c	net: ipv6: Standardize prefixes for message logging Add #define pr_fmt(fmt) as appropriate. Add "IPv6: " to appropriate files. Convert printk(KERN_<LEVEL> to pr_<level> (but not KERN_DEBUG). Standardize on "%s: " not "%s(): " when emitting __func__. Use "%s: ", __func__ instead of embedding function name. Coalesce formats, align arguments. ADDRCONF output is now prefixed with "IPv6: " Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-16 01:01:03 -04:00
Joe Perches	e87cc4728f	net: Convert net_ratelimit uses to net_<level>_ratelimited Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-15 13:45:03 -04:00
David S. Miller	56845d78ce	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/atheros/atlx/atl1.c drivers/net/ethernet/atheros/atlx/atl1.h Resolved a conflict between a DMA error bug fix and NAPI support changes in the atl1 driver. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-15 13:19:04 -04:00
Eric Dumazet	95c9617472	net: cleanup unsigned to unsigned int Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-15 12:44:40 -04:00
Gao feng	1716a96101	ipv6: fix problem with expired dst cache If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet. this dst cache will not check expire because it has no RTF_EXPIRES flag. So this dst cache will always be used until the dst gc run. Change the struct dst_entry,add a union contains new pointer from and expires. When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use. we can use this field to point to where the dst cache copy from. The dst.from is only used in IPV6. rt6_check_expired check if rt6_info.dst.from is expired. ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF and RTF_DEFAULT.then hold the ort. ip6_dst_destroy release the ort. Add some functions to operate the RTF_EXPIRES flag and expires(from) together. and change the code to use these new adding functions. Changes from v5: modify ip6_route_add and ndisc_router_discovery to use new adding functions. Only set dst.from when the ort has flag RTF_ADDRCONF and RTF_DEFAULT.then hold the ort. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-13 12:58:29 -04:00
Shmulik Ladkani	2173bff5dc	ipv6: Fix 'inet6_rtm_getroute' to release 'rt->dst' in case of 'alloc_skb' failure In `72331bc` [ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4] the code of 'inet6_rtm_getroute()' was re-ordered such that the reference to 'rt->dst' is incremented prior skb allocation. Hence, if 'alloc_skb()' fails, must drop a reference from 'rt->dst'. Add the missing 'dst_release()' call. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-04 05:25:51 -04:00
David S. Miller	c78679e8f3	ipv6: Stop using NLA_PUT*(). These macros contain a hidden goto, and are thus extremely error prone and make code hard to audit. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-02 04:33:43 -04:00
Shmulik Ladkani	72331bc0cd	ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4 In IPv4, if an RTA_IIF attribute is specified within an RTM_GETROUTE message, then a route is searched as if a packet was received on the specified 'iif' interface. However in IPv6, RTA_IIF is not interpreted in the same way: 'inet6_rtm_getroute()' always calls 'ip6_route_output()', regardless the RTA_IIF attribute. As a result, in IPv6 there's no way to use RTM_GETROUTE in order to look for a route as if a packet was received on a specific interface. Fix 'inet6_rtm_getroute()' so that RTA_IIF is interpreted as "lookup a route as if a packet was received on the specified interface", similar to IPv4's 'inet_rtm_getroute()' interpretation. Reported-by: Ami Koren <amikoren@yahoo.com> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-01 17:29:40 -04:00
Eric Dumazet	94f826b807	net: fix a potential rcu_read_lock() imbalance in rt6_fill_node() Commit `f2c31e32b3` (net: fix NULL dereferences in check_peer_redir() ) added a regression in rt6_fill_node(), leading to rcu_read_lock() imbalance. Thats because NLA_PUT() can make a jump to nla_put_failure label. Fix this by using nla_put() Many thanks to Ben Greear for his help Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-27 18:48:35 -04:00
David S. Miller	4da0bd7365	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-03-18 23:29:41 -04:00
Eric Dumazet	122bdf67f1	ipv6: fix icmp6_dst_alloc() commit `87a115783` ( ipv6: Move xfrm_lookup() call down into icmp6_dst_alloc().) forgot to convert one error path, leading to crashes in mld_sendpack() Many thanks to Dave Jones for providing a very complete bug report. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-16 01:53:42 -07:00
David S. Miller	a7563f342d	ipv6: Use ipv6_addr_any() Suggested by YOSHIFUJI Hideaki. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-26 16:29:16 -05:00
David S. Miller	39232973b7	ipv4/ipv6: Prepare for new route gateway semantics. In the future the ipv4/ipv6 route gateway will take on two types of values: 1) INADDR_ANY/IN6ADDR_ANY, for local network routes, and in this case the neighbour must be obtained using the destination address in ipv4/ipv6 header as the lookup key. 2) Everything else, the actual nexthop route address. So if the gateway is not inaddr-any we use it, otherwise we must use the packet's destination address. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-26 15:22:32 -05:00
RongQing.Li	252c3d84ed	ipv6: release idev when ip6_neigh_lookup failed in icmp6_dst_alloc release idev when ip6_neigh_lookup failed in icmp6_dst_alloc Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-13 10:10:46 -08:00
Josh Hunt	32b293a53d	IPv6: Avoid taking write lock for /proc/net/ipv6_route During some debugging I needed to look into how /proc/net/ipv6_route operated and in my digging I found its calling fib6_clean_all() which uses "write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I found this on 2.6.32, but reading the code I believe the same basic idea exists currently. Looking at the rtnetlink code they are only calling "read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize reading from proc isn't the recommended way of fetching the ipv6 route table; taking a write lock seems unnecessary and would probably cause network performance issues. To verify this I loaded up the ipv6 route table and then ran iperf in 3 cases: * doing nothing * reading ipv6 route table via proc (while :; do cat /proc/net/ipv6_route > /dev/null; done) * reading ipv6 route table via rtnetlink (while :; do ip -6 route show table all > /dev/null; done) * Load the ipv6 route table up with: * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done * iperf commands: * client: iperf -i 1 -V -c <ipv6 addr> * server: iperf -V -s * iperf results - 3 runs each (in Mbits/sec) * nothing: client: 927,927,927 server: 927,927,927 * proc: client: 179,97,96,113 server: 142,112,133 * iproute: client: 928,927,928 server: 927,927,927 lock_stat shows taking the write lock is causing the slowdown. Using this info I decided to write a version of fib6_clean_all() which replaces write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With this new function I see the same results as with my rtnetlink iperf test. Signed-off-by: Josh Hunt <joshhunt00@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-30 17:07:33 -05:00
David S. Miller	8ade06c616	ipv6: Fix neigh lookup using NULL device. In some of the rt6_bind_neighbour() call sites, it hasn't hooked up the rt->dst.dev pointer yet, so we'd deref a NULL pointer when obtaining dev->ifindex for the neighbour hash function computation. Just pass the netdevice explicitly in to fix this problem. Reported-by: Bjarke Istrup Pedersen <gurligebis@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-29 18:51:57 -05:00
David S. Miller	346f870b8a	ipv6: Report TCP timetstamp info in cacheinfo just like ipv4 does. I missed this while adding ipv6 support to inet_peer. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-29 15:22:33 -05:00
David S. Miller	d191854282	ipv6: Kill rt6i_dev and rt6i_expires defines. It just obscures that the netdevice pointer and the expires value are implemented in the dst_entry sub-object of the ipv6 route. And it makes grepping for dst_entry member uses much harder too. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-28 20:19:20 -05:00
David S. Miller	f83c7790dc	ipv6: Create fast inline ipv6 neigh lookup just like ipv4. Also, create and use an rt6_bind_neighbour() in net/ipv6/route.c to consolidate some common logic. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-28 15:41:23 -05:00
David S. Miller	c159d30c59	ipv6: Kill useless route tracing bits in net/ipv6/route.c RDBG() wasn't even used, and the messages printed by RT6_DEBUG() were far from useful. Just get rid of all this stuff, we can replace it with something more suitable if we want. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-26 15:24:36 -05:00
David S. Miller	c5e1fd8cca	Merge branch 'nf-next' of git://1984.lsi.us.es/net-next	2011-12-25 02:21:45 -05:00
David S. Miller	b26e478f8f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/freescale/fsl_pq_mdio.c net/batman-adv/translation-table.c net/ipv6/route.c	2011-12-16 02:11:14 -05:00
David S. Miller	bb3c36863e	ipv6: Check dest prefix length on original route not copied one in rt6_alloc_cow(). After commit `8e2ec63917` ("ipv6: don't use inetpeer to store metrics for routes.") the test in rt6_alloc_cow() for setting the ANYCAST flag is now wrong. 'rt' will always now have a plen of 128, because it is set explicitly to 128 by ip6_rt_copy. So to restore the semantics of the test, check the destination prefix length of 'ort'. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 17:35:06 -05:00
David S. Miller	b43faac690	ipv6: If neigh lookup fails during icmp6 dst allocation, propagate error. Don't just succeed with a route that has a NULL neighbour attached. This follows the behavior of addrconf_dst_alloc(). Allowing this kind of route to end up with a NULL neigh attached will result in packet drops on output until the route is somehow invalidated, since nothing will meanwhile try to lookup the neigh again. A statistic is bumped for the case where we see a neigh-less route on output, but the resulting packet drop is otherwise silent in nature, and frankly it's a hard error for this to happen and ipv6 should do what ipv4 does which is say something in the kernel logs. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 16:51:51 -05:00
David S. Miller	87a115783e	ipv6: Move xfrm_lookup() call down into icmp6_dst_alloc(). And return error pointers. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-06 17:04:13 -05:00
David S. Miller	8f0315190d	ipv6: Make third arg to anycast_dst_alloc() bool. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-06 16:48:14 -05:00
David Miller	2721745501	net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}. To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>	2011-12-05 15:20:19 -05:00
Florian Westphal	ea6e574e34	ipv6: add ip6_route_lookup like rt6_lookup, but allows caller to pass in flowi6 structure. Will be used by the upcoming ipv6 netfilter reverse path filter match. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2011-12-04 22:44:07 +01:00
David S. Miller	04a6f4417b	ipv6: Kill ndisc_get_neigh() inline helper. It's only used in net/ipv6/route.c and the NULL device check is superfluous for all of the existing call sites. Just expand the __ndisc_lookup_errno() call at each location. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-03 18:29:30 -05:00
David S. Miller	3830847396	ipv6: Various cleanups in route.c 1) x == NULL --> !x 2) x != NULL --> x 3) (x&BIT) --> (x & BIT) 4) (BIT1\|BIT2) --> (BIT1 \| BIT2) 5) proper argument and struct member alignment Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-03 18:02:47 -05:00
David S. Miller	6dec4ac4ee	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/inet_diag.c	2011-11-26 14:47:03 -05:00
Steffen Klassert	618f9bc74a	net: Move mtu handling down to the protocol depended handlers We move all mtu handling from dst_mtu() down to the protocol layer. So each protocol can implement the mtu handling in a different manner. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-26 14:29:51 -05:00
Steffen Klassert	ebb762f27f	net: Rename the dst_opt default_mtu method to mtu We plan to invoke the dst_opt->default_mtu() method unconditioally from dst_mtu(). So rename the method to dst_opt->mtu() to match the name with the new meaning. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-26 14:29:50 -05:00
Steffen Klassert	6b600b26c0	route: Use the device mtu as the default for blackhole routes As it is, we return null as the default mtu of blackhole routes. This may lead to a propagation of a bogus pmtu if the default_mtu method of a blackhole route is invoked. So return dst->dev->mtu as the default mtu instead. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-26 14:29:50 -05:00
Alexey Dobriyan	4e3fd7a06d	net: remove ipv6_addr_copy() C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-22 16:43:32 -05:00
Matti Vaittinen	d71314b4ac	IPv6 routing, NLM_F_* flag support: warn if new route is created without NLM_F_CREATE The support for NLM_F_* flags at IPv6 routing requests. Warn if NLM_F_CREATE flag is not defined for RTM_NEWROUTE request, creating new table. Later NLM_F_CREATE may be required for new route creation. Patch created against linux-3.2-rc1 Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-14 14:35:33 -05:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Paul Gortmaker	bc3b2d7fb9	net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:30 -04:00
Gao feng	7011687f0f	ipv6: fix route error binding peer in func icmp6_dst_alloc in func icmp6_dst_alloc,dst_metric_set call ipv6_cow_metrics to set metric. ipv6_cow_metrics may will call rt6_bind_peer to set rt6_info->rt6i_peer. So,we should move ipv6_addr_copy before dst_metric_set to make sure rt6_bind_peer success. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-28 16:36:07 -04:00
Madalin Bucur	fbe5818690	ipv6: check return value for dst_alloc return value of dst_alloc must be checked before use Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-27 15:32:06 -04:00
Yan, Zheng	8e2ec63917	ipv6: don't use inetpeer to store metrics for routes. Current IPv6 implementation uses inetpeer to store metrics for routes. The problem of inetpeer is that it doesn't take subnet prefix length in to consideration. If two routes have the same address but different prefix length, they share same inetpeer. So changing metrics of one route also affects the other. The fix is to allocate separate metrics storage for each route. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-17 00:57:26 -04:00
Eric Dumazet	f2c31e32b3	net: fix NULL dereferences in check_peer_redir() Gergely Kalman reported crashes in check_peer_redir(). It appears commit `f39925dbde` (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-03 03:34:12 -07:00
Eric Dumazet	21efcfa0ff	ipv6: unshare inetpeers We currently cow metrics a bit too soon in IPv6 case : All routes are tied to a single inetpeer entry. Change ip6_rt_copy() to get destination address as second argument, so that we fill rt6i_dst before the dst_copy_metrics() call. icmp6_dst_alloc() must set rt6i_dst before calling dst_metric_set(), or else the cow is done while rt6i_dst is still NULL. If orig route points to readonly metrics, we can share the pointer instead of performing the memory allocation and copy. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-21 21:24:25 -07:00
David S. Miller	d3aaeb38c4	net: Add ->neigh_lookup() operation to dst_ops In the future dst entries will be neigh-less. In that environment we need to have an easy transition point for current users of dst->neighbour outside of the packet output fast path. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-18 00:40:17 -07:00
David S. Miller	69cce1d140	net: Abstract dst->neighbour accesses behind helpers. dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-17 23:11:35 -07:00
David S. Miller	9cbb7ecbcf	ipv6: Get rid of rt6i_nexthop macro. It just makes it harder to see 1) what the code is doing and 2) grep for all users of dst{->,.}neighbour Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-17 23:11:35 -07:00
David S. Miller	e12fe68ce3	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-07-05 23:23:37 -07:00
David S. Miller	957c665f37	ipv6: Don't put artificial limit on routing table size. IPV6, unlike IPV4, doesn't have a routing cache. Routing table entries, as well as clones made in response to route lookup requests, all live in the same table. And all of these things are together collected in the destination cache table for ipv6. This means that routing table entries count against the garbage collection limits, even though such entries cannot ever be reclaimed and are added explicitly by the administrator (rather than being created in response to lookups). Therefore it makes no sense to count ipv6 routing table entries against the GC limits. Add a DST_NOCOUNT destination cache entry flag, and skip the counting if it is set. Use this flag bit in ipv6 when adding routing table entries. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-01 17:30:43 -07:00
David S. Miller	11d53b4990	ipv6: Don't change dst->flags using assignments. This blows away any flags already set in the entry. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-01 17:30:43 -07:00
Greg Rose	c7ac8679be	rtnetlink: Compute and store minimum ifinfo dump size The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2011-06-09 20:38:07 -07:00
Florian Westphal	0f6c6392dc	ipv6: copy prefsrc setting when copying route entry commit `c3968a857a` ('ipv6: RTA_PREFSRC support for ipv6 route source address selection') added support for ipv6 prefsrc as an alternative to ipv6 addrlabels, but it did not work because the prefsrc entry was not copied. Cc: Daniel Walter <sahne@0x90.at> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-21 02:05:22 -04:00
David S. Miller	cf91166223	net: Use non-zero allocations in dst_alloc(). Make dst_alloc() and it's users explicitly initialize the entire entry. The zero'ing done by kmem_cache_zalloc() was almost entirely redundant. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-28 22:26:00 -07:00
David S. Miller	5c1e6aa300	net: Make dst_alloc() take more explicit initializations. Now the dst->dev, dev->obsolete, and dst->flags values can be specified as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-28 22:25:59 -07:00
David S. Miller	2bd93d7af1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Resolved logic conflicts causing a build failure due to drivers/net/r8169.c changes using a patch from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-26 12:16:46 -07:00
Held Bernhard	0972ddb237	net: provide cow_metrics() methods to blackhole dst_ops Since commit `62fa8a846d` (net: Implement read-only protection and COW'ing of metrics.) the kernel throws an oops. [ 101.620985] BUG: unable to handle kernel NULL pointer dereference at (null) [ 101.621050] IP: [< (null)>] (null) [ 101.621084] PGD 6e53c067 PUD 3dd6a067 PMD 0 [ 101.621122] Oops: 0010 [#1] SMP [ 101.621153] last sysfs file: /sys/devices/virtual/ppp/ppp/uevent [ 101.621192] CPU 2 [ 101.621206] Modules linked in: l2tp_ppp pppox ppp_generic slhc l2tp_netlink l2tp_core deflate zlib_deflate twofish_x86_64 twofish_common des_generic cbc ecb sha1_generic hmac af_key iptable_filter snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device loop snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd i2c_i801 iTCO_wdt psmouse soundcore snd_page_alloc evdev uhci_hcd ehci_hcd thermal [ 101.621552] [ 101.621567] Pid: 5129, comm: openl2tpd Not tainted 2.6.39-rc4-Quad #3 Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R [ 101.621637] RIP: 0010:[<0000000000000000>] [< (null)>] (null) [ 101.621684] RSP: 0018:ffff88003ddeba60 EFLAGS: 00010202 [ 101.621716] RAX: ffff88003ddb5600 RBX: ffff88003ddb5600 RCX: 0000000000000020 [ 101.621758] RDX: ffffffff81a69a00 RSI: ffffffff81b7ee61 RDI: ffff88003ddb5600 [ 101.621800] RBP: ffff8800537cd900 R08: 0000000000000000 R09: ffff88003ddb5600 [ 101.621840] R10: 0000000000000005 R11: 0000000000014b38 R12: ffff88003ddb5600 [ 101.621881] R13: ffffffff81b7e480 R14: ffffffff81b7e8b8 R15: ffff88003ddebad8 [ 101.621924] FS: 00007f06e4182700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 101.621971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 101.622005] CR2: 0000000000000000 CR3: 0000000045274000 CR4: 00000000000006e0 [ 101.622046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 101.622087] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 101.622129] Process openl2tpd (pid: 5129, threadinfo ffff88003ddea000, task ffff88003de9a280) [ 101.622177] Stack: [ 101.622191] ffffffff81447efa ffff88007d3ded80 ffff88003de9a280 ffff88007d3ded80 [ 101.622245] 0000000000000001 ffff88003ddebbb8 ffffffff8148d5a7 0000000000000212 [ 101.622299] ffff88003dcea000 ffff88003dcea188 ffffffff00000001 ffffffff81b7e480 [ 101.622353] Call Trace: [ 101.622374] [<ffffffff81447efa>] ? ipv4_blackhole_route+0x1ba/0x210 [ 101.622415] [<ffffffff8148d5a7>] ? xfrm_lookup+0x417/0x510 [ 101.622450] [<ffffffff8127672a>] ? extract_buf+0x9a/0x140 [ 101.622485] [<ffffffff8144c6a0>] ? __ip_flush_pending_frames+0x70/0x70 [ 101.622526] [<ffffffff8146fbbf>] ? udp_sendmsg+0x62f/0x810 [ 101.622562] [<ffffffff813f98a6>] ? sock_sendmsg+0x116/0x130 [ 101.622599] [<ffffffff8109df58>] ? find_get_page+0x18/0x90 [ 101.622633] [<ffffffff8109fd6a>] ? filemap_fault+0x12a/0x4b0 [ 101.622668] [<ffffffff813fb5c4>] ? move_addr_to_kernel+0x64/0x90 [ 101.622706] [<ffffffff81405d5a>] ? verify_iovec+0x7a/0xf0 [ 101.622739] [<ffffffff813fc772>] ? sys_sendmsg+0x292/0x420 [ 101.622774] [<ffffffff810b994a>] ? handle_pte_fault+0x8a/0x7c0 [ 101.622810] [<ffffffff810b76fe>] ? __pte_alloc+0xae/0x130 [ 101.622844] [<ffffffff810ba2f8>] ? handle_mm_fault+0x138/0x380 [ 101.622880] [<ffffffff81024af9>] ? do_page_fault+0x189/0x410 [ 101.622915] [<ffffffff813fbe03>] ? sys_getsockname+0xf3/0x110 [ 101.622952] [<ffffffff81450c4d>] ? ip_setsockopt+0x4d/0xa0 [ 101.622986] [<ffffffff813f9932>] ? sockfd_lookup_light+0x22/0x90 [ 101.623024] [<ffffffff814b61fb>] ? system_call_fastpath+0x16/0x1b [ 101.623060] Code: Bad RIP value. [ 101.623090] RIP [< (null)>] (null) [ 101.623125] RSP <ffff88003ddeba60> [ 101.623146] CR2: 0000000000000000 [ 101.650871] ---[ end trace ca3856a7d8e8dad4 ]--- [ 101.651011] __sk_free: optmem leakage (160 bytes) detected. The oops happens in dst_metrics_write_ptr() include/net/dst.h:124: return dst->ops->cow_metrics(dst, p); dst->ops->cow_metrics is NULL and causes the oops. Provide cow_metrics() methods, like we did in commit `214f45c91b` (net: provide default_advmss() methods to blackhole dst_ops) Signed-off-by: Held Bernhard <berny156@gmx.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-25 11:53:08 -07:00
Eric Dumazet	b71d1d426d	inet: constify ip headers and in6_addr Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-22 11:04:14 -07:00
Thomas Egerer	e965c05dab	ipv6: Remove hoplimit initialization to -1 The changes introduced with git-commit `a02e4b7d` ("ipv6: Demark default hoplimit as zero.") missed to remove the hoplimit initialization. As a result, ipv6_get_mtu interprets the return value of dst_metric_raw (-1) as 255 and answers ping6 with this hoplimit. This patche removes the line such that ping6 is answered with the hoplimit value configured via sysctl. Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-21 17:24:08 -07:00
Daniel Walter	c3968a857a	ipv6: RTA_PREFSRC support for ipv6 route source address selection [ipv6] Add support for RTA_PREFSRC This patch allows a user to select the preferred source address for a specific IPv6-Route. It can be set via a netlink message setting RTA_PREFSRC to a valid IPv6 address which must be up on the device the route will be bound to. Signed-off-by: Daniel Walter <dwalter@barracuda.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-15 15:44:37 -07:00
Florian Westphal	9c7a4f9ce6	ipv6: ip6_route_output does not modify sk parameter, so make it const This avoids explicit cast to avoid 'discards qualifiers' compiler warning in a netfilter patch that i've been working on. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-22 19:17:36 -07:00
David S. Miller	4c9483b2fb	ipv6: Convert to use flowi6 where applicable. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:54 -08:00
David S. Miller	1d28f42c1b	net: Put flowi_* prefix on AF independent members of struct flowi I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:44 -08:00
David S. Miller	33175d84ee	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x_cmn.c	2011-03-10 14:26:00 -08:00
David S. Miller	7343ff31eb	ipv6: Don't create clones of host routes. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=29252 Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30462 In commit `d80bc0fd26` ("ipv6: Always clone offlink routes.") we forced the kernel to always clone offlink routes. The reason we do that is to make sure we never bind an inetpeer to a prefixed route. The logic turned on here has existed in the tree for many years, but was always off due to a protecting CPP define. So perhaps it's no surprise that there is a logic bug here. The problem is that we canot clone a route that is already a host route (ie. has DST_HOST set). Because if we do, an identical entry already exists in the routing tree and therefore the ip6_rt_ins() call is going to fail. This sets off a series of failures and high cpu usage, because when ip6_rt_ins() fails we loop retrying this operation a few times in order to handle a race between two threads trying to clone and insert the same host route at the same time. Fix this by simply using the route as-is when DST_HOST is set. Reported-by: slash@ac.auone-net.jp Reported-by: Ernst Sjöstrand <ernstp@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-09 19:55:25 -08:00
David S. Miller	0a0e9ae1bd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h	2011-03-03 21:27:42 -08:00
David S. Miller	29546a6404	ipv6: Use ERR_CAST in addrconf_dst_alloc. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-03 12:10:37 -08:00
David S. Miller	2774c131b1	xfrm: Handle blackhole route creation via afinfo. That way we don't have to potentially do this in every xfrm_lookup() caller. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-01 14:59:04 -08:00
David S. Miller	69ead7afdf	ipv6: Normalize arguments to ip6_dst_blackhole(). Return a dst pointer which is potentitally error encoded. Don't pass original dst pointer by reference, pass a struct net instead of a socket, and elide the flow argument since it is unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-01 14:45:33 -08:00
Hagen Paul Pfeifer	e9476e95d8	ipv6: variable next is never used in this function Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-25 14:00:22 -08:00
Lucian Adrian Grijincu	c486da3439	sysctl: ipv6: use correct net in ipv6_sysctl_rtcache_flush Before this patch issuing these commands: fd = open("/proc/sys/net/ipv6/route/flush") unshare(CLONE_NEWNET) write(fd, "stuff") would flush the newly created net, not the original one. The equivalent ipv4 code is correct (stores the net inside ->extra1). Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-25 11:01:56 -08:00
David S. Miller	da935c66ba	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/e1000e/netdev.c net/xfrm/xfrm_policy.c	2011-02-19 19:17:35 -08:00
Eric Dumazet	214f45c91b	net: provide default_advmss() methods to blackhole dst_ops Commit `0dbaee3b37` (net: Abstract default ADVMSS behind an accessor.) introduced a possible crash in tcp_connect_init(), when dst->default_advmss() is called from dst_metric_advmss() Reported-by: George Spelvin <linux@horizon.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-18 11:39:01 -08:00
David S. Miller	3c7bd1a140	net: Add initial_ref arg to dst_alloc(). This allows avoiding multiple writes to the initial __refcnt. The most simplest cases of wanting an initial reference of "1" in ipv4 and ipv6 have been converted, the rest have been left along and kept at the existing "0". Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-17 15:44:00 -08:00
David S. Miller	6431cbc25f	inet: Create a mechanism for upward inetpeer propagation into routes. If we didn't have a routing cache, we would not be able to properly propagate certain kinds of dynamic path attributes, for example PMTU information and redirects. The reason is that if we didn't have a routing cache, then there would be no way to lookup all of the active cached routes hanging off of sockets, tunnels, IPSEC bundles, etc. Consider the case where we created a cached route, but no inetpeer entry existed and also we were not asked to pre-COW the route metrics and therefore did not force the creation a new inetpeer entry. If we later get a PMTU message, or a redirect, and store this information in a new inetpeer entry, there is no way to teach that cached route about the newly existing inetpeer entry. The facilities implemented here handle this problem. First we create a generation ID. When we create a cached route of any kind, we remember the generation ID at the time of attachment. Any time we force-create an inetpeer entry in response to new path information, we bump that generation ID. The dst_ops->check() callback is where the knowledge of this event is propagated. If the global generation ID does not equal the one stored in the cached route, and the cached route has not attached to an inetpeer yet, we look it up and attach if one is found. Now that we've updated the cached route's information, we update the route's generation ID too. This clears the way for implementing PMTU and redirects directly in the inetpeer cache. There is absolutely no need to consult cached route information in order to maintain this information. At this point nothing bumps the inetpeer genids, that comes in the later changes which handle PMTUs and redirects using inetpeers. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-10 13:33:41 -08:00
David S. Miller	8d13a2a9fb	net: Kill NETEVENT_PMTU_UPDATE. Nobody actually does anything in response to the event, so just kill it off. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-08 16:17:55 -08:00
David S. Miller	bd4a6974cc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-02-04 14:28:58 -08:00
Roland Dreier	ec831ea72e	net: Add default_mtu() methods to blackhole dst_ops When an IPSEC SA is still being set up, __xfrm_lookup() will return -EREMOTE and so ip_route_output_flow() will return a blackhole route. This can happen in a sndmsg call, and after `d33e455337` ("net: Abstract default MTU metric calculation behind an accessor.") this leads to a crash in ip_append_data() because the blackhole dst_ops have no default_mtu() method and so dst_mtu() calls a NULL pointer. Fix this by adding default_mtu() methods (that simply return 0, matching the old behavior) to the blackhole dst_ops. The IPv4 part of this patch fixes a crash that I saw when using an IPSEC VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it looks to be needed as well. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-31 13:16:00 -08:00
David S. Miller	065825402c	net: Store ipv4/ipv6 COW'd metrics in inetpeer cache. Please note that the IPSEC dst entry metrics keep using the generic metrics COW'ing mechanism using kmalloc/kfree. This gives the IPSEC routes an opportunity to use metrics which are unique to their encapsulated paths. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-27 14:59:31 -08:00
David S. Miller	1397e171f1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-01-27 14:59:08 -08:00
David S. Miller	8f2771f2b8	ipv6: Remove route peer binding assertions. They are bogus. The basic idea is that I wanted to make sure that prefixed routes never bind to peers. The test I used was whether RTF_CACHE was set. But first of all, the RTF_CACHE flag is set at different spots depending upon which ip6_rt_copy() caller you're talking about. I've validated all of the code paths, and even in the future where we bind peers more aggressively (for route metric COW'ing) we never bind to prefix'd routes, only fully specified ones. This even applies when addrconf or icmp6 routes are allocated. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-27 14:55:22 -08:00
David S. Miller	62fa8a846d	net: Implement read-only protection and COW'ing of metrics. Routing metrics are now copy-on-write. Initially a route entry points it's metrics at a read-only location. If a routing table entry exists, it will point there. Else it will point at the all zero metric place-holder called 'dst_default_metrics'. The writeability state of the metrics is stored in the low bits of the metrics pointer, we have two bits left to spare if we want to store more states. For the initial implementation, COW is implemented simply via kmalloc. However future enhancements will change this to place the writable metrics somewhere else, in order to increase sharing. Very likely this "somewhere else" will be the inetpeer cache. Note also that this means that metrics updates may transiently fail if we cannot COW the metrics successfully. But even by itself, this patch should decrease memory usage and increase cache locality especially for routing workloads. In those cases the read-only metric copies stay in place and never get written to. TCP workloads where metrics get updated, and those rare cases where PMTU triggers occur, will take a very slight performance hit. But that hit will be alleviated when the long-term writable metrics move to a more sharable location. Since the metrics storage went from a u32 array of RTAX_MAX entries to what is essentially a pointer, some retooling of the dst_entry layout was necessary. Most importantly, we need to preserve the alignment of the reference count so that it doesn't share cache lines with the read-mostly state, as per Eric Dumazet's alignment assertion checks. The only non-trivial bit here is the move of the 'flags' member into the writeable cacheline. This is OK since we are always accessing the flags around the same moment when we made a modification to the reference count. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-26 20:51:05 -08:00
David S. Miller	d80bc0fd26	ipv6: Always clone offlink routes. Do not handle PMTU vs. route lookup creation any differently wrt. offlink routes, always clone them. Reported-by: PK <runningdoglackey@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-24 16:01:58 -08:00
stephen hemminger	bc3ef6605e	ipv6: fib6_ifdown cleanup Remove (unnecessary) casts to make code cleaner. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-18 22:01:16 -08:00
David S. Miller	b4aa9e05a6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h drivers/net/wireless/iwlwifi/iwl-1000.c drivers/net/wireless/iwlwifi/iwl-6000.c drivers/net/wireless/iwlwifi/iwl-core.h drivers/vhost/vhost.c	2010-12-17 12:27:22 -08:00

... 4 5 6 7 8 ...

800 commits