alistair23-linux/net/ipv4
Ido Schimmel 3c71006d15 ipv4: fib_rules: Check if rule is a default rule
Currently, when non-default (custom) FIB rules are used, devices capable
of layer 3 offloading flush their tables and let the kernel do the
forwarding instead.

When these devices' drivers are loaded they register to the FIB
notification chain, which lets them know about the existence of any
custom FIB rules. This is done by sending a RULE_ADD notification based
on the value of 'net->ipv4.fib_has_custom_rules'.

This approach is problematic when VRF offload is taken into account, as
upon the creation of the first VRF netdev, a l3mdev rule is programmed
to direct skbs to the VRF's table.

Instead of merely reading the above value and sending a single RULE_ADD
notification, we should iterate over all the FIB rules and send a
detailed notification for each, thereby allowing offloading drivers to
sanitize the rules they don't support and potentially flush their
tables.

While l3mdev rules are uniquely marked, the default rules are not.
Therefore, when they are being notified they might invoke offloading
drivers to unnecessarily flush their tables.

Solve this by adding an helper to check if a FIB rule is a default rule.
Namely, its selector should match all packets and its action should
point to the local, main or default tables.

As noted by David Ahern, uniquely marking the default rules is
insufficient. When using VRFs, it's common to avoid false hits by moving
the rule for the local table to just before the main table:

Default configuration:
$ ip rule show
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

Common configuration with VRFs:
$ ip rule show
1000:   from all lookup [l3mdev-table]
32765:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 10:18:33 -07:00
..
netfilter lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
af_inet.c net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
ah4.c IPsec: do not ignore crypto err in ah4 input 2017-01-16 12:57:48 +01:00
arp.c NET: Fix /proc/net/arp for AX.25 2017-02-13 22:15:03 -05:00
cipso_ipv4.c netlabel: out of bound access in cipso_v4_validate() 2017-02-04 19:44:22 -05:00
datagram.c
devinet.c net: Eliminate duplicated codes by creating one new function in_dev_select_addr 2017-03-12 23:27:09 -07:00
esp4.c esp: Introduce a helper to setup the trailer 2017-01-17 10:23:08 +01:00
esp4_offload.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
fib_frontend.c net: route: add missing nla_policy entry for RTA_MARK attribute 2017-03-01 10:25:56 -08:00
fib_lookup.h
fib_notifier.c ipv4: fib: Remove redundant argument 2017-03-10 09:45:09 -08:00
fib_rules.c ipv4: fib_rules: Check if rule is a default rule 2017-03-16 10:18:33 -07:00
fib_semantics.c ipv4: fib: Notify about nexthop status changes 2017-02-08 15:25:18 -05:00
fib_trie.c ipv4: fib: Remove redundant argument 2017-03-10 09:45:09 -08:00
fou.c
gre_demux.c
gre_offload.c
icmp.c net: for rate-limited ICMP replies save one atomic operation 2017-01-09 15:49:12 -05:00
igmp.c igmp, mld: Fix memory leak in igmpv3/mld_del_delrec() 2017-02-09 16:43:45 -05:00
inet_connection_sock.c net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
inet_diag.c tcp: remove early retransmit 2017-01-13 22:37:16 -05:00
inet_fragment.c
inet_hashtables.c inet: kill smallest_size and smallest_port 2017-01-18 13:04:29 -05:00
inet_timewait_sock.c ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob 2016-12-29 11:38:31 -05:00
inetpeer.c
ip_forward.c
ip_fragment.c
ip_gre.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ip_input.c
ip_options.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ip_output.c udp: avoid ufo handling on IP payload compression packets 2017-03-09 18:28:42 -08:00
ip_sockglue.c ip: fix IP_CHECKSUM handling 2017-02-21 12:23:53 -05:00
ip_tunnel.c
ip_tunnel_core.c lwtunnel: remove device arg to lwtunnel_build_state 2017-01-30 15:14:22 -05:00
ip_vti.c
ipcomp.c
ipconfig.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ipip.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ipmr.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
Kconfig Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
Makefile ipv4: fib: Move FIB notification code to a separate file 2017-03-10 09:45:09 -08:00
netfilter.c netfilter: use skb_to_full_sk in ip_route_me_harder 2017-02-28 12:49:36 +01:00
ping.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
proc.c net: add LINUX_MIB_PFMEMALLOCDROP counter 2017-02-02 23:34:19 -05:00
protocol.c
raw.c net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP 2017-02-07 13:07:47 -05:00
raw_diag.c
route.c ipv4: mask tos for input route 2017-02-26 11:03:38 -05:00
syncookies.c syncookies: use SipHash in place of SHA1 2017-01-09 13:58:57 -05:00
sysctl_net_ipv4.c net: Avoid receiving packets with an l3mdev on unbound UDP sockets 2017-01-30 15:00:58 -05:00
tcp.c tcp: fix potential double free issue for fastopen_req 2017-03-02 14:05:41 -08:00
tcp_bbr.c
tcp_bic.c
tcp_cdg.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c Revert "dctcp: update cwnd on congestion event" 2016-12-06 11:34:24 -05:00
tcp_diag.c
tcp_fastopen.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-28 10:33:06 -05:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: rename *_sequence_number() to *_seq_and_tsoff() 2017-03-09 18:25:34 -08:00
tcp_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-15 11:59:10 -07:00
tcp_lp.c
tcp_metrics.c tcp: replace dst_confirm with sk_dst_confirm 2017-02-07 13:07:46 -05:00
tcp_minisocks.c tcp: account for ts offset only if tsecr not zero 2017-02-22 16:35:58 -05:00
tcp_nv.c
tcp_offload.c
tcp_output.c tcp: accommodate sequence number to a peer's shrunk receive window caused by precision loss in window scaling 2017-02-17 15:30:33 -05:00
tcp_probe.c tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" 2017-02-21 13:26:03 -05:00
tcp_rate.c
tcp_recovery.c tcp: enable RACK loss detection to trigger recovery 2017-01-13 22:37:16 -05:00
tcp_scalable.c
tcp_timer.c tcp: fix various issues for sockets morphing to listen state 2017-03-07 13:58:33 -08:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tunnel4.c
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-07 16:29:30 -05:00
udp_diag.c
udp_impl.h
udp_offload.c
udp_tunnel.c
udplite.c
xfrm4_input.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm4_mode_tunnel.c
xfrm4_output.c
xfrm4_policy.c xfrm: policy: make policy backend const 2017-02-09 10:22:19 +01:00
xfrm4_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm4_state.c xfrm: remove unused function 2017-01-10 10:57:12 +01:00
xfrm4_tunnel.c