remarkable-linux

redonkable

Author	SHA1	Message	Date
Hangbin Liu	f3ef33ee85	ipvlan: call dev_change_flags when ipvlan mode is reset [ Upstream commit `5dc2d3996a` ] After we change the ipvlan mode from l3 to l2, or vice versa, we only reset IFF_NOARP flag, but don't flush the ARP table cache, which will cause eth->h_dest to be equal to eth->h_source in ipvlan_xmit_mode_l2(). Then the message will not come out of host. Here is the reproducer on local host: ip link set eth1 up ip addr add 192.168.1.1/24 dev eth1 ip link add link eth1 ipvlan1 type ipvlan mode l3 ip netns add net1 ip link set ipvlan1 netns net1 ip netns exec net1 ip link set ipvlan1 up ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1 ip route add 192.168.2.0/24 via 192.168.1.2 ping 192.168.2.2 -c 2 ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2 ping 192.168.2.2 -c 2 Add the same configuration on remote host. After we set the mode to l2, we could find that the src/dst MAC addresses are the same on eth1: 21:26:06.648565 00:b7:13:ad:d3:05 > 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.2.1 > 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64 Fix this by calling dev_change_flags(), which will call netdevice notifier with flag change info. v2: a) As pointed out by Wang Cong, check return value for dev_change_flags() when change dev flags. b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops. So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Fixes: `2ad7bf3638` ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-08-24 13:09:11 +02:00
Xin Long	f5a42d63f0	ipvlan: fix IFLA_MTU ignored on NEWLINK [ Upstream commit `30877961b1` ] Commit `296d485680` ("ipvlan: inherit MTU from master device") adjusted the mtu from the master device when creating a ipvlan device, but it would also override the mtu value set in rtnl_create_link. It causes IFLA_MTU param not to take effect. So this patch is to not adjust the mtu if IFLA_MTU param is set when creating a ipvlan device. Fixes: `296d485680` ("ipvlan: inherit MTU from master device") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-07-22 14:28:43 +02:00
Mahesh Bandewar	17c8c59988	ipvlan: add L2 check for packets arriving via virtual devices [ Upstream commit `92ff426450` ] Packets that don't have dest mac as the mac of the master device should not be entertained by the IPvlan rx-handler. This is mostly true as the packet path mostly takes care of that, except when the master device is a virtual device. As demonstrated in the following case - ip netns add ns1 ip link add ve1 type veth peer name ve2 ip link add link ve2 name iv1 type ipvlan mode l2 ip link set dev iv1 netns ns1 ip link set ve1 up ip link set ve2 up ip -n ns1 link set iv1 up ip addr add 192.168.10.1/24 dev ve1 ip -n ns1 addr 192.168.10.2/24 dev iv1 ping -c2 192.168.10.2 <Works!> ip neigh show dev ve1 ip neigh show 192.168.10.2 lladdr <random> dev ve1 ping -c2 192.168.10.2 <Still works! Wrong!!> This patch adds that missing check in the IPvlan rx-handler. Reported-by: Amit Sikka <amit.sikka@ericsson.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-03-19 08:42:56 +01:00
Gao Feng	ae5a0acea2	ipvlan: Add the skb->mark as flow4's member to lookup route [ Upstream commit `a98a4ebc8c` ] Current codes don't use skb->mark to assign flowi4_mark, it would make the policy route rule with fwmark doesn't work as expected. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-02-25 11:07:58 +01:00
Keefe Liu	7015ca81bc	ipvlan: fix ipv6 outbound device [ Upstream commit `ca29fd7cce` ] When process the outbound packet of ipv6, we should assign the master device to output device other than input device. Signed-off-by: Keefe Liu <liuqifa@huawei.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-17 15:08:00 +01:00
Girish Moodalbail	dea6e19f4e	tap: reference to KVA of an unloaded module causes kernel panic The commit `9a393b5d59` ("tap: tap as an independent module") created a separate tap module that implements tap functionality and exports interfaces that will be used by macvtap and ipvtap modules to create create respective tap devices. However, that patch introduced a regression wherein the modules macvtap and ipvtap can be removed (through modprobe -r) while there are applications using the respective /dev/tapX devices. These applications cause kernel to hold reference to /dev/tapX through 'struct cdev macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap modules respectively. So, when the application is later closed the kernel panics because we are referencing KVA that is present in the unloaded modules. ----------8<------- Example ----------8<---------- $ sudo ip li add name mv0 link enp7s0 type macvtap $ sudo ip li show mv0 \|grep mv0\| awk -e '{print $1 $2}' 14:mv0@enp7s0: $ cat /dev/tap14 & $ lsmod \|egrep -i 'tap\|vlan' macvtap 16384 0 macvlan 24576 1 macvtap tap 24576 3 macvtap $ sudo modprobe -r macvtap $ fg cat /dev/tap14 ^C <...system panics...> BUG: unable to handle kernel paging request at ffffffffa038c500 IP: cdev_put+0xf/0x30 ----------8<-----------------8<---------- The fix is to set cdev.owner to the module that creates the tap device (either macvtap or ipvtap). With this set, the operations (in fs/char_dev.c) on char device holds and releases the module through cdev_get() and cdev_put() and will not allow the module to unload prematurely. Fixes: `9a393b5d59` (tap: tap as an independent module) Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-28 19:17:21 +09:00
David S. Miller	b63f6044d8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Basically, updates to the conntrack core, enhancements for nf_tables, conversion of netfilter hooks from linked list to array to improve memory locality and asorted improvements for the Netfilter codebase. More specifically, they are: 1) Add expection to hashes after timer initialization to prevent access from another CPU that walks on the hashes and calls del_timer(), from Florian Westphal. 2) Don't update nf_tables chain counters from hot path, this is only used by the x_tables compatibility layer. 3) Get rid of nested rcu_read_lock() calls from netfilter hook path. Hooks are always guaranteed to run from rcu read side, so remove nested rcu_read_lock() where possible. Patch from Taehee Yoo. 4) nf_tables new ruleset generation notifications include PID and name of the process that has updated the ruleset, from Phil Sutter. 5) Use skb_header_pointer() from nft_fib, so we can reuse this code from the nf_family netdev family. Patch from Pablo M. Bermudo. 6) Add support for nft_fib in nf_tables netdev family, also from Pablo. 7) Use deferrable workqueue for conntrack garbage collection, to reduce power consumption, from Patch from Subash Abhinov Kasiviswanathan. 8) Add nf_ct_expect_iterate_net() helper and use it. From Florian Westphal. 9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian. 10) Drop references on conntrack removal path when skbuffs has escaped via nfqueue, from Florian. 11) Don't queue packets to nfqueue with dying conntrack, from Florian. 12) Constify nf_hook_ops structure, from Florian. 13) Remove neededlessly branch in nf_tables trace code, from Phil Sutter. 14) Add nla_strdup(), from Phil Sutter. 15) Rise nf_tables objects name size up to 255 chars, people want to use DNS names, so increase this according to what RFC 1035 specifies. Patch series from Phil Sutter. 16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook registration on demand, suggested by Eric Dumazet, patch from Florian. 17) Remove unused variables in compat_copy_entry_from_user both in ip_tables and arp_tables code. Patch from Taehee Yoo. 18) Constify struct nf_conntrack_l4proto, from Julia Lawall. 19) Constify nf_loginfo structure, also from Julia. 20) Use a single rb root in connlimit, from Taehee Yoo. 21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo. 22) Use audit_log() instead of open-coding it, from Geliang Tang. 23) Allow to mangle tcp options via nft_exthdr, from Florian. 24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes a fix for a miscalculation of the minimal length. 25) Simplify branch logic in h323 helper, from Nick Desaulniers. 26) Calculate netlink attribute size for conntrack tuple at compile time, from Florian. 27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure. From Florian. 28) Remove holes in nf_conntrack_l4proto structure, so it becomes smaller. From Florian. 29) Get rid of print_tuple() indirection for /proc conntrack listing. Place all the code in net/netfilter/nf_conntrack_standalone.c. Patch from Florian. 30) Do not built in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is off. From Florian. 31) Constify most nf_conntrack_{l3,l4}proto helper functions, from Florian. 32) Fix broken indentation in ebtables extensions, from Colin Ian King. 33) Fix several harmless sparse warning, from Florian. 34) Convert netfilter hook infrastructure to use array for better memory locality, joint work done by Florian and Aaron Conole. Moreover, add some instrumentation to debug this. 35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once per batch, from Florian. 36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian. 37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao. 38) Remove unused code in the generic protocol tracker, from Davide Caratti. I think I will have material for a second Netfilter batch in my queue if time allow to make it fit in this merge window. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 17:08:42 -07:00
David S. Miller	3118e6e19d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The UDP offload conflict is dealt with by simply taking what is in net-next where we have removed all of the UFO handling code entirely. The TCP conflict was a case of local variables in a function being removed from both net and net-next. In netvsc we had an assignment right next to where a missing set of u64 stats sync object inits were added. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-09 16:28:45 -07:00
Florian Fainelli	87173cd6cf	ipvlan: Fix 64-bit statistics seqcount initialization On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a lockdep splat indicating this seqcount is not correctly initialized, fix that by using the proper helper function: netdev_alloc_pcpu_stats(). Fixes: `2ad7bf3638` ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-01 20:06:07 -07:00
Florian Westphal	591bb2789b	netfilter: nf_hook_ops structs can be const We no longer place these on a list so they can be const. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-07-31 19:10:44 +02:00
David S. Miller	182e0b6b58	ipvlan: Stop advertising NETIF_F_UFO support. It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-07-17 09:52:57 -07:00
Matthias Schiffer	a8b8a889e3	net: add netlink_ext_ack argument to rtnl_link_ops.validate Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-26 23:13:22 -04:00
Matthias Schiffer	ad744b223c	net: add netlink_ext_ack argument to rtnl_link_ops.changelink Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-26 23:13:22 -04:00
Matthias Schiffer	7a3f4a1851	net: add netlink_ext_ack argument to rtnl_link_ops.newlink Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-26 23:13:21 -04:00
David S. Miller	0ddead90b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 11:59:32 -04:00
Krister Johansen	3ad7d2468f	Ipvlan should return an error when an address is already in use. The ipvlan code already knows how to detect when a duplicate address is about to be assigned to an ipvlan device. However, that failure is not propogated outward and leads to a silent failure. Introduce a validation step at ip address creation time and allow device drivers to register to validate the incoming ip addresses. The ipvlan code is the first consumer. If it detects an address in use, we can return an error to the user before beginning to commit the new ifa in the networking code. This can be especially useful if it is necessary to provision many ipvlans in containers. The provisioning software (or operator) can use this to detect situations where an ip address is unexpectedly in use. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-09 12:26:07 -04:00
David S. Miller	cf124db566	net: Fix inconsistent teardown and release of private netdev state. Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 15:53:24 -04:00
Florian Westphal	3133822f5a	ipvlan: use pernet operations and restrict l3s hooks to master netns commit `4fbae7d83c` ("ipvlan: Introduce l3s mode") added registration of netfilter hooks via nf_register_hooks(). This API provides the illusion of 'global' netfilter hooks by placing the hooks in all current and future network namespaces. In case of ipvlan the hook appears to be only needed in the namespace that contains the ipvlan master device (i.e., usually init_net), so placing them in all namespaces is not needed. This switches ipvlan driver to pernet operations, and then only registers hooks in namespaces where a ipvlan master device is set to l3s mode. Extra care has to be taken when the master device is moved to another namespace, as we might have to 'move' the netfilter hooks too. This is done by storing the namespace the ipvlan port was created in. On REGISTER event, do (un)register operations in the old/new namespaces. This will also allow removal of the nf_register_hooks() in a future patch. Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 10:43:22 -04:00
Sainath Grandhi	235a9d89da	ipvtap: IP-VLAN based tap driver This patch adds a tap character device driver that is based on the IP-VLAN network interface, called ipvtap. An ipvtap device can be created in the same way as an ipvlan device, using 'type ipvtap', and then accessed using the tap user space interface. Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-02-11 20:59:41 -05:00
Mahesh Bandewar	c3262d9dec	ipvlan: use netdev_is_rx_handler_busy instead of checking specific type IPvlan checks if the master device is already used by checking a specific device (here it's macvlan device). This is technically not sufficient and it should just ensure the rx_handler is busy or not. This would be a super check that includes macvlan and any other that has already registered rx-handler. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:22:26 -05:00
Mahesh Bandewar	019ec0032e	ipvlan: fix dev_id creation corner case. In the last patch `da36e13cf6` ("ipvlan: improvise dev_id generation logic in IPvlan") I missed some part of Dave's suggestion and because of that the dev_id creation could fail in a corner case scenario. This would happen when more or less 64k devices have been already created and several have been deleted. If the devices that are still sticking around are the last n bits from the bitmap. So in this scenario even if lower bits are available, the dev_id search is so narrow that it always fails. Fixes: `da36e13cf6` ("ipvlan: improvise dev_id generation logic in IPvlan") CC: David Miller <davem@davemloft.org> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-16 14:04:43 -05:00
Mahesh Bandewar	da36e13cf6	ipvlan: improvise dev_id generation logic in IPvlan The patch `009146d117` ("ipvlan: assign unique dev-id for each slave device.") used ida_simple_get() to generate dev_ids assigned to the slave devices. However (Eric has pointed out that) there is a shortcoming with that approach as it always uses the first available ID. This becomes a problem when a slave gets deleted and a new slave gets added. The ID gets reassigned causing the new slave to get the same link-local address. This side-effect is undesirable. This patch adds a per-port variable that keeps track of the IDs assigned and used as the stat-base for the IDR api. This base will be wrapped around when it reaches the MAX (0xFFFE) value possibly on a busy system where slaves are added and deleted routinely. Fixes: `009146d117` ("ipvlan: assign unique dev-id for each slave device.") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> CC: David Miller <davem@davemloft.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-10 20:47:12 -05:00
stephen hemminger	bc1f44709c	net: make ndo_get_stats64 a void function The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-08 17:51:44 -05:00
Mahesh Bandewar	009146d117	ipvlan: assign unique dev-id for each slave device. IPvlan setup uses one mac-address (of master). The IPv6 link-local addresses are derived using the mac-address on the link. Lack of dev-ids makes these link-local addresses same for all slaves including that of master device. dev-ids are necessary to add differentiation when L2 address is shared. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-04 13:30:42 -05:00
Gao Feng	3ea35d3406	driver: ipvlan: Remove unnecessary ipvlan NULL check in ipvlan_count_rx There are three functions which would invoke the ipvlan_count_rx. They are ipvlan_process_multicast, ipvlan_rcv_frame, and ipvlan_nf_input. The former two functions already use the ipvlan directly before ipvlan_count_rx, and ipvlan_nf_input gets the ipvlan from ipvl_addr->master, it is not possible to be NULL too. So the ipvlan pointer check is unnecessary in ipvlan_count_rx. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-28 14:23:22 -05:00
Gao Feng	8667398277	driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address There are some duplicated codes in ipvlan_add_addr6/4 and ipvlan_del_addr6/4. Now define two common functions ipvlan_add_addr and ipvlan_del_addr to decrease the duplicated codes. It could be helful to maintain the codes. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-28 14:23:21 -05:00
Mahesh Bandewar	e252536068	ipvlan: fix multicast processing In an IPvlan setup when master is set in loopback mode e.g. ethtool -K eth0 set loopback on where eth0 is master device for IPvlan setup. The failure is caused by the faulty logic that determines if the packet is from TX-path vs. RX-path by just looking at the mac- addresses on the packet while processing multicast packets. In the loopback-mode where this crash was happening, the packets that are sent out are reflected by the NIC and are processed on the RX path, but mac-address check tricks into thinking this packet is from TX path and falsely uses dev_forward_skb() to pass packets to the slave (virtual) devices. This patch records the path while queueing packets and eliminates logic of looking at mac-addresses for the same decision. ------------[ cut here ]------------ kernel BUG at include/linux/skbuff.h:1737! Call Trace: [<ffffffff921fbbc2>] dev_forward_skb+0x92/0xd0 [<ffffffffc031ac65>] ipvlan_process_multicast+0x395/0x4c0 [ipvlan] [<ffffffffc031a9a7>] ? ipvlan_process_multicast+0xd7/0x4c0 [ipvlan] [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660 [<ffffffff91cdff09>] process_one_work+0x1a9/0x660 [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660 [<ffffffff91ce086d>] worker_thread+0x11d/0x360 [<ffffffff91ce0750>] ? rescuer_thread+0x350/0x350 [<ffffffff91ce960b>] kthread+0xdb/0xe0 [<ffffffff91c05c70>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0 [<ffffffff92348b7a>] ret_from_fork+0x9a/0xd0 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0 Fixes: `ba35f8588f` ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-23 17:53:47 -05:00
Eric Dumazet	b1227d019f	ipvlan: fix various issues in ipvlan_process_multicast() 1) netif_rx() / dev_forward_skb() should not be called from process context. 2) ipvlan_count_rx() should be called with preemption disabled. 3) We should check if ipvlan->dev is up before feeding packets to netif_rx() 4) We need to prevent device from disappearing if some packets are in the multicast backlog. 5) One kfree_skb() should be a consume_skb() eventually Fixes: `ba35f8588f` ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-23 17:53:47 -05:00
David S. Miller	821781a9f4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-12-10 16:21:55 -05:00
Gao Feng	1a31cc86ef	driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed When netdev_upper_dev_unlink failed in ipvlan_link_new, need to unlink the ipvlan dev with upper dev. Signed-off-by: Gao Feng <fgao@ikuai8.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-08 14:30:07 -05:00
Gao Feng	48140a210b	driver: ipvlan: Free ipvl_port directly with kfree instead of kfree_rcu There are two functions which would free the ipvl_port now. The first is ipvlan_port_create. It frees the ipvl_port in the error handler, so it could kfree it directly. The second is ipvlan_port_destroy. It invokes netdev_rx_handler_unregister which enforces one grace period by synchronize_net firstly, so it also could kfree the ipvl_port directly and safely. So it is unnecessary to use kfree_rcu to free ipvl_port. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-07 13:21:21 -05:00
David S. Miller	2745529ac7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-03 12:29:53 -05:00
Gao Feng	8f679ed88f	driver: ipvlan: Remove useless member mtu_adj of struct ipvl_dev The mtu_adj is initialized to zero when alloc mem, there is no any assignment to mtu_adj. It is only used in ipvlan_adjust_mtu as one right value. So it is useless member of struct ipvl_dev, then remove it. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-30 15:01:32 -05:00
Gao Feng	147fd2874d	driver: ipvlan: Fix one possible memleak in ipvlan_link_new When ipvlan_link_new fails and creates one ipvlan port, it does not destroy the ipvlan port created. It causes mem leak and the physical device contains invalid ipvlan data. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-27 19:58:04 -05:00
Julia Lawall	ab530f63e7	ipvlan: constify l3mdev_ops structure This l3mdev_ops structure is only stored in the l3mdev_ops field of a net_device structure. This field is declared const, so the l3mdev_ops structure can be declared as const also. Additionally drop the __read_mostly annotation. The semantic patch that adds const is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r disable optional_qualifier@ identifier i; position p; @@ static struct l3mdev_ops i@p = { ... }; @ok@ identifier r.i; struct net_device *e; position p; @@ e->l3mdev_ops = &i@p; @bad@ position p != {r.p,ok.p}; identifier r.i; struct l3mdev_ops e; @@ e@i@p @depends on !bad disable optional_qualifier@ identifier r.i; @@ static +const struct l3mdev_ops i = { ... }; // </smpl> The effect on the layout of the .o file is shown by the following output of the size command, first before then after the transformation: text data bss dec hex filename 7364 466 52 7882 1eca drivers/net/ipvlan/ipvlan_main.o 7412 434 52 7898 1eda drivers/net/ipvlan/ipvlan_main.o Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-10-15 17:49:57 -04:00
Mahesh Bandewar	4fbae7d83c	ipvlan: Introduce l3s mode In a typical IPvlan L3 setup where master is in default-ns and each slave is into different (slave) ns. In this setup egress packet processing for traffic originating from slave-ns will hit all NF_HOOKs in slave-ns as well as default-ns. However same is not true for ingress processing. All these NF_HOOKs are hit only in the slave-ns skipping them in the default-ns. IPvlan in L3 mode is restrictive and if admins want to deploy iptables rules in default-ns, this asymmetric data path makes it impossible to do so. This patch makes use of the l3_rcv() (added as part of l3mdev enhancements) to perform input route lookup on RX packets without changing the skb->dev and then uses nf_hook at NF_INET_LOCAL_IN to change the skb->dev just before handing over skb to L4. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-19 01:25:22 -04:00
Mahesh Bandewar	b93dd49c1a	ipvlan: Scrub skb before crossing the namespace boundry The earlier patch `c3aaa06d5a` (ipvlan: scrub skb before routing in L3 mode.) did this but only for TX path in L3 mode. This patch extends it for both the modes for TX/RX path. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-25 21:47:26 -07:00
Eric Dumazet	0d7dd798fd	net: ipvlan: call netdev_lockdep_set_classes() In case a qdisc is used on a ipvlan device, we need to use different lockdep classes to avoid false positives. Use the new netdev_lockdep_set_classes() generic helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-09 13:28:37 -07:00
Mahesh Bandewar	494e8489db	ipvlan: Fix failure path in dev registration during link creation When newlink creation fails at device-registration, the port->count is decremented twice. Francesco Ruggeri (fruggeri@arista.com) found this issue in Macvlan and the same exists in IPvlan driver too. While fixing this issue I noticed another issue of missing unregister in case of failure, so adding it to the fix which is similar to the macvlan fix by Francesco in commit `3083796075` ("macvlan: fix failure during registration v3") Reported-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 17:23:08 -04:00
Eric Dumazet	f6773c5e95	vlan: propagate gso_max_segs vlan drivers lack proper propagation of gso_max_segs from lower device. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-17 21:05:01 -04:00
David Decotigny	314d10d73b	net: ipvlan: use __ethtool_get_ksettings Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 22:06:46 -05:00
Mahesh Bandewar	ab5b7013db	ipvlan: misc changes 1. scope correction for few functions that are used in single file. 2. Adjust variables that are used in fast-path to fit into single cacheline 3. Update rcv_frame() to skip shared check for frames coming over wire Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-21 22:43:24 -05:00
Mahesh Bandewar	e93fbc5a15	ipvlan: mode is u16 The mode argument was erronusly defined as u32 but it has always been u16. Also use ipvlan_set_mode() helper to set the mode instead of assigning directly. This should avoid future erronus assignments / updates. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-21 22:43:24 -05:00
Mahesh Bandewar	c3aaa06d5a	ipvlan: scrub skb before routing in L3 mode. Scrub skb before hitting the iptable hooks to ensure packets hit these hooks. Set the xnet param only when the packet is crossing the ns boundry so if the IPvlan slave and master belong to the same ns, the param will be set to false. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-21 22:43:24 -05:00
Mahesh Bandewar	296d485680	ipvlan: inherit MTU from master device When we create IPvlan slave; we use ether_setup() and that sets up default MTU to 1500 while the master device may have lower / different MTU. Any subsequent changes to the masters' MTU are reflected into the slaves' MTU setting. However if those don't happen (most likely scenario), the slaves' MTU stays at 1500 which could be bad. This change adds code to inherit MTU from the master device instead of using the default value during the link initialization phase. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Tim Hockins <thockins@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-04 19:18:53 -05:00
Tom Herbert	a188222b6e	net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the set of features for offloading all checksums. This is a mask of the checksum offload related features bits. It is incorrect to set both NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for features of a device. This patch: - Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where NETIF_F_ALL_CSUM is being used as a mask). - Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-15 16:50:08 -05:00
Sabrina Dubroca	a534dc5298	ipvlan: fix use after free of skb ipvlan_handle_frame is a rx_handler, and when it returns a value other than RX_HANDLER_CONSUMED (here, NET_RX_DROP aka RX_HANDLER_ANOTHER), __netif_receive_skb_core expects that the skb still exists and will process it further, but we just freed it. Fixes: `2ad7bf3638` ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-17 14:39:29 -05:00
Sabrina Dubroca	cf554ada0b	ipvlan: fix leak in ipvlan_rcv_frame Pass a *skb to ipvlan_rcv_frame so that if skb_share_check returns a new skb, we actually use it during further processing. It's safe to ignore the new skb in the ipvlan_xmit_ functions, because they call ipvlan_rcv_frame with local == true, so that dev_forward_skb is called and always takes ownership of the skb. Fixes: `2ad7bf3638` ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-17 14:39:28 -05:00
Brenden Blanco	63b11e757d	ipvlan: read direct ifindex instead of iflink In the ipv4 outbound path of an ipvlan device in l3 mode, the ifindex is being grabbed from dev_get_iflink. This works for the physical device case, since as the documentation of that function notes: "Physical interfaces have the same 'ifindex' and 'iflink' values.". However, if the master device is a veth, and the pairs are in separate net namespaces, the route lookup will fail with -ENODEV due to outer veth pair being in a separate namespace from the ipvlan master/routing namespace. ns0 \| ns1 \| ns2 veth0a--\|--veth0b--\|--ipvl0 In ipvlan_process_v4_outbound(), a packet sent from ipvl0 in the above configuration will pass fl.flowi4_oif == veth0a to ip_route_output_flow(), but *net == ns1. Notice also that ipv6 processing is not using iflink. Since there is a discrepancy in usage, fixup both v4 and v6 case to use local dev variable. Tested this with l3 ipvlan on top of veth, as well as with single physical interface in the top namespace. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:39:08 -07:00
Eric W. Biederman	33224b16ff	ipv4, ipv6: Pass net into ip_local_out and ip6_local_out Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:02 -07:00

1 2

78 Commits (redonkable)