redonkable/alistair23-linux

Author	SHA1	Message	Date
Craig Gallek	eb4cb00852	sock_diag: define destruction multicast groups These groups will contain socket-destruction events for AF_INET/AF_INET6, IPPROTO_TCP/IPPROTO_UDP. Near the end of socket destruction, a check for listeners is performed. In the presence of a listener, rather than completely cleanup the socket, a unit of work will be added to a private work queue which will first broadcast information about the socket and then finish the cleanup operation. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 19:49:22 -07:00
David S. Miller	916035ddd3	Merge branch 'mlx4-vf-counters' Or Gerlitz says: ==================== mlx4 driver update (+ new VF ndo) This series from Eran and Hadar is further dealing with traffic counters in the mlx4 driver, this time mostly around SRIOV. We added a new ndo to read the VF counters through the PF netdev netlink infrastructure plus mlx4 implementation for that ndo. changes from V0: - applied feedback from John to use nested netlink encoding for the VF counters so we can extend it later - add handling of single ported VFs in the mlx4_en driver new ndo - avoid chopping the FW counters from 64 to 32 bits in mlx4_en PF flow ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:03 -07:00
Eran Ben Elisha	62a890557f	net/mlx4_en: Support ndo_get_vf_stats Implement the ndo to gather VF statistics through the PF. All counters related to this VF are stored in a per slave list, run over the slave's list and collect all statistics. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:03 -07:00
Eran Ben Elisha	3b766cd832	net/core: Add reading VF statistics through the PF netdevice Add ndo_get_vf_stats where the PF retrieves and fills the VFs traffic statistics. We encode the VF stats in a nested manner to allow for future extensions. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:03 -07:00
Eran Ben Elisha	b42de4d012	net/mlx4_en: Show PF own statistics via ethtool Allow the user to observe the PF own statistics using ethtool with pf_ prefixed counter names. Those counters are the PF statistics out of the overall port statistics. Every PF QP is attached to a counter and the summary of those counters is the PF statistics. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:02 -07:00
Eran Ben Elisha	9616982f3f	net/mlx4_core: Add helper to query counters This is an infrastructure step for querying VF and PF counters. This code was in the IB driver, move it to the mlx4 core driver so it will be accessible for more use cases. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:02 -07:00
Eran Ben Elisha	7193a141eb	IB/mlx4: Set VF to read from QP counters As IB VFs are not capable to read the port counters through MADs, move there to read their own QP counters to gather statistics. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:02 -07:00
Eran Ben Elisha	c3abb51bdb	IB/mlx4: Add RoCE/IB dedicated counters This is an infrastructure step to attach all the QPs opened from the IB driver to a counter in order to collect VF stats from the PF using those counters. If the port's type is Ethernet, the counter policy demands two counters per port (one for RoCE and one for Ethernet). The port default counter (allocated in mlx4_core) is used for the Ethernet netdev QPs and we allocate another counter for RoCE. If the port's traffic is Infiniband, the counter policy demands one counter per port, so it can use the port's default counter. Also, Add 'allocated' flag for each counter in order to clean it at unload. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:02 -07:00
Eran Ben Elisha	6de5f7f6a1	net/mlx4_core: Allocate default counter per port Default counter per port will be allocated at the mlx4 core driver load. Every QP opened by the Ethernet driver will be attached to the port's default counter. This is an infrastructure step to collect VF statistics from the PF. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:02 -07:00
Eran Ben Elisha	68230242cd	net/mlx4_core: Add port attribute when tracking counters Counter will get its port attribute within the resource tracker when the first QP attached to it is modified to RTR. If a QP is counter-less, an attempt to create a new counter with assigned port will be made. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:02 -07:00
Eran Ben Elisha	9de92c60be	net/mlx4_core: Adjust counter grant policy in the resource tracker Each physical function has a guarantee of two counters per port, one for a default counter and one for the IB driver. Each virtual function has a guarantee of one counter per port. All other counters are free and can be obtained on demand. This is a preparation step for supporting a get_vf_stats ndo call, so we can promise a counter for every VF in order to collect their statistics from the PF context. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:01 -07:00
Eran Ben Elisha	2632d18d3a	net/mlx4_core: Remove counters table allocation from VF flow Since virtual functions get their counters indices allocation from the PF, allocate counters indices bitmap only in case the function isn't virtual. Also, check that the device has counters to allocate before creating the indices bitmap table. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:01 -07:00
Eran Ben Elisha	47d8417f59	net/mlx4_core: Add sink counter Reserve the last valid counter index for "sink" counter, when a new counter cannot be allocated, the driver will use this counter. In order to avoid allocating this counter on any other flow, fix the indices bitmap allocation range, and reserve the sink counter index. Add macro for the sink counter index and replace all appearences of the index with the macro. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:01 -07:00
Eran Ben Elisha	b72ca7e96a	net/mlx4_core: Reset counters data when freed Add resetting the counter data to the free counter flow, so the counter's data won't be accessible anymore if querying the counter. Also, on next counter allocation (to another VM for example), it will be fresh and clear. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:01 -07:00
Eran Ben Elisha	efa6bc91cb	net/mlx4_core: Check before cleaning counters bitmap If counters are not supported by the device. The indices bitmap table is not allocated during initialization. Add the symmetrical check before cleaning the counters bitmap table or freeing a counter. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:01 -07:00
Scott Feldman	b4ad7baa01	bridge: del external_learned fdbs from device on flush or ageout We need to delete from offload the device externally learnded fdbs when any one of these events happen: 1) Bridge ages out fdb. (When bridge is doing ageing vs. device doing ageing. If device is doing ageing, it would send SWITCHDEV_FDB_DEL directly). 2) STP state change flushes fdbs on port. 3) User uses sysfs interface to flush fdbs from bridge or bridge port: echo 1 >/sys/class/net/BR_DEV/bridge/flush echo 1 >/sys/class/net/BR_PORT/brport/flush 4) Offload driver send event SWITCHDEV_FDB_DEL to delete fdb entry. For rocker, we can now get called to delete fdb entry in wait and nowait contexts, so set NOWAIT flag when deleting fdb entry. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:08:49 -07:00
David S. Miller	023033b1ec	NFC 4.2 pull request This is the NFC pull request for 4.2. - NCI drivers can now define their own handlers for processing proprietary NCI responses and notifications. - NFC vendors can use a dedicated netlink API to send their own proprietary commands, like e.g. all commands needed to implement vendor specific manufacturing tools. - A new generic NCI over UART driver against which any NCI chipset running on top of a serial interface can register. - The st21nfcb driver is renamed to st-nci as it can and will support most of ST Microelectronics NCI chipsets. - The st21nfcb driver can put its CLF in hibernate mode and save significant amount of power. - A few st21nfcb minor fixes. - The NXP NCI driver now supports ACPI enumeration. - The Marvell NCI driver now supports both USB and serial physical interfaces. - The Marvell NCI drivers also supports NCI frames being muxed over HCI. This is a setting that can be defined by a DT property. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVe2k1AAoJEIqAPN1PVmxKt88P/11r4VVq57Kh4BHKTa6CAs5m FBMmQdGlpU+O8VUjQLl7Y+GWoVTmDOUm/uKd6xWIvgBl4/X+UKwhNVldpsWXvw1t cTnn0BykvvfA4FOQJBqgGTC38oC04REr4uTK3+NnjE6sBpQ86Ljfk7xarMKdDpKI TKchY4sIOHIHXHVPVjW4fyEVF6pUduTenH73zWPkyKZgOQaSbgR75j12WMEI20kw POykX9t6UcPDzcJ+doktUgfxhHB1YlML7Z5xNrIkbk4kyAj140Ds9mzEEBllyivc xX1Cy6h3S+vrnx/CnNa85nA/y7cH0oK8NVQwmYicqKT/W7wASF7JZYLT6A7E81c1 zLGHWVEK0awS/KM+bLI3ixQVNWFDqYPM8R36Ag0NotUZJPKiHRQkyBU3VSS8XT2f HRBlYgNYSKfR1m9LCPtm+sq6psP7cnvRAfzDrZuWtyqctriZKqCuOXbVAHJtb/3s gHxMkfyTL1zRW7u/xGid15JiicNAJd2aQmkHzZJCnIgUyTrnvhSNX6sRjBZ6iQCf z3+aTIaNEApOZ2qtfOrnSTtW0wjOBdpUK3hlSX5ozQ+V6dspoN0ld1JWURQPqkzu jWwCONO29/9yuoc/qTbWPGL4avAqyCFEzhAk/Ux19jIqoXNzHVF9f4HICc1aRLTf TKpcrQPYB61fU07CV9uT =+IG2 -----END PGP SIGNATURE----- Merge tag 'nfc-next-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.2 pull request This is the NFC pull request for 4.2. - NCI drivers can now define their own handlers for processing proprietary NCI responses and notifications. - NFC vendors can use a dedicated netlink API to send their own proprietary commands, like e.g. all commands needed to implement vendor specific manufacturing tools. - A new generic NCI over UART driver against which any NCI chipset running on top of a serial interface can register. - The st21nfcb driver is renamed to st-nci as it can and will support most of ST Microelectronics NCI chipsets. - The st21nfcb driver can put its CLF in hibernate mode and save significant amount of power. - A few st21nfcb minor fixes. - The NXP NCI driver now supports ACPI enumeration. - The Marvell NCI driver now supports both USB and serial physical interfaces. - The Marvell NCI drivers also supports NCI frames being muxed over HCI. This is a setting that can be defined by a DT property. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:44:19 -07:00
David S. Miller	cadfaf432b	Merge branch 'bond-netlink-3ad-attrs' Nikolay Aleksandrov says: ==================== bonding: extend the 3ad exported attributes These are two small patches that export actor_oper_port_state and partner_oper_port_state via netlink and sysfs, until now they were only exported via bond's proc entry. If this set gets accepted I have an iproute2 patch prepared that will export them with which I tested these changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:40:25 -07:00
Nikolay Aleksandrov	46ea297ed6	bonding: export slave's partner_oper_port_state via sysfs and netlink Export the partner_oper_port_state of each port via sysfs and netlink. In 802.3ad mode it is valuable for the user to be able to check the partner_oper state, it is already exported via bond's proc entry. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:40:24 -07:00
Nikolay Aleksandrov	254cb6dbfd	bonding: export slave's actor_oper_port_state via sysfs and netlink Export the actor_oper_port_state of each port via sysfs and netlink. In 802.3ad mode it is valuable for the user to be able to check the actor_oper state, it is already exported via bond's proc entry. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:40:24 -07:00
David S. Miller	4d367963ac	Merge branch 'rocker-no-wait' Scott Feldman says: ==================== rocker: revert back to support for nowait processes One of the items removed from the rocker driver in the Spring Cleanup patch series was the ability to mark processing in the driver as "no wait" for those contexts where we cannot sleep. Turns out, we have "no wait" contexts where we want to program the device and we don't want to defer the processing to a process context. So re-add the ROCKER_OP_FLAG_NOWAIT flag to mark such processes, and propagate flags to mem allocator and to the device cmd executor. With NOWAIT, mem allocs are GFP_ATOMIC and device cmds are queued to the device, but the driver will not wait (sleep) for the response back from the device. My bad for removing NOWAIT support in the first place; I thought we could swing non-sleep contexts to process context using a work queue, for example, but there is push-back to keep processing in original context. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:06:49 -07:00
Scott Feldman	f66feaa98b	rocker: move port stop to 'no wait' processing rocker_port_stop can be called from atomic and non-atomic contexts. Since we can't test what context we're getting called in, do the processing as 'no wait', which will cover all cases. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:06:49 -07:00
Scott Feldman	92014b97ed	rocker: move MAC learn event back to 'no wait' processing Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:06:49 -07:00
Scott Feldman	ac28393e85	rocker: mark STP update as 'no wait' processing We can get STP updates from the bridge driver in atomic and non-atomic contexts. Since we can't test what context we're getting called in, do the STP processing as 'no wait', which will cover all cases. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:06:49 -07:00
Scott Feldman	02a9fbfc87	rocker: mark neigh update event processing as 'no wait' Neigh update event handler runs in a context where we can't sleep, so mark processing in driver with ROCKER_OP_FLAG_NOWAIT. NOWAIT will use GFP_ATOMIC for allocations and will queue cmds to the device's cmd ring but will not wait (sleep) for cmd response back from device. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:06:48 -07:00
Scott Feldman	179f9a2590	rocker: revert back to support for nowait processes One of the items removed from the rocker driver in the Spring Cleanup patch series was the ability to mark processing in the driver as "no wait" for those contexts where we cannot sleep. Turns out, we have "no wait" contexts where we want to program the device. So re-add the ROCKER_OP_FLAG_NOWAIT flag to mark such processes, and propagate flags to mem allocator and to the device cmd executor. With NOWAIT, mem allocs are GFP_ATOMIC and device cmds are queued to the device, but the driver will not wait (sleep) for the response back from the device. My bad for removing NOWAIT support in the first place; I thought we could swing non-sleep contexts to process context using a work queue, for example, but there is push-back to keep processing in original context. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:06:48 -07:00
Scott Feldman	4d81db4156	rocker: fix neigh tbl index increment race rocker->neigh_tbl_next_index is used to generate unique indices for neigh entries programmed into the device. The way new indices were generated was racy with the new prepare-commit transaction model. A simple fix here removes the race. The race was with two processes getting the same index, one process using prepare-commit, the other not: Proc A Proc B PREPARE phase get neigh_tbl_next_index NONE phase get neigh_tbl_next_index neigh_tbl_next_index++ COMMIT phase neigh_tbl_next_index++ Both A and B got the same index. The fix is to store and increment neigh_tbl_next_index in the PREPARE (or NONE) phase and use value in COMMIT phase: Proc A Proc B PREPARE phase get neigh_tbl_next_index neigh_tbl_next_index++ NONE phase get neigh_tbl_next_index neigh_tbl_next_index++ COMMIT phase // use value stashed in PREPARE phase Reported-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:04:21 -07:00
Scott Feldman	a072031084	rocker: gaurd against NULL rocker_port when removing ports The ports array is filled in as ports are probed, but if probing doesn't finish, we need to stop only those ports that where probed successfully. Check the ports array for NULL to skip un-probed ports when stopping. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:03:48 -07:00
Eric Dumazet	9464ca6500	net: make u64_stats_init() a function Using a function instead of a macro is cleaner and remove following W=1 warnings (extract) In file included from net/ipv6/ip6_vti.c:29:0: net/ipv6/ip6_vti.c: In function ‘vti6_dev_init_gen’: include/linux/netdevice.h:2029:18: warning: variable ‘stat’ set but not used [-Wunused-but-set-variable] typeof(type) stat; \ ^ net/ipv6/ip6_vti.c:862:16: note: in expansion of macro ‘netdev_alloc_pcpu_stats’ dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats); ^ CC [M] net/ipv6/sit.o In file included from net/ipv6/sit.c:30:0: net/ipv6/sit.c: In function ‘ipip6_tunnel_init’: include/linux/netdevice.h:2029:18: warning: variable ‘stat’ set but not used [-Wunused-but-set-variable] typeof(type) stat; \ ^ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:02:52 -07:00
Scott Feldman	7f10953949	bridge: use either ndo VLAN ops or switchdev VLAN ops to install MASTER vlans v2: Move struct switchdev_obj automatics to inner scope where there used. v1: To maintain backward compatibility with the existing iproute2 "bridge vlan" command, let bridge's setlink/dellink handler call into either the port driver's 8021q ndo ops or the port driver's bridge_setlink/dellink ops. This allows port driver to choose 8021q ops or the newer bridge_setlink/dellink ops when implementing VLAN add/del filtering on the device. The iproute "bridge vlan" command does not need to be modified. To summarize using the "bridge vlan" command examples, we have: 1) bridge vlan add\|del vid VID dev DEV Here iproute2 sets MASTER flag. Bridge's bridge_setlink/dellink is called. Vlan is set on bridge for port. If port driver implements ndo 8021q ops, call those to port driver can install vlan filter on device. Otherwise, if port driver implements bridge_setlink/dellink ops, call those to install vlan filter to device. This option only works if port is bridged. 2) bridge vlan add\|del vid VID dev DEV master Same as 1) 3) bridge vlan add\|del vid VID dev DEV self Bridge's bridge_setlink/dellink isn't called. Port driver's bridge_setlink/dellink is called, if implemented. This option works if port is bridged or not. If port is not bridged, a VLAN can still be added/deleted to device filter using this variant. 4) bridge vlan add\|del vid VID dev DEV master self This is a combination of 1) and 3), but will only work if port is bridged. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 16:02:21 -07:00
David S. Miller	9f42c8b3eb	Merge branch 'bpf-share-helpers' Alexei Starovoitov says: ==================== v1->v2: switched to init_user_ns from current_user_ns as suggested by Andy Introduce new helpers to access 'struct task_struct'->pid, tgid, uid, gid, comm fields in tracing and networking. Share bpf_trace_printk() and bpf_get_smp_processor_id() helpers between tracing and networking. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 15:58:26 -07:00
Alexei Starovoitov	ab1973d325	bpf: let kprobe programs use bpf_get_smp_processor_id() helper It's useful to do per-cpu histograms. Suggested-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 15:53:50 -07:00
Alexei Starovoitov	0756ea3e85	bpf: allow networking programs to use bpf_trace_printk() for debugging bpf_trace_printk() is a helper function used to debug eBPF programs. Let socket and TC programs use it as well. Note, it's DEBUG ONLY helper. If it's used in the program, the kernel will print warning banner to make sure users don't use it in production. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 15:53:50 -07:00
Alexei Starovoitov	ffeedafbf0	bpf: introduce current->pid, tgid, uid, gid, comm accessors eBPF programs attached to kprobes need to filter based on current->pid, uid and other fields, so introduce helper functions: u64 bpf_get_current_pid_tgid(void) Return: current->tgid << 32 \| current->pid u64 bpf_get_current_uid_gid(void) Return: current_gid << 32 \| current_uid bpf_get_current_comm(char *buf, int size_of_buf) stores current->comm into buf They can be used from the programs attached to TC as well to classify packets based on current task fields. Update tracex2 example to print histogram of write syscalls for each process instead of aggregated for all. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 15:53:50 -07:00
David S. Miller	ada6c1de9e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next This a bit large (and late) patchset that contains Netfilter updates for net-next. Most relevantly br_netfilter fixes, ipset RCU support, removal of x_tables percpu ruleset copy and rework of the nf_tables netdev support. More specifically, they are: 1) Warn the user when there is a better protocol conntracker available, from Marcelo Ricardo Leitner. 2) Fix forwarding of IPv6 fragmented traffic in br_netfilter, from Bernhard Thaler. This comes with several patches to prepare the change in first place. 3) Get rid of special mtu handling of PPPoE/VLAN frames for br_netfilter. This is not needed anymore since now we use the largest fragment size to refragment, from Florian Westphal. 4) Restore vlan tag when refragmenting in br_netfilter, also from Florian. 5) Get rid of the percpu ruleset copy in x_tables, from Florian. Plus another follow up patch to refine it from Eric Dumazet. 6) Several ipset cleanups, fixes and finally RCU support, from Jozsef Kadlecsik. 7) Get rid of parens in Netfilter Kconfig files. 8) Attach the net_device to the basechain as opposed to the initial per table approach in the nf_tables netdev family. 9) Subscribe to netdev events to detect the removal and registration of a device that is referenced by a basechain. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 14:30:32 -07:00
Pablo Neira Ayuso	835b803377	netfilter: nf_tables_netdev: unregister hooks on net_device removal In case the net_device is gone, we have to unregister the hooks and put back the reference on the net_device object. Once it comes back, register them again. This also covers the device rename case. This patch also adds a new flag to indicate that the basechain is disabled, so their hooks are not registered. This flag is used by the netdev family to handle the case where the net_device object is gone. Currently this flag is not exposed to userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 23:02:35 +02:00
Pablo Neira Ayuso	d8ee8f7c56	netfilter: nf_tables: add nft_register_basechain() and nft_unregister_basechain() This wrapper functions take care of hook registration for basechains. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 23:02:33 +02:00
Pablo Neira Ayuso	2cbce139fc	netfilter: nf_tables: attach net_device to basechain The device is part of the hook configuration, so instead of a global configuration per table, set it to each of the basechain that we create. This patch reworks `ebddf1a8d7` ("netfilter: nf_tables: allow to bind table to net_device"). Note that this adds a dev_name field in the nft_base_chain structure which is required the netdev notification subscription that follows up in a patch to handle gone net_devices. Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 23:02:31 +02:00
Eric Dumazet	711bdde6a8	netfilter: x_tables: remove XT_TABLE_INFO_SZ and a dereference. After Florian patches, there is no need for XT_TABLE_INFO_SZ anymore : Only one copy of table is kept, instead of one copy per cpu. We also can avoid a dereference if we put table data right after xt_table_info. It reduces register pressure and helps compiler. Then, we attempt a kmalloc() if total size is under order-3 allocation, to reduce TLB pressure, as in many cases, rules fit in 32 KB. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 20:19:20 +02:00
Pablo Neira Ayuso	53b8762727	Merge branch 'master' of git://blackhole.kfki.hu/nf-next Jozsef Kadlecsik says: ==================== ipset patches for nf-next Please consider to apply the next bunch of patches for ipset. First comes the small changes, then the bugfixes and at the end the RCU related patches. * Use MSEC_PER_SEC consistently instead of the number. * Use SET_WITH_() helpers to test set extensions from Sergey Popovich. Check extensions attributes before getting extensions from Sergey Popovich. * Permit CIDR equal to the host address CIDR in IPv6 from Sergey Popovich. * Make sure we always return line number on batch in the case of error from Sergey Popovich. * Check CIDR value only when attribute is given from Sergey Popovich. * Fix cidr handling for hash:net types, reported by Jonathan Johnson. * Fix parallel resizing and listing of the same set so that the original set is kept for the whole dumping. * Make sure listing doesn't grab a set which is just being destroyed. * Remove rbtree from ip_set_hash_netiface.c in order to introduce RCU. * Replace rwlock_t with spinlock_t in "struct ip_set", change the locking in the core and simplifications in the timeout routines. * Introduce RCU locking in bitmap:* types with a slight modification in the logic on how an element is added. * Introduce RCU locking in hash:* types. This is the most complex part of the changes. * Introduce RCU locking in list type where standard rculist is used. * Fix coding styles reported by checkpatch.pl. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 18:33:09 +02:00
Pablo Neira Ayuso	f09becc79f	netfilter: Kconfig: get rid of parens around depends on According to the reporter, they are not needed. Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 17:26:37 +02:00
Kenneth Klette Jonassen	758f0d4b16	tcp: cdg: use div_u64() Fixes cross-compile to mips. Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-14 12:57:45 -07:00
Jozsef Kadlecsik	ca0f6a5cd9	netfilter: ipset: Fix coding styles reported by checkpatch.pl Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:18 +02:00
Jozsef Kadlecsik	00590fdd5b	netfilter: ipset: Introduce RCU locking in list type Standard rculist is used. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:17 +02:00
Jozsef Kadlecsik	18f84d41d3	netfilter: ipset: Introduce RCU locking in hash:* types Three types of data need to be protected in the case of the hash types: a. The hash buckets: standard rcu pointer operations are used. b. The element blobs in the hash buckets are stored in an array and a bitmap is used for book-keeping to tell which elements in the array are used or free. c. Networks per cidr values and the cidr values themselves are stored in fix sized arrays and need no protection. The values are modified in such an order that in the worst case an element testing is repeated once with the same cidr value. The ipset hash approach uses arrays instead of lists and therefore is incompatible with rhashtable. Performance is tested by Jesper Dangaard Brouer: Simple drop in FORWARD ~~~~~~~~~~~~~~~~~~~~~~ Dropping via simple iptables net-mask match:: iptables -t raw -N simple \|\| iptables -t raw -F simple iptables -t raw -I simple -s 198.18.0.0/15 -j DROP iptables -t raw -D PREROUTING -j simple iptables -t raw -I PREROUTING -j simple Drop performance in "raw": 11.3Mpps Generator: sending 12.2Mpps (tx:12264083 pps) Drop via original ipset in RAW table ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create a set with lots of elements:: sudo ./ipset destroy test echo "create test hash:ip hashsize 65536" > test.set for x in `seq 0 255`; do for y in `seq 0 255`; do echo "add test 198.18.$x.$y" >> test.set done done sudo ./ipset restore < test.set Dropping via ipset:: iptables -t raw -F iptables -t raw -N net198 \|\| iptables -t raw -F net198 iptables -t raw -I net198 -m set --match-set test src -j DROP iptables -t raw -I PREROUTING -j net198 Drop performance in "raw" with ipset: 8Mpps Perf report numbers ipset drop in "raw":: + 24.65% ksoftirqd/1 [ip_set] [k] ip_set_test - 21.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_lock_bh - _raw_read_lock_bh + 99.88% ip_set_test - 19.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_unlock_bh - _raw_read_unlock_bh + 99.72% ip_set_test + 4.31% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_kadt + 2.27% ksoftirqd/1 [ixgbe] [k] ixgbe_fetch_rx_buffer + 2.18% ksoftirqd/1 [ip_tables] [k] ipt_do_table + 1.81% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_test + 1.61% ksoftirqd/1 [kernel.kallsyms] [k] __netif_receive_skb_core + 1.44% ksoftirqd/1 [kernel.kallsyms] [k] build_skb + 1.42% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv + 1.36% ksoftirqd/1 [kernel.kallsyms] [k] __local_bh_enable_ip + 1.16% ksoftirqd/1 [kernel.kallsyms] [k] dev_gro_receive + 1.09% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock + 0.96% ksoftirqd/1 [ixgbe] [k] ixgbe_clean_rx_irq + 0.95% ksoftirqd/1 [kernel.kallsyms] [k] __netdev_alloc_frag + 0.88% ksoftirqd/1 [kernel.kallsyms] [k] kmem_cache_alloc + 0.87% ksoftirqd/1 [xt_set] [k] set_match_v3 + 0.85% ksoftirqd/1 [kernel.kallsyms] [k] inet_gro_receive + 0.83% ksoftirqd/1 [kernel.kallsyms] [k] nf_iterate + 0.76% ksoftirqd/1 [kernel.kallsyms] [k] put_compound_page + 0.75% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_lock Drop via ipset in RAW table with RCU-locking ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ With RCU locking, the RW-lock is gone. Drop performance in "raw" with ipset with RCU-locking: 11.3Mpps Performance-tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:17 +02:00
Jozsef Kadlecsik	96f51428c4	netfilter: ipset: Introduce RCU locking in bitmap:* types There's nothing much required because the bitmap types use atomic bit operations. However the logic of adding elements slightly changed: first the MAC address updated (which is not atomic), then the element activated (added). The extensions may call kfree_rcu() therefore we call rcu_barrier() at module removal. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:16 +02:00
Jozsef Kadlecsik	b57b2d1fa5	netfilter: ipset: Prepare the ipset core to use RCU at set level Replace rwlock_t with spinlock_t in "struct ip_set" and change the locking accordingly. Convert the comment extension into an rcu-avare object. Also, simplify the timeout routines. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:16 +02:00
Jozsef Kadlecsik	bd55389cc3	netfilter:ipset Remove rbtree from hash:net,iface Remove rbtree in order to introduce RCU instead of rwlock in ipset Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:15 +02:00
Jozsef Kadlecsik	9c1ba5c809	netfilter: ipset: Make sure listing doesn't grab a set which is just being destroyed. There was a small window when all sets are destroyed and a concurrent listing of all sets could grab a set which is just being destroyed. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:15 +02:00
Jozsef Kadlecsik	c4c997839c	netfilter: ipset: Fix parallel resizing and listing of the same set When elements added to a hash:* type of set and resizing triggered, parallel listing could start to list the original set (before resizing) and "continue" with listing the new set. Fix it by references and using the original hash table for listing. Therefore the destroying of the original hash table may happen from the resizing or listing functions. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:15 +02:00

1 2 3 4 5 ...

521673 commits