Commit graph

483578 commits

Author SHA1 Message Date
Ying Xue 1b61e70ad1 tipc: remove size variable from publ_list struct
The size variable is introduced in publ_list struct to help us exactly
calculate SKB buffer sizes needed by publications when all publications
in name table are delivered in bulk in named_distribute(). But if
publication SKB buffer size is assumed to MTU, the size variable in
publ_list struct can be completely eliminated at the cost of wasting
a bit memory space for last SKB.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tero Aho <tero.aho@coriant.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:55 -05:00
Joe Perches 60c04aecd8 udp: Neaten and reduce size of compute_score functions
The compute_score functions are a bit difficult to read.

Neaten them a bit to reduce object sizes and make them a
bit more intelligible.

Return early to avoid indentation and avoid unnecessary
initializations.

(allyesconfig, but w/ -O2 and no profiling)

$ size net/ipv[46]/udp.o.*
   text    data     bss     dec     hex filename
  28680    1184      25   29889    74c1 net/ipv4/udp.o.new
  28756    1184      25   29965    750d net/ipv4/udp.o.old
  17600    1010       2   18612    48b4 net/ipv6/udp.o.new
  17632    1010       2   18644    48d4 net/ipv6/udp.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:28:47 -05:00
Petri Gynther b0ba512e25 net: bcmgenet: enable driver to work without a device tree
Modify bcmgenet driver so that it can be used on Broadcom 7xxx
MIPS-based STB platforms without a device tree.

Signed-off-by: Petri Gynther <pgynther@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:26:59 -05:00
Haiyang Zhang c3582a2c4d hyperv: Add support for vNIC hot removal
This patch adds proper handling of the vNIC hot removal event, which includes
a rescind-channel-offer message from the host side that triggers vNIC close and
removal. In this case, the notices to the host during close and removal is not
necessary because the channel is rescinded. This patch blocks these unnecessary
messages, and lets vNIC removal process complete normally.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:24:11 -05:00
Denis Kirjanov 6867b17b26 test: bpf: expand DIV_KX to DIV_MOD_KX
Expand DIV_KX to use BPF_MOD operation in the
DIV_KX bpf 'classic' test.

CC: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:23:22 -05:00
David S. Miller aae68bc6f6 Merge branch 'tstamp-next'
Willem de Bruijn says:

====================
timestamping updates

The main goal for this patchset is to allow correlating timestamps
with the egress interface. Also introduce a warning, as discussed
previously, and update the tests to verify the new feature.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:55 -05:00
Willem de Bruijn cbd3aad5ce net-timestamp: expand documentation and test
Documentation:
  expand explanation of timestamp counter

Test:
  new: flag -I requests and prints PKTINFO
  new: flag -x prints payload (possibly truncated)
  fix: remove pretty print that breaks common flag '-l 1'

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:48 -05:00
Willem de Bruijn 829ae9d611 net-timestamp: allow reading recv cmsg on errqueue with origin tstamp
Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
   (http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
   cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:48 -05:00
Willem de Bruijn 7ce875e5ec ipv4: warn once on passing AF_INET6 socket to ip_recv_error
One line change, in response to catching an occurrence of this bug.
See also fix f4713a3dfa ("net-timestamp: make tcp_recvmsg call ...")

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:48 -05:00
David S. Miller 8d0c469753 Merge branch 'ebpf-next'
Alexei Starovoitov says:

====================
allow eBPF programs to be attached to sockets

V1->V2:

fixed comments in sample code to state clearly that packet data is accessed
with LD_ABS instructions and not internal skb fields.
Also replaced constants in:
BPF_LD_ABS(BPF_B, 14 + 9 /* R0 = ip->proto */),
with:
BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),

V1 cover:

Introduce BPF_PROG_TYPE_SOCKET_FILTER type of eBPF programs that can be
attached to sockets with setsockopt().
Allow such programs to access maps via lookup/update/delete helpers.

This feature was previewed by bpf manpage in commit b4fc1a460f30("Merge branch 'bpf-next'")
Now it can actually run.

1st patch adds LD_ABS/LD_IND instruction verification and
2nd patch adds new setsockopt() flag.
Patches 3-6 are examples in assembler and in C.

Though native eBPF programs are way more powerful than classic filters
(attachable through similar setsockopt() call), they don't have skb field
accessors yet. Like skb->pkt_type, skb->dev->ifindex are not accessible.
There are sevaral ways to achieve that. That will be in the next set of patches.
So in this set native eBPF programs can only read data from packet and
access maps.

The most powerful example is sockex2_kern.c from patch 6 where ~200 lines of C
are compiled into ~300 of eBPF instructions.
It shows how quite complex packet parsing can be done.

LLVM used to build examples is at https://github.com/iovisor/llvm
which is fork of llvm trunk that I'm cleaning up for upstreaming.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:48 -08:00
Alexei Starovoitov fbe3310840 samples: bpf: large eBPF program in C
sockex2_kern.c is purposefully large eBPF program in C.
llvm compiles ~200 lines of C code into ~300 eBPF instructions.

It's similar to __skb_flow_dissect() to demonstrate that complex packet parsing
can be done by eBPF.
Then it uses (struct flow_keys)->dst IP address (or hash of ipv6 dst) to keep
stats of number of packets per IP.
User space loads eBPF program, attaches it to loopback interface and prints
dest_ip->#packets stats every second.

Usage:
$sudo samples/bpf/sockex2
ip 127.0.0.1 count 19
ip 127.0.0.1 count 178115
ip 127.0.0.1 count 369437
ip 127.0.0.1 count 559841
ip 127.0.0.1 count 750539

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:34 -08:00
Alexei Starovoitov a80857822b samples: bpf: trivial eBPF program in C
this example does the same task as previous socket example
in assembler, but this one does it in C.

eBPF program in kernel does:
    /* assume that packet is IPv4, load one byte of IP->proto */
    int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
    long *value;

    value = bpf_map_lookup_elem(&my_map, &index);
    if (value)
        __sync_fetch_and_add(value, 1);

Corresponding user space reads map[tcp], map[udp], map[icmp]
and prints protocol stats every second

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:33 -08:00
Alexei Starovoitov 249b812d80 samples: bpf: elf_bpf file loader
simple .o parser and loader using BPF syscall.
.o is a standard ELF generated by LLVM backend

It parses elf file compiled by llvm .c->.o
- parses 'maps' section and creates maps via BPF syscall
- parses 'license' section and passes it to syscall
- parses elf relocations for BPF maps and adjusts BPF_LD_IMM64 insns
  by storing map_fd into insn->imm and marking such insns as BPF_PSEUDO_MAP_FD
- loads eBPF programs via BPF syscall

One ELF file can contain multiple BPF programs.

int load_bpf_file(char *path);
populates prog_fd[] and map_fd[] with FDs received from bpf syscall

bpf_helpers.h - helper functions available to eBPF programs written in C

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:33 -08:00
Alexei Starovoitov 03f4723ed7 samples: bpf: example of stateful socket filtering
this socket filter example does:
- creates arraymap in kernel with key 4 bytes and value 8 bytes

- loads eBPF program which assumes that packet is IPv4 and loads one byte of
  IP->proto from the packet and uses it as a key in a map

  r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)];
  *(u32*)(fp - 4) = r0;
  value = bpf_map_lookup_elem(map_fd, fp - 4);
  if (value)
       (*(u64*)value) += 1;

- attaches this program to raw socket

- every second user space reads map[IPPROTO_TCP], map[IPPROTO_UDP], map[IPPROTO_ICMP]
  to see how many packets of given protocol were seen on loopback interface

Usage:
$sudo samples/bpf/sock_example
TCP 0 UDP 0 ICMP 0 packets
TCP 187600 UDP 0 ICMP 4 packets
TCP 376504 UDP 0 ICMP 8 packets
TCP 563116 UDP 0 ICMP 12 packets
TCP 753144 UDP 0 ICMP 16 packets

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:32 -08:00
Alexei Starovoitov 89aa075832 net: sock: allow eBPF programs to be attached to sockets
introduce new setsockopt() command:

setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))

where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER

setsockopt() calls bpf_prog_get() which increments refcnt of the program,
so it doesn't get unloaded while socket is using the program.

The same eBPF program can be attached to multiple sockets.

User task exit automatically closes socket which calls sk_filter_uncharge()
which decrements refcnt of eBPF program

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:32 -08:00
Alexei Starovoitov ddd872bc30 bpf: verifier: add checks for BPF_ABS | BPF_IND instructions
introduce program type BPF_PROG_TYPE_SOCKET_FILTER that is used
for attaching programs to sockets where ctx == skb.

add verifier checks for ABS/IND instructions which can only be seen
in socket filters, therefore the check:
  if (env->prog->aux->prog_type != BPF_PROG_TYPE_SOCKET_FILTER)
    verbose("BPF_LD_ABS|IND instructions are only allowed in socket filters\n");

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:32 -08:00
Jason Wang f51a5e82ea tun/macvtap: use consume_skb() instead of kfree_skb() when needed
To be more friendly with drop monitor, we should only call kfree_skb() when
the packets were dropped and use consume_skb() in other cases.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:45:09 -08:00
Markus Elfring 6db16718c9 net-PA Semi: Deletion of unnecessary checks before the function call "pci_dev_put"
The pci_dev_put() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call
is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:14:20 -08:00
Markus Elfring 04901cea21 net-ipvlan: Deletion of an unnecessary check before the function call "free_percpu"
The free_percpu() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:14:20 -08:00
Markus Elfring 39af455daf net: cassini: Deletion of an unnecessary check before the function call "vfree"
The vfree() function performs also input parameter validation.
Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:14:19 -08:00
Andy Shevchenko c4b2b9a849 stmmac: pci: allocate memory resources dynamically
Instead of using global variables we are going to use dynamically allocated
memory. It allows to append a support of more than one ethernet adapter which
might have different settings simultaniously.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:03:48 -08:00
David S. Miller 244ebd9f8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains netfilter updates for net-next. Basically,
enhancements for xt_recent, skip zeroing of timer in conntrack, fix
linking problem with recent redirect support for nf_tables, ipset
updates and a couple of cleanups. More specifically, they are:

1) Rise maximum number per IP address to be remembered in xt_recent
   while retaining backward compatibility, from Florian Westphal.

2) Skip zeroing timer area in nf_conn objects, also from Florian.

3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using
   using meta l4proto and transport layer header, from Alvaro Neira.

4) Fix linking problems in the new redirect support when CONFIG_IPV6=n
   and IP6_NF_IPTABLES=n.

And ipset updates from Jozsef Kadlecsik:

5) Support updating element extensions when the set is full (fixes
   netfilter bugzilla id 880).

6) Fix set match with 32-bits userspace / 64-bits kernel.

7) Indicate explicitly when /0 networks are supported in ipset.

8) Simplify cidr handling for hash:*net* types.

9) Allocate the proper size of memory when /0 networks are supported.

10) Explicitly add padding elements to hash:net,net and hash:net,port,
    because the elements must be u32 sized for the used hash function.

Jozsef is also cooking ipset RCU conversion which should land soon if
they reach the merge window in time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 20:56:46 -08:00
David S. Miller ddd5c50f9b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-12-05

This series contains updates to ixgbe and ixgbevf.

Alex provides a couple of patches to cleanup ixgbe.  First cleans up the
page reuse code getting it into a state where all the workarounds needed
are in place as well as cleaning up a few minor oversights such as using
__free_pages instead of put_page to drop a locally allocated page.  Then
cleans up the tail writes for the ixgbe descriptor queues.

Mark Peterson adds support to lookup MAC addresses in Open Firmware or
IDPROM.

Emil provides patches for ixgbe and ixgbevf to fix an issue on rmmod and
to add support for X550 in the VF driver.  First removes the read/write
operations to the CIAA/D registers since it can block access to the PCI
config space and make use of standard kernel functions for accessing the
PCI config space.  Then fixes an issue where the driver has logic to free
up used data in case any of the checks in ixgbe_probe() fail, however
there is a similar set of cleanups that can occur on driver unload in
ixgbe_remove() which can cause the rmmod command to crash.

Don provides the remaining patches in the series to complete the addition
of X550 support into the ixgbe driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 20:50:54 -08:00
Emil Tantilov 0333464f5f ixgbevf: fix possible crashes in probe and remove
This patch resolves couple of issues in ixgbevf_probe/remove():

1. Fix a case where adapter->state is tested after free_netdev() this is
same as the patch for ixgbe from Daniel Borkmann <dborkman@redhat.com>:
commit b5b2ffc057 ("ixgbe: fix use after free adapter->state test in ixgbe_remove/ixgbe_probe")

2. Move pci_set_drvdata() after all the error checks in ixgbevf_probe() and
then add a check in ixgbevf_probe() to avoid running the cleanup functions
twice in cases where probe failed.

CC: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:09 -08:00
Emil Tantilov 47068b0ddf ixgbevf: add support for X550 VFs
This patch adds initial support for VFs on a new mac - X550.

The patch adds the basic structures and device IDs for the X550 VFs
that would allow the driver to load and pass traffic.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:08 -08:00
Emil Tantilov 0fb6a55cc3 ixgbe: fix crash on rmmod after probe fail
The driver has logic to free up used data in case any of the checks in
ixgbe_probe() fail, however there is a similar set of cleanups that can
occur on driver unload in ixgbe_remove() which can cause the rmmod command
to crash.

This patch aims to fix the logic by moving pci_set_drvdata() after all error
checks and then adds a check in ixgbe_remove() to skip it altogether if
adapter comes up empty.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:08 -08:00
Don Skidmore 9be4a9bb34 ixgbe: bump version number
Since we now support X550 mac's bump the version number to reflect this.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:08 -08:00
Don Skidmore 6a14ee0cfb ixgbe: Add X550 support function pointers
This patch extends the function pointer structure to include the new
X550 class MAC types. This creates a new file ixgbe_x550.c that contains
all of the new methods.  Because of similarities to the X540 part in
some cases we just use it's methods where they can be used without any
modification.  These exported functions are now defined in the new
ixgbe_x540.h file.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:07 -08:00
Don Skidmore 735c35afed ixgbe: cleanup checksum to allow error results
Currently the shared code checksum calculation function only
returns a u16 and cannot return an error code. Unfortunately
a variety of errors can happen that completely prevent the
calculation of a checksum. So, change the function return value
from a u16 to an s32 and return a negative value on error, or the
positive checksum value when there is no error.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:07 -08:00
Don Skidmore 28abba05d9 ixgbe: add methods for combined read and write operations
Some X550 procedures will be using CS4227 PHY and need to
perform combined read and write operations.  This patch
adds those methods.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:07 -08:00
Don Skidmore 030eaece2d ixgbe: Add x550 SW/FW semaphore support
The X550 hardware will use more bits in the mask, so change
the prototypes to match.  This larger mask will require changes
in callers which use the higher bits. Likewise since X550 will
use different semaphore mask values and will use the lan_id
value.  So save these values in the ixgbe_phy_info struct.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:06 -08:00
Don Skidmore b48e4aa3e5 ixgbe: Add timeout parameter to ixgbe_host_interface_command
Since on X550 we use host interface commands to read,write and erase
some commands require more time to complete. So this adds a timeout
parameter to ixgbe_host_interface_command as wells as a return_data
parameter allowing us to return with any data.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:06 -08:00
Don Skidmore 0f9b232b17 ixgbe: add support for X550 extended RSS support
The new X550 family of MAC's will have a larger RSS hash (16 -> 64).
It will also support individual VF to have their own independent RSS
hash key.  This patch will enable this functionality

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:06 -08:00
Emil Tantilov 9079e41631 ixgbe: remove CIAA/D register reads from bad VF check
Accessing the CIAA/D register can block access to the PCI config space.

This patch removes the read/write operations to the CIAA/D registers
and makes use of standard kernel functions for accessing the PCI config
space.

In addition it moves ixgbevf_check_for_bad_vf() into the watchdog subtask
which reduces the frequency of the checks.

CC: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:05 -08:00
Martin K Petersen c762dff24c ixgbe: Look up MAC address in Open Firmware or IDPROM
Attempt to look up the MAC address in Open Firmware on systems that
support it. On SPARC resort to using the IDPROM if no OF address is
found.

Signed-off-by: Martin K Petersen <martin.petersen@oracle.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:05 -08:00
Alexander Duyck ad435ec689 ixgbe: Remove tail write abstraction and add missing barrier
This change cleans up the tail writes for the ixgbe descriptor queues.  The
current implementation had me confused as I wasn't sure if it was still
making use of the surprise remove logic or not.

It also adds the mmiowb which is needed on ia64, mips, and a couple other
architectures in order to synchronize the MMIO writes with the Tx queue
_xmit_lock spinlock.

Cc: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:05 -08:00
Alexander Duyck 18cb652a41 ixgbe: Clean-up page reuse code
This patch cleans up the page reuse code getting it into a state where all
the workarounds needed are in place as well as cleaning up a few minor
oversights such as using __free_pages instead of put_page to drop a locally
allocated page.

It also cleans up how we clear the descriptor status bits.  Previously they
were zeroed as a part of clearing the hdr_addr.  However the hdr_addr is a
64 bit field and 64 bit writes can be a bit more expensive on on 32 bit
systems.  Since we are no longer using the header split feature the upper
32 bits of the address no longer need to be cleared.  As a result we can
just clear the status bits and leave the length and VLAN fields as-is which
should provide more information in debugging.

Cc: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-05 09:13:04 -08:00
Jozsef Kadlecsik cac3763967 netfilter: ipset: Explicitly add padding elements to hash:net, net and hash:net, port, net
The elements must be u32 sized for the used hash function.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik 77b4311d20 netfilter: ipset: Allocate the proper size of memory when /0 networks are supported
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik 25a76f3463 netfilter: ipset: Simplify cidr handling for hash:*net* types
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik 59de79cf57 netfilter: ipset: Indicate when /0 networks are supported
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik a51b9199b1 netfilter: ipset: Alignment problem between 64bit kernel 32bit userspace
Sven-Haegar Koch reported the issue:

sims:~# iptables -A OUTPUT -m set --match-set testset src -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.

In syslog:
x_tables: ip_tables: set.3 match: invalid size 48 (kernel) != (user) 32

which was introduced by the counter extension in ipset.

The patch fixes the alignment issue with introducing a new set match
revision with the fixed underlying 'struct ip_set_counter_match'
structure.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:35 +01:00
Jozsef Kadlecsik 86ac79c7be netfilter: ipset: Support updating extensions when the set is full
When the set was full (hash type and maxelem reached), it was not
possible to update the extension part of already existing elements.
The patch removes this limitation.

Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=880
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:34 +01:00
Herbert Xu d8febb77b5 tun: Fix GSO meta-data handling in tun_get_user
When we write the GSO meta-data in tun_get_user we end up advancing
the IO vector twice, thus exhausting the user buffer before we can
finish writing the packet.

Fixes: f5ff53b4d9 ("{macvtap,tun}_get_user(): switch to iov_iter")
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:53:47 -08:00
David S. Miller cbc24658ba Merge branch 'rocker-next'
Jiri Pirko says:

====================
introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload

This patchset is just the first phase of switch and switch-ish device
support api in kernel. Note that the api will extend.

So what this patchset includes:
- introduce switchdev api skeleton for implementing switch drivers
- introduce rocker switch driver which implements switchdev api fdb and
  bridge set/get link ndos

As to the discussion if there is need to have specific class of device
representing the switch itself, so far we found no need to introduce that.
But we are generally ok with the idea and when the time comes and it will
be needed, it can be easily introduced without any disturbance.

This patchset introduces switch id export through rtnetlink and sysfs,
which is similar to what we have for port id in SR-IOV. I will send iproute2
patchset for showing the switch id for port netdevs once this is applied.
This applies also for the PF_BRIDGE and fdb iproute2 patches.

iproute2 patches are now available here:
https://github.com/jpirko/iproute2-rocker

For detailed description and version history, please see individual patches.

In v4 I reordered the patches leaving rocker patches on the end of the patchset.

In v5 I only fixed whitespace issues of patch #13

We have a TODO for related items we want to work on in near future:
https://etherpad.wikimedia.org/p/netdev-swdev-todo
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:31 -08:00
Thomas Graf 51ace887a0 rocker: Use logical operators on booleans
Silences various sparse warnings

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:27 -08:00
Thomas Graf e75605822e rocker: Add proper validation of Netlink attributes
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:26 -08:00
Scott Feldman 5111f80cbc rocker: add ndo_bridge_setlink/getlink support for learning policy
Rocker ports will use new "swdev" hwmode for bridge port offload policy.
Current supported policy settings are BR_LEARNING and BR_LEARNING_SYNC.
User can turn on/off device port FDB learning and syncing to bridge.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:26 -08:00
Jiri Pirko ce76ca689d rocker: implement ndo_fdb_dump
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:26 -08:00
Scott Feldman 6c70794500 rocker: implement L2 bridge offloading
Add L2 bridge offloading support to rocker driver.  Here, the Linux bridge
driver is used to collect swdev ports into a tagged (or untagged) VLAN
bridge.  The switchdev will offload from the bridge driver the following L2
bridging functions:

 - Learning of neighbor MAC addresses on VLAN X  Learned mac/vlan is
installed in bridge FDB.  (And removed when device unlearns mac/vlan).
Learning must be turned off on each bridge port to disable the feature in
the bridge driver.

- Flooding of multicast/broadcast and unknown unicast pkts to (STP)
active ports in bridge.  The bridge driver is unaware of the flooding happening
at the device level.  Flooding must be turned off on each bridge port to
disable the feature on the bridge driver.

- STP port state is pushed down to driver/device.  The bridge still processes
STP BDPUs and maintains port STP state (for all VLANs in bridge), but
the driver/device must be notified of port STP state change to program
the device.

Multiple (VLAN) bridges are supported.  The device (implemented per
the OF-DPA spec) must use a portion of the VLAN namespace for
internal VLANs.  Right now, the upper 255 VLANs (0xf00 to 0xffe) are
used as internal VLAN IDs for untagged traffic and are not available
as port VLANs.

The driver uses the following interfaces:

1. To track VLAN add/del on ports in bridge:

.ndo_vlan_rx_add_vid
.ndo_vlan_rx_kill_vid

2. To track port add/del membership in bridge:

NETDEV_CHANGEUPPER netdevice notifier

3. To catch static FDB entries installed on bridge/vlan by user using netlink:

.ndo_fdb_add
.ndo_fdb_del

4. To be notified on port STP state change:

.ndo_switch_port_stp_update

5. To notify bridge driver on learned/forgotten mac/vlans on bridge port:

br_fdb_external_learn_add
br_fdb_external_learn_del

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:25 -08:00