Commit graph

506953 commits

Author SHA1 Message Date
Herbert Xu 393619474e rhashtable: Fix read-side crash during rehash
This patch fixes a typo rhashtable_lookup_compare where we fail
to recompute the hash when looking up the new table.  This causes
elements to be missed and potentially a crash during a resize.

Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:02:30 -04:00
Daniel Borkmann a5b6846f9e rhashtable: kill ht->shift atomic operations
Commit c0c09bfdc4 ("rhashtable: avoid unnecessary wakeup for worker
queue") changed ht->shift to be atomic, which is actually unnecessary.

Instead of leaving the current shift in the core rhashtable structure,
it can be cached inside the individual bucket tables.

There, it will only be initialized once during a new table allocation
in the shrink/expansion slow path, and from then onward it stays immutable
for the rest of the bucket table liftime.

That allows shift to be non-atomic. The patch also moves hash_rnd
management into the table setup. The rhashtable structure now consumes
3 instead of 4 cachelines.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ying Xue <ying.xue@windriver.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:02:30 -04:00
Herbert Xu 9497df88ab rhashtable: Fix reader/rehash race
There is a potential race condition between readers and the rehasher.
In particular, the rehasher could have started a rehash while the
reader finishes a scan of the old table but fails to see the new
table pointer.

This patch closes this window by adding smp_wmb/smp_rmb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:02:30 -04:00
David S. Miller 5ff0d16aac Merge branch 'listener_refactor'
Eric Dumazet says:

====================
inet: tcp listener refactoring, part 8

These patches prepare request socks being hashed into general ehash
table : We declare 3 aliases (ireq_state, ireq_refcnt, ireq_family)

Note that refcnt is not yet handled, this will be done later.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:27 -04:00
Eric Dumazet 3f66b083a5 inet: introduce ireq_family
Before inserting request socks into general hash table,
fill their socket family.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet d4f06873b6 inet: get_openreq4() & get_openreq6() do not need listener
ireq->ir_num contains local port, use it.

Also, get_openreq4() dumping listen_sk->refcnt makes litle sense.

inet_diag_fill_req() can also use ireq->ir_num

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet 41b822c59e inet: prepare sock_edemux() & sock_gen_put() for new SYN_RECV state
sock_edemux() & sock_gen_put() should be ready to cope with request socks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet 0159dfd3d7 net: add req_prot_cleanup() & req_prot_init() helpers
Make proto_register() & proto_unregister() a bit nicer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet 1e2e01172f inet: add rsk_refcnt/ireq_refcnt to request socks
When request socks will be in ehash, they'll need to be refcounted.

This patch adds rsk_refcnt/ireq_refcnt macros, and adds
reqsk_put() function, but nothing yet use them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet d34ac51b76 inet: add ireq_state field to inet_request_sock
We need to identify request sock when they'll be visible in
global ehash table.

ireq_state is an alias to req.__req_common.skc_state.

Its value is set to TCP_NEW_SYN_RECV

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Eric Dumazet 10feb428a5 inet: add TCP_NEW_SYN_RECV state
TCP_SYN_RECV state is currently used by fast open sockets.

Initial TCP requests (the pseudo sockets created when a SYN is received)
are not yet associated to a state. They are attached to their parent,
and the parent is in TCP_LISTEN state.

This commit adds TCP_NEW_SYN_RECV state, so that we can convert
TCP stack to a different schem gradually.

This state is not exported to user space.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Eric Dumazet bd337c581b ipv6: add missing ireq_net & ir_cookie initializations
I forgot to update dccp_v6_conn_request() & cookie_v6_check().
They both need to set ireq->ireq_net and ireq->ir_cookie

Lets clear ireq->ir_cookie in inet_reqsk_alloc()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33cf7c90fe ("net: add real socket cookies")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Daniel Borkmann 54720df130 cls_bpf: do eBPF invocation under non-bh RCU lock variant for maps
Currently, it is possible in cls_bpf to access eBPF maps only under
rcu_read_lock_bh() variants: while on ingress side, that is, handle_ing(),
the classifier would be called from __netif_receive_skb_core() under
rcu_read_lock(); on egress side, however, it's rcu_read_lock_bh() via
__dev_queue_xmit().

This rcu/rcu_bh mix doesn't work together with eBPF maps as they require
soley to be called under rcu_read_lock(). eBPF maps could also be shared
among various other eBPF programs (possibly even with other eBPF program
types, f.e. tracing) and user space processes, so any context is assumed.

Therefore, a possible fix for cls_bpf is to wrap/nest eBPF program
invocation under non-bh RCU lock variant.

Fixes: e2e9b6541d ("cls_bpf: add initial eBPF support for programmable classifiers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 18:33:15 -04:00
David S. Miller 06741d055b Merge branch 'fib_trie_table_merge_fixes'
Alexander Duyck says:

====================
fib_trie: Minor fixes for table merge

This patch set addresses two issues reported with the tables merged, the
first is a NULL pointer dereference, and the other is to remove a WARN_ON
and set the ordering for aliases from different tables with the same slen
values.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 18:26:58 -04:00
Alexander Duyck 0b65bd97ba fib_trie: Provide a deterministic order for fib_alias w/ tables merged
This change makes it so that we should always have a deterministic ordering
for the main and local aliases within the merged table when two leaves
overlap.

So for example if we have a leaf with a key of 192.168.254.0.  If we
previously added two aliases with a prefix length of 24 from both local and
main the first entry would be first and the second would be second.  When I
was coding this I had added a WARN_ON should such a situation occur as I
wasn't sure how likely it would be.  However this WARN_ON has been
triggered so this is something that should be addressed.

With this patch the ordering of the aliases is as follows.  First they are
sorted on prefix length, then on their table ID, then tos, and finally
priority.  This way what we end up doing is essentially interleaving the
two tables on what used to be leaf_info structure boundaries.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 18:26:51 -04:00
Alexander Duyck 3c9e9f7320 fib_trie: Avoid NULL pointer if local table is not allocated
The function fib_unmerge assumed the local table had already been
allocated.  If that is not the case however when custom rules are applied
then this can result in a NULL pointer dereference.

In order to prevent this we must check the value of the local table pointer
and if it is NULL simply return 0 as there is no local table to separate
from the main.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 18:26:51 -04:00
Daniel Borkmann 80f1d68ccb ebpf: verifier: check that call reg with ARG_ANYTHING is initialized
I noticed that a helper function with argument type ARG_ANYTHING does
not need to have an initialized value (register).

This can worst case lead to unintented stack memory leakage in future
helper functions if they are not carefully designed, or unintended
application behaviour in case the application developer was not careful
enough to match a correct helper function signature in the API.

The underlying issue is that ARG_ANYTHING should actually be split
into two different semantics:

  1) ARG_DONTCARE for function arguments that the helper function
     does not care about (in other words: the default for unused
     function arguments), and

  2) ARG_ANYTHING that is an argument actually being used by a
     helper function and *guaranteed* to be an initialized register.

The current risk is low: ARG_ANYTHING is only used for the 'flags'
argument (r4) in bpf_map_update_elem() that internally does strict
checking.

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 15:29:31 -04:00
David S. Miller 20453d88cc Merge branch 'possible_net_t'
Eric W. Biederman says:

====================
Introduce possible_net_t

The current usage of write_pnet and read_pnet is a little laborious and
error prone as you only notice if you failed to include them if are
compiling with network namespaces enabled.

possible_net_t remedies that by using a type that is 0 bytes when
network namespaces are disabled and can only be read and written to with
read_pnet and write_pnet.

Aka less work and safer for the same effect.

I kill hold_net and release_net first as are they are haven't been used
since 2008 and are noise at the points where write_pnet and read_pnet
are used.

I have folded in Eric Dumazets suggestions to improve the killing of
hold_net and release net.  And respon.  I had to respin anyway as
there was enough changes elsewhere in the tree the previous version
of these patches did not quite apply cleanly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:44 -04:00
Eric W. Biederman 0c5c9fb551 net: Introduce possible_net_t
Having to say
> #ifdef CONFIG_NET_NS
> 	struct net *net;
> #endif

in structures is a little bit wordy and a little bit error prone.

Instead it is possible to say:
> typedef struct {
> #ifdef CONFIG_NET_NS
>       struct net *net;
> #endif
> } possible_net_t;

And then in a header say:

> 	possible_net_t net;

Which is cleaner and easier to use and easier to test, as the
possible_net_t is always there no matter what the compile options.

Further this allows read_pnet and write_pnet to be functions in all
cases which is better at catching typos.

This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates optional struct net * variables that
write_pnet uses on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:40 -04:00
Eric W. Biederman efd7ef1c19 net: Kill hold_net release_net
hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008.  Kill the code it is long past due.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:40 -04:00
David S. Miller 6c7005f6cb Merge branch 'rhashtable-cleanups'
Herbert Xu says:

====================
rhashtable hash cleanups

This is a rebase on top of the nested lock annotation fix.

Nothing to see here, just a bunch of simple clean-ups before
I move onto something more substantial (hopefully).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:35:35 -04:00
Herbert Xu ec9f71c59e rhashtable: Remove obj_raw_hashfn
Now that the only caller of obj_raw_hashfn is head_hashfn, we can
simply kill it and fold it into the latter.

This patch also moves the common shift from head_hashfn/key_hashfn
into rht_bucket_index.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:35:30 -04:00
Herbert Xu cffaa9cb92 rhashtable: Remove key length argument to key_hashfn
key_hashfn has only one caller and it doesn't really need to supply
the key length as an extra parameter.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:35:30 -04:00
Herbert Xu eca8493330 rhashtable: Use head_hashfn instead of obj_raw_hashfn
Now that we don't have cross-table hashes, we no longer need to
keep the entire hash value so all users of obj_raw_hashfn can
use head_hashfn instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:35:30 -04:00
Herbert Xu 8d2b18793d rhashtable: Move masking back into key_hashfn
This patch reverts commit c88455ce50
("rhashtable: key_hashfn() must return full hash value") because
the only user of it always masks the hash value.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:35:30 -04:00
Julia Lawall 6b9f53bc10 net/mlx5_core: don't export static symbol
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
type T;
identifier f;
@@

static T f (...) { ... }

@@
identifier r.f;
declarer name EXPORT_SYMBOL;
@@

-EXPORT_SYMBOL(f);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 00:03:34 -04:00
Herbert Xu 84ed82b74d rhashtable: Add annotation to nested lock
Commit aa34a6cb04 ("rhashtable:
Add arbitrary rehash function") killed the annotation on the
nested lock which leads to bitching from lockdep.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:53:40 -04:00
Eric Dumazet d77c555d32 net: fix CONFIG_NET_NS=n compilation
I forgot to use write_pnet() in three locations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33cf7c90fe ("net: add real socket cookies")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:28:49 -04:00
Lubomir Rintel c78ba6d64c ipv6: expose RFC4191 route preference via rtnetlink
This makes it possible to retain the route preference when RAs are handled in
userspace.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:28:09 -04:00
Simon Horman ac70c05b6f switchdev: correct spelling of notifier in comments
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:06:39 -04:00
Simon Horman d299ce149c vxlan: Correct path typo in comment
Flags are used in the return path rather than the return patch.

Fixes: af33c1adae ("vxlan: Eliminate dependency on UDP socket in transmit path")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:05:38 -04:00
Eric Dumazet 33cf7c90fe net: add real socket cookies
A long standing problem in netlink socket dumps is the use
of kernel socket addresses as cookies.

1) It is a security concern.

2) Sockets can be reused quite quickly, so there is
   no guarantee a cookie is used once and identify
   a flow.

3) request sock, establish sock, and timewait socks
   for a given flow have different cookies.

Part of our effort to bring better TCP statistics requires
to switch to a different allocator.

In this patch, I chose to use a per network namespace 64bit generator,
and to use it only in the case a socket needs to be dumped to netlink.
(This might be refined later if needed)

Note that I tried to carry cookies from request sock, to establish sock,
then timewait sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:55:28 -04:00
Alexander Duyck 654eff4516 fib_trie: Only display main table in /proc/net/route
When we merged the tries for local and main I had overlooked the iterator
for /proc/net/route.  As a result it was outputting both local and main
when the two tries were merged.

This patch resolves that by only providing output for aliases that are
actually in the main trie.  As a result we should go back to the original
behavior which I assume will be necessary to maintain legacy support.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:24:32 -04:00
David S. Miller 388a83576a Merge branch 'dsa_phy_divert'
Florian Fainelli says:

====================
net: dsa: support PHY reads/writes diversion

This patch series completes the PHY reads/writes diversion when we need to use
the slave MII bus provided by DSA and the underlying switch drivers to
implement the real PHY reads and writes. This is particularly useful when they
are conflicting MDIO bus addresses as in the case of multiple Broadcom switches
connected to each other (internal and external, or just external).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:56:39 -04:00
Florian Fainelli cd28a1a9ba net: dsa: fully divert PHY reads/writes if requested
In case a PHY is found via Device Tree, and is also flagged by the
switch driver as needing indirect reads/writes using the switch driver
implemented MDIO bus, make sure that we bind this PHY to the slave MII
bus in order for this to happen.

Without this, we would succeed in having the PHY driver probe()'s
function to use slave MII bus read/write functions, because this is done
during dsa_slave_mii_init(), but past that point, the PHY driver would
not go through these diverted reads and writes.

Fixes: 0d8bcdd383 ("net: dsa: allow for more complex PHY setups")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:56:29 -04:00
Florian Fainelli c305c1651c net: dsa: move PHY setup on DSA MII bus to its own function
In preparation for dealing with indirect reads and writes towards
certain PHY devices, move the code which deals with binding the PHY
device to the slave MII bus created by DSA to its own function:
dsa_slave_phy_connect().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:56:28 -04:00
Florian Fainelli 33d6737761 of: mdio: export of_mdio_parse_addr
Export of_mdio_parse_addr() which allows parsing a given Ethernet PHY
node MDIO address, verify it is within the allowed range, and return
its value. This is going to be useful for the DSA code which needs to
deal with multiple layers of MDIO buses.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:56:28 -04:00
Petri Gynther d26ea6cc48 net: bcmgenet: collect Rx discarded packet count
Bits 31:16 of RDMA_PROD_INDEX contain Rx discarded packet count, which
are the Rx packets that had to be dropped by MAC hardware since there
was no room on the Rx queue. Add code to collect this information into
the netdev stats.

Signed-off-by: Petri Gynther <pgynther@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:54:55 -04:00
Alexander Duyck 61f0d861fc fib_trie: Fix uninitialized variable warning
The 0-day kernel test infrastructure reported a use of uninitialized
variable warning for local_table due to the fact that the local and main
allocations had been swapped from the original setup.  This change corrects
that by making it so that we free the main table if the local table
allocation fails.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:33:44 -04:00
Sabrina Dubroca 6dede75b7e fib_trie: call fib_table_flush_external under RTNL
Move rtnl_lock() before the call to fib4_rules_exit so that
fib_table_flush_external is called under RTNL.

Fixes: 104616e74e ("switchdev: don't support custom ip rules, for now")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:46:26 -04:00
Robert Shearman 8a08919f43 mpls: Allow mpls_gso and mpls_router to be built as modules
CONFIG_MPLS=m doesn't result in a kernel module being built because it
applies to the net/mpls directory, rather than to .o files.

So revert the MPLS menuitem to being a boolean and make MPLS_GSO and
MPLS_ROUTING tristates to allow mpls_gso and mpls_router modules to be
produced as desired.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:38:54 -04:00
Shaohui Xie 19693f1166 net/fsl: remove dependency FSL_SOC from MDIO
FSL_PQ_MDIO and FSL_XGMAC_MDIO are not really depend on FSL_SOC, they
can build on non-PPC platforms.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:37:50 -04:00
Herbert Xu aa34a6cb04 rhashtable: Add arbitrary rehash function
This patch adds a rehash function that supports the use of any
hash function for the new table.  This is needed to support changing
the random seed value during the lifetime of the hash table.

However for now the random seed value is still constant and the
rehash function is simply used to replace the existing expand/shrink
functions.

[ ASSERT_BUCKET_LOCK() and thus debug_dump_table() +
  debug_dump_buckets() are not longer used, so delete them
  entirely. -DaveM ]

Signed-off-by: Herbert Xu <herbert.xu@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:36:21 -04:00
Herbert Xu 988dfbd795 rhashtable: Move hash_rnd into bucket_table
Currently hash_rnd is a parameter that users can set.  However,
no existing users set this parameter.  It is also something that
people are unlikely to want to set directly since it's just a
random number.

In preparation for allowing the reseeding/rehashing of rhashtable,
this patch moves hash_rnd into bucket_table so that it's now an
internal state rather than a parameter.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:28:25 -04:00
Alexander Duyck 0ddcf43d5d ipv4: FIB Local/MAIN table collapse
This patch is meant to collapse local and main into one by converting
tb_data from an array to a pointer.  Doing this allows us to point the
local table into the main while maintaining the same variables in the
table.

As such the tb_data was converted from an array to a pointer, and a new
array called data is added in order to still provide an object for tb_data
to point to.

In order to track the origin of the fib aliases a tb_id value was added in
a hole that existed on 64b systems.  Using this we can also reverse the
merge in the event that custom FIB rules are enabled.

With this patch I am seeing an improvement of 20ns to 30ns for routing
lookups as long as custom rules are not enabled, with custom rules enabled
we fall back to split tables and the original behavior.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:22:14 -04:00
Jon Paul Maloy 169bf9121b tipc: ensure that idle links are deleted when a bearer is disabled
commit afaa3f65f6
(tipc: purge links when bearer is disabled) was an attempt to resolve
a problem that turned out to have a more profound reason.

When we disable a bearer, we delete all its pertaining links if
there is no other bearer to perform failover to, or if the module
is shutting down. In case there are dual bearers, we wait with
deleting links until the failover procedure is finished.

However, this misses the case when a link on the removed bearer
was already down, so that there will be no failover procedure to
finish the link delete. This causes confusion if a new bearer is
added to replace the removed one, and also entails a small memory
leak.

This commit takes the current state of the link into account when
deciding when to delete it, and also reverses the above-mentioned
commit.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 18:37:36 -04:00
Alexander Duyck ddb4b9a132 fib_trie: Address possible NULL pointer dereference in resize
If the inflate call failed it would return NULL.  As a result tp would be
set to NULL and cause use to trigger a NULL pointer dereference in
should_halve if the inflate failed on the first attempt.

In order to prevent this we should decrement max_work before we actually
attempt to inflate as this will force us to exit before attempting to halve
a node we should have inflated.  In order to keep things symmetric between
inflate and halve I went ahead and also moved the decrement of max_work for
the halve case as well so we take care of that before we actually attempt
to halve the tnode.

Fixes: 88bae714 ("fib_trie: Add key vector to root, return parent key_vector in resize")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 18:36:56 -04:00
Stephen Rothwell 416377ea39 macb: Fix merge error.
The code removed by commit 421d9df062 ("net/macb: merge
at91_ether driver into macb driver") should be removed
in the merge resolution as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 18:33:49 -04:00
Alexander Duyck 3ec320dd5c fib_trie: Correctly handle case of key == 0 in leaf_walk_rcu
In the case of a trie that had no tnodes with a key of 0 the initial
look-up would fail resulting in an out-of-bounds cindex on the first tnode.
This resulted in an entire trie being skipped.

In order resolve this I have updated the cindex logic in the initial
look-up so that if the key is zero we will always traverse the child zero
path.

Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 16:13:55 -04:00
David S. Miller 7c23aaf2ea Merge branch 'inet_diag_cleanups'
Eric Dumazet says:

====================
inet_diag: cleanups and improvements

Before changing way request socks are dumped, let's clean up this code
a bit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 13:45:33 -04:00