Commit graph

36052 commits

Author SHA1 Message Date
Ying Xue bafa29e341 tipc: make tipc random value aware of net namespace
After namespace is supported, each namespace should own its private
random value. So the global variable representing the random value
must be moved to tipc_net structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue a62fbccecd tipc: make subscriber server support net namespace
TIPC establishes one subscriber server which allows users to subscribe
their interesting name service status. After tipc supports namespace,
one dedicated tipc stack instance is created for each namespace, and
each instance can be deemed as one independent TIPC node. As a result,
subscriber server must be built for each namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue 3474753954 tipc: make tipc node address support net namespace
If net namespace is supported in tipc, each namespace will be treated
as a separate tipc node. Therefore, every namespace must own its
private tipc node address. This means the "tipc_own_addr" global
variable of node address must be moved to tipc_net structure to
satisfy the requirement. It's turned out that users also can assign
node address for every namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue 4ac1c8d0ee tipc: name tipc name table support net namespace
TIPC name table is used to store the mapping relationship between
TIPC service name and socket port ID. When tipc supports namespace,
it allows users to publish service names only owned by a certain
namespace. Therefore, every namespace must have its private name
table to prevent service names published to one namespace from being
contaminated by other service names in another namespace. Therefore,
The name table global variable (ie, nametbl) and its lock must be
moved to tipc_net structure, and a parameter of namespace must be
added for necessary functions so that they can obtain name table
variable defined in tipc_net structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue e05b31f4bf tipc: make tipc socket support net namespace
Now tipc socket table is statically allocated as a global variable.
Through it, we can look up one socket instance with port ID, insert
a new socket instance to the table, and delete a socket from the
table. But when tipc supports net namespace, each namespace must own
its specific socket table. So the global variable of socket table
must be redefined in tipc_net structure. As a concequence, a new
socket table will be allocated when a new namespace is created, and
a socket table will be deallocated when namespace is destroyed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue 1da465683a tipc: make tipc broadcast link support net namespace
TIPC broadcast link is statically established and its relevant states
are maintained with the global variables: "bcbearer", "bclink" and
"bcl". Allowing different namespace to own different broadcast link
instances, these variables must be moved to tipc_net structure and
broadcast link instances would be allocated and initialized when
namespace is created.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue 7f9f95d9d9 tipc: make bearer list support net namespace
Bearer list defined as a global variable is used to store bearer
instances. When tipc supports net namespace, bearers created in
one namespace must be isolated with others allocated in other
namespaces, which requires us that the bearer list(bearer_list)
must be moved to tipc_net structure. As a result, a net namespace
pointer has to be passed to functions which access the bearer list.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00
Ying Xue f2f9800d49 tipc: make tipc node table aware of net namespace
Global variables associated with node table are below:
- node table list (node_htable)
- node hash table list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)

To make node table support namespace, above global variables must be
moved to tipc_net structure in order to keep secret for different
namespaces. As a consequence, these variables are allocated and
initialized when namespace is created, and deallocated when namespace
is destroyed. After the change, functions associated with these
variables have to utilize a namespace pointer to access them. So
adding namespace pointer as a parameter of these functions is the
major change made in the commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00
Ying Xue c93d3baa24 tipc: involve namespace infrastructure
Involve namespace infrastructure, make the "tipc_net_id" global
variable aware of per namespace, and rename it to "net_id". In
order that the conversion can be successfully done, an instance
of networking namespace must be passed to relevant functions,
allowing them to access the "net_id" variable of per namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00
Ying Xue 54fef04ad0 tipc: remove unused tipc_link_get_max_pkt routine
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00
Ying Xue f2f2a96a20 tipc: feed tipc sock pointer to tipc_sk_timeout routine
In order to make tipc socket table aware of namespace, a networking
namespace instance must be passed to tipc_sk_lookup(), allowing it
to look up tipc socket instance with a given port ID from a concrete
socket table. However, as now tipc_sk_timeout() only has one port ID
parameter and is not namespace aware, it's unable to obtain a correct
socket instance through tipc_sk_lookup() just with a port ID,
especially after namespace is completely supported.

If port ID is replaced with socket instance as tipc_sk_timeout()'s
parameter, it's unnecessary to look up socket table. But as the timer
handler - tipc_sk_timeout() is run asynchronously, socket reference
must be held before its timer is launched, and must be carefully
checked to identify whether the socket reference needs to be put or
not when its timer is terminated.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00
Ying Xue 859fc7c0ce tipc: cleanup core.c and core.h files
Only the works of initializing and shutting down tipc module are done
in core.h and core.c files, so all stuffs which are not closely
associated with the two tasks should be moved to appropriate places.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:31 -05:00
Ying Xue 2f55c43788 tipc: remove unnecessary wrapper functions of kernel timer APIs
Not only some wrapper function like k_term_timer() is empty, but also
some others including k_start_timer() and k_cancel_timer() don't return
back any value to its caller, what's more, there is no any component
in the kernel world to do such thing. Therefore, these timer interfaces
defined in tipc module should be purged.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:31 -05:00
Ying Xue 6b8326ed14 tipc: remove tipc_core_start/stop routines
Remove redundant wrapper functions like tipc_core_start() and
tipc_core_stop(), and directly move them to their callers, such
as tipc_init() and tipc_exit(), having us clearly know what are
really done in both initialization and deinitialzation functions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:31 -05:00
Jon Maloy 703068eee6 tipc: fix bug in broadcast retransmit code
In commit 58dc55f256 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:31 -05:00
Toshiaki Makita f902e8812e bridge: Add ability to enable TSO
Currently a bridge device turns off TSO feature if no bridge ports
support it. We can always enable it, since packets can be segmented on
ports by software as well as on the bridge device.
This will reduce the number of packets processed in the bridge.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:18:09 -05:00
Jon Paul Maloy 164167794c tipc: fix bug in broadcast retransmit code
In commit 58dc55f256 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:01:59 -05:00
Willem de Bruijn eee2f04b80 packet: make packet too small warning match condition
The expression in ll_header_truncated() tests less than or equal, but
the warning prints less than. Update the warning.

Reported-by: Jouni Malinen <jkmalinen@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:00:55 -05:00
Marcel Holtmann a936612036 Bluetooth: Process result of HCI Delete Stored Link Key command
When the HCI Delete Stored Link Key command completes, then update the
value of current stored keys in hci_dev structure.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 21:56:06 +02:00
Marcel Holtmann 48ce62c4fa Bluetooth: Read stored link key information when powering on controller
The information about max stored link keys and current stored link keys
should be read at controller initialization. So issue HCI Read Stored
Link Key command with BDADDR_ANY and read_all flag set to 0x01 to
retrieve this information.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 21:54:48 +02:00
Marcel Holtmann c2f0f97927 Bluetooth: Handle command complete event for HCI Read Stored Link Keys
When the HCI Read Stored Link Keys command completes it gives useful
information of the current stored keys and maximum keys a controller
can actually store. So process this event and store these information
in hci_dev structure.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 21:54:16 +02:00
Marcel Holtmann 41e91e71f6 Bluetooth: Replace send_monitor_event with queue_monitor_skb
The send_monitor_event function is essentially the same as the newly
introduced queue_monitor_skb. So instead of having duplicated code,
replace send_monitor_event with queue_monitor_skb.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 11:26:09 +02:00
Marcel Holtmann d7f72f6195 Bluetooth: Create generic queue_monitor_skb helper function
The hci_send_to_monitor function contains generic code for queueing the
packet into the receive queue of every monitor client. To avoid code
duplication, create a generic queue_monitor_skb function to interate
over all monitor sockets.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 11:26:07 +02:00
Marcel Holtmann 2b531294b0 Bluetooth: Simplify packet copy in hci_send_to_monitor function
Within the monitor functionality, the global atomic variable called
monitor_promisc ensures that no memory allocation happend when there
is actually no client listening. This means it is safe to just create
a copy of the skb since it is guaranteed that at least one client
exists. No extra checks needed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 11:26:04 +02:00
Marcel Holtmann 15762fa772 Bluetooth: Add BUILD_BUG_ON for size of struct sockaddr_sco
This adds an extra check for ensuring that the size of sockaddr_sco
does not grow larger than sockaddr.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 11:24:24 +02:00
Marcel Holtmann 74b3fb8d0d Bluetooth: Add BUILD_BUG_ON for size of struct sockaddr_rc
This adds an extra check for ensuring that the size of sockaddr_rc
does not grow larger than sockaddr.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 11:24:21 +02:00
Marcel Holtmann dd6255588a Bluetooth: Add BUILD_BUG_ON for size of struct sockaddr_l2
This adds an extra check for ensuring that the size of sockaddr_l2
does not grow larger than sockaddr.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 11:24:19 +02:00
Marcel Holtmann b0a8e282b5 Bluetooth: Add BUILD_BUG_ON for size of struct sockaddr_hci
This adds an extra check for ensuring that the size of sockaddr_hci
does not grow larger than sockaddr.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 11:24:16 +02:00
Marcel Holtmann 1904a853fa Bluetooth: Add opcode parameter to hci_req_complete_t callback
When hci_req_run() calls its provided complete function and one of the
HCI commands in the sequence fails, then provide the opcode of failing
command. In case of success HCI_OP_NOP is provided since all commands
completed.

This patch fixes the prototype of hci_req_complete_t and all its users.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-12 11:16:31 +02:00
David S. Miller 2bd8221804 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains netfilter/ipvs fixes, they are:

1) Small fix for the FTP helper in IPVS, a diff variable may be left
   unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter.

2) Fix nf_tables port NAT in little endian archs, patch from leroy
   christophe.

3) Fix race condition between conntrack confirmation and flush from
   userspace. This is the second reincarnation to resolve this problem.

4) Make sure inner messages in the batch come with the nfnetlink header.

5) Relax strict check from nfnetlink_bind() that may break old userspace
   applications using all 1s group mask.

6) Schedule removal of chains once no sets and rules refer to them in
   the new nf_tables ruleset flush command. Reported by Asbjoern Sloth
   Toennesen.

Note that this batch comes later than usual because of the short
winter holidays.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 00:14:49 -05:00
Christoph Jaeger 46d2cfb192 packet: bail out of packet_snd() if L2 header creation fails
Due to a misplaced parenthesis, the expression

  (unlikely(offset) < 0),

which expands to

  (__builtin_expect(!!(offset), 0) < 0),

never evaluates to true. Therefore, when sending packets with
PF_PACKET/SOCK_DGRAM, packet_snd() does not abort as intended
if the creation of the layer 2 header fails.

Spotted by Coverity - CID 1259975 ("Operands don't affect result").

Fixes: 9c7077622d ("packet: make packet_snd fail on len smaller than l2 header")
Signed-off-by: Christoph Jaeger <cj@linux.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-11 21:54:03 -05:00
Linus Torvalds dc9319f5a3 Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux
Pull two nfsd bugfixes from Bruce Fields.

* 'for-3.19' of git://linux-nfs.org/~bfields/linux:
  rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
  nfsd: fix fi_delegees leak when fi_had_conflict returns true
2015-01-09 18:10:48 -08:00
Linus Torvalds 20ebb34528 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two Ceph fixes from Sage Weil:
 "These are both pretty trivial: a sparse warning fix and size_t printk
  thing"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix sparse endianness warnings
  ceph: use %zu for len in ceph_fill_inline_data()
2015-01-09 17:55:00 -08:00
Johannes Berg 9b7a86f351 mac80211: fix handling TIM IE when stations disconnect
When a station disconnects with frames still pending, we clear
the TIM bit, but too late - it's only cleared when the station
is already removed from the driver, and thus the driver can get
confused (and hwsim will loudly complain.)

Fix this by clearing the TIM bit earlier, when the station has
been unlinked but not removed from the driver yet. To do this,
refactor the TIM recalculation to in that case ignore traffic
and simply assume no pending traffic - this is correct for the
disconnected station even though the frames haven't been freed
yet at that point.

This patch isn't needed for current drivers though as they don't
check the station argument to the set_tim() operation and thus
don't really run into the possible confusion.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-09 11:48:37 +01:00
David S. Miller fb57720daf Included changes:
- remove useless return in void functions
 - remove unused member 'primary_iface' from 'struct orig_node'
 - improve existing kernel doc
 - fix several checkpatch complaints
 - ensure socket's control block is cleared for received skbs
 - add missing DEBUG_FS dependency to BATMAN_ADV_DEBUG symbol
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUraKDAAoJEJgn97Bh2u9eJsQQAI0HHsTbv1yWi0LyQO25ZlJG
 H0sR6Mj21ol36jvlSa4OYVsEn9QdoJ5RO+KDwkxvFPQQU0VyN0TZHEKzcPzFC98P
 r7BboF7JGI7P/ixGWhhdjoH/ECNJOoS0MioEX2YjGpGYIdDzl2BH7HHPV8WEEJTW
 A2l11owAv+Sv6PkYrx8OOqNCtSaO5ogX9BxZkFZwFZDA8VeFZO9eYIDStSnHKt5x
 /gT8EQd7bsVZc4IRZaPH8Sc18hkpcEZoDgMQ2Wwk4pw+5g+KgoLTzfAU6xXDnX34
 GqVqWNlrcdOQg5337mBH+xTszq6bvVSlYvcN+q8QZfRS/bkLEEwigcBh+H5H//Gf
 +MueLW8JFKKIH5sP2wbeTgdj4l7JYs5TlZsn7O+jroXrblT/SUSji8TJkkCiSfBP
 MjR9WTOrqZTzxRN/KoT+DgmQ1t2ZhKN9WVBOukKaBpMh9lxOPZD/pxJIJJLmFyWh
 VWcszxerll3ilFKgWR7YRz6h6tB3xs6DUMgl3PEFzjTa0xhZPT4NzOGIfuBTDLPc
 vg7HAm+DCIwCDfYn0N5/HLv0cU14CMWSy0SCJBIzvo+0fHppLgxBTeJ24lSywfqN
 89soGNmdGADTPkXnt4kX4U/XPRHTC3stzdMYiWoCmkoo5zQkiGmVLQ7TWG2Soqez
 QQ+ookKhShBLDZ2FQJYk
 =2Y3v
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- remove useless return in void functions
- remove unused member 'primary_iface' from 'struct orig_node'
- improve existing kernel doc
- fix several checkpatch complaints
- ensure socket's control block is cleared for received skbs
- add missing DEBUG_FS dependency to BATMAN_ADV_DEBUG symbol

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-08 20:14:32 -08:00
Ying Xue 07f6c4bc04 tipc: convert tipc reference table to use generic rhashtable
As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.

If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.

Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-08 19:47:14 -08:00
Ilya Dryomov d7d5a007b1 libceph: fix sparse endianness warnings
The only real issue is the one in auth_x.c and it came with
3.19-rc1 merge.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-01-08 20:36:57 +03:00
Alexander Aring bc6efeeeb5 ieee802154: 6lowpan: fix Makefile entry
Since commit ea81ac2e70 ("ieee802154:
create 6lowpan sub-directory") we have a subdirectory for the ieee802154
6lowpan implementation. This commit also moves the Kconfig entry inside
of net/ieee802154/6lowpan/ and forgot to rename the Makefile entry from
obj-$(CONFIG_IEEE802154_6LOWPAN) to obj-y and handle the
obj-$(CONFIG_IEEE802154_6LOWPAN) inside the created 6lowpan directory.
This will occur that the ieee802154_6lowpan can't be build.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-08 15:48:06 +01:00
Johannes Berg 79c892b850 mac80211: provide per-TID RX/TX MSDU counters
Implement the new counters cfg80211 can now advertise to userspace.
The TX code is in the sequence number handler, which is a bit odd,
but that place already knows the TID and frame type, so it was
easiest and least impact there.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:20 +01:00
Johannes Berg 6de39808cf nl80211: support per-TID station statistics
The base for the current statistics is pretty mixed up, support
exporting RX/TX statistics for MSDUs per TID. This (currently)
covers received MSDUs, transmitted MSDUs and retries/failures
thereof.

Doing it per TID for MSDUs makes more sense than say only per AC
because it's symmetric - we could export per-AC statistics for all
frames (which AC we used for transmission can be determined also
for management frames) but per TID is better and usually data
frames are really the ones we care about. Also, on RX we can't
determine the AC - but we do know the TID for any QoS MPDU we
received.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:18 +01:00
Johannes Berg a76b1942a1 cfg80211: add nl80211 beacon-only statistics
Add these two values:
 * BEACON_RX: number of beacons received from this peer
 * BEACON_SIGNAL_AVG: signal strength average for beacons only

These can then be used for Android Lollipop's statistics request.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:13 +01:00
Johannes Berg 319090bf6c cfg80211: remove enum station_info_flags
This is really just duplicating the list of information that's
already available in the nl80211 attribute, so remove the list.
Two small changes are needed:
 * remove STATION_INFO_ASSOC_REQ_IES complete, but the length
   (assoc_req_ies_len) can be used instead
 * add NL80211_STA_INFO_RX_DROP_MISC which exists internally
   but not in nl80211 yet

This gets rid of the duplicate maintenance of the two lists.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:10 +01:00
Johannes Berg 2b9a7e1bac mac80211: allow drivers to provide most station statistics
In many cases, drivers can filter things like beacons that will
skew statistics reported by mac80211. To get correct statistics
in these cases, call drivers to obtain statistics and let them
override all values, filling values from mac80211 if the driver
didn't provide them. Not all of them make sense for the driver
to fill, so some are still always done by mac80211.

Note that this doesn't currently allow a driver to say "I know
this value is wrong, don't report it at all", or to sum it up
with a mac80211 value (as could be useful for "dropped misc"),
that can be added if it turns out to be needed.

This also gets rid of the get_rssi() method as is can now be
implemented using sta_statistics().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:06 +01:00
Johannes Berg 6f7a8d26e2 mac80211: send statistics with delete station event
Use the new cfg80211_del_sta_sinfo() function to send the
statistics about the deleted station with the delete event.
This lets userspace see how much traffic etc. the deleted
station used.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:03 +01:00
Johannes Berg cf5ead822d cfg80211: allow including station info in delete event
When a station is removed, its statistics may be interesting to
userspace, for example for further aggregation of statistics of
all stations that ever connected to an AP.

Introduce a new cfg80211_del_sta_sinfo() function (and make the
cfg80211_del_sta() a static inline calling it) to allow passing
a struct station_info along with this, and send the data in the
nl80211 event message.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:28:00 +01:00
Johannes Berg 052536abfa cfg80211: add scan time to survey data
Add the time spent scanning to the survey data so it can be
reported by drivers that collect such information.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:27:58 +01:00
Johannes Berg 11f78ac32b cfg80211: allow survey data to return global data
Not all devices are able to report survey data (particularly
time spent for various operations) per channel. As all these
statistics already exist in survey data, allow such devices
to report them (if userspace requested it)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:27:54 +01:00
Johannes Berg 4ed20bebf5 cfg80211: remove "channel" from survey names
All of the survey data is (currently) per channel anyway,
so having the word "channel" in the name does nothing. In
the next patch I'll introduce global data to the survey,
where the word "channel" is actually confusing.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-08 15:27:52 +01:00
Kristian Evensen ae406bd057 netfilter: conntrack: Remove nf_ct_conntrack_flush_report
The only user of nf_ct_conntrack_flush_report() was ctnetlink_del_conntrack().
After adding support for flushing connections with a given mark, this function
is no longer called.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-08 12:16:58 +01:00
Kristian Evensen 866476f323 netfilter: conntrack: Flush connections with a given mark
This patch adds support for selective flushing of conntrack mappings.
By adding CTA_MARK and CTA_MARK_MASK to a delete-message, the mark (and
mask) is checked before a connection is deleted while flushing.

Configuring the flush is moved out of ctnetlink_del_conntrack(), and
instead of calling nf_conntrack_flush_report(), we always call
nf_ct_iterate_cleanup().  This enables us to only make one call from the
new ctnetlink_flush_conntrack() and makes it easy to add more filter
parameters.

Filtering is done in the ctnetlink_filter_match()-function, which is
also called from ctnetlink_dump_table(). ctnetlink_dump_filter has been
renamed ctnetlink_filter, to indicated that it is no longer only used
when dumping conntrack entries.

Moreover, reject mark filters with -EOPNOTSUPP if no ct mark support is
available.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-08 12:14:20 +01:00
Alexander Aring 52d84ffc54 ieee802154: 6lowpan: rename to core
This patch renames the 6lowpan_rtnl.c file to core.c. 6lowpan_rtnl.c
contains functionality to put all 802.15.4 6LoWPAN functionality
together.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-08 07:25:59 +01:00
Alexander Aring 4dc315e267 ieee802154: 6lowpan: move transmit functionality
This patch moves all relevant transmit functionality into a separate tx.c
file. We can simple separate this functionality like we did it in mac802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-08 07:25:59 +01:00
Alexander Aring 4662a0da54 ieee802154: 6lowpan: move receive functionality
This patch moves all relevant receive functionality into a separate rx.c
file. We can simple separate this functionality like we did it in mac802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-08 07:25:59 +01:00
Alexander Aring 8691ee592c ieee802154: 6lowpan: rename internal header
This patch renames the internal header for af802154. This naming
convention is like ieee802154_i.h in mac802154 and avoids naming
confusing with the global af802154 header. Furthermore this header
contains more ieee802154 specific definitions.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-08 07:25:59 +01:00
Alexander Aring ea81ac2e70 ieee802154: create 6lowpan sub-directory
This patch creates an 6lowpan sub-directory inside ieee802154.
Additional we move all ieee802154 6lowpan relevant files into
this sub-directory instead of placing the 6lowpan related files
inside ieee802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-08 07:25:59 +01:00
Markus Pargmann 9535395640 batman-adv: Kconfig, Add missing DEBUG_FS dependency
BATMAN_ADV_DEBUG is using debugfs files for the debugging log. So it
depends on DEBUG_FS which is missing as dependency in the Kconfig file.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-07 22:17:11 +01:00
J. Bruce Fields 49a068f82a rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
A struct xdr_stream at a page boundary might point to the end of one
page or the beginning of the next, but xdr_truncate_encode isn't
prepared to handle the former.

This can cause corruption of NFSv4 READDIR replies in the case that a
readdir entry that would have exceeded the client's dircount/maxcount
limit would have ended exactly on a 4k page boundary.  You're more
likely to hit this case on large directories.

Other xdr_truncate_encode callers are probably also affected.

Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Fixes: 3e19ce762b "rpc: xdr_truncate_encode"
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 14:03:58 -05:00
Simon Wunderlich dcba6c9b45 batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-07 17:21:58 +01:00
Antonio Quartulli 3f68785e61 batman-adv: fix misspelled words
Reported-by: checkpatch
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-01-07 17:21:57 +01:00
Martin Hundebøll e0d9677ea3 batman-adv: clear control block of received socket buffers
Since other network components (and some drivers) uses the control block
provided in skb's, the network coding feature might wrongly assume that
an SKB has been decoded, and thus not try to code it with another packet
again. This happens for instance when batman-adv is running on a bridge device.

Fix this by clearing the control block for every received SKB.

Introduced by 3c12de9a5c
("batman-adv: network coding - code and transmit packets if possible")
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-07 17:21:57 +01:00
Antonio Quartulli 32db6aaafe batman-adv: checkpatch - remove unnecessary parentheses
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-01-07 17:21:56 +01:00
Antonio Quartulli 8a3f8b6ac5 batman-adv: checkpatch - Please don't use multiple blank lines
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-01-07 17:21:56 +01:00
Antonio Quartulli aa143d2837 batman-adv: checkpatch - Please use a blank line after declarations
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-01-07 17:21:55 +01:00
Antonio Quartulli 9687a31c84 batman-adv: checkpatch - No space is necessary after a cast
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-01-07 17:21:55 +01:00
Antonio Quartulli 24820df144 batman-adv: checkpatch - else is not generally useful after a break or return
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-01-07 17:21:55 +01:00
Martin Hundebøll a0e2877505 batman-adv: kernel doc fixes for main.{c, h}
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-07 17:21:54 +01:00
Martin Hundebøll ba97abb863 batman-adv: kernel doc fix for distributed-arp-table.h
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-07 17:21:54 +01:00
Martin Hundebøll e335718988 batman-adv: kernel doc fixes for bridge_loop_avoidance.c
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-07 17:21:54 +01:00
Martin Hundebøll 95298a91f9 batman-adv: kernel doc fixes for bat_iv_ogm.c
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-07 17:21:53 +01:00
Simon Wunderlich 320b936678 batman-adv: remove obsolete variable primary_iface from orig_node
This variable became obsolete when changing to the new bonding mechanism
based on the multi interface optimization. Since its not used anywhere,
remove it.

Reported-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-07 17:21:53 +01:00
Antonio Quartulli 193924472a batman-adv: avoid useless return in void functions
Cc: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-01-07 17:21:52 +01:00
Ido Yariv db12847ca8 mac80211: Re-fix accounting of the tailroom-needed counter
When hw acceleration is enabled, the GENERATE_IV or PUT_IV_SPACE flags
only require headroom space. Therefore, the tailroom-needed counter can
safely be decremented for most drivers.

The older incarnation of this patch (ca34e3b5) assumed that the above
holds true for all drivers. As reported by Christopher Chavez and
researched by Christian Lamparter and Larry Finger, this isn't a valid
assumption for p54 and cw1200.

Drivers that still require tailroom for ICV/MIC even when HW encryption
is enabled can use IEEE80211_KEY_FLAG_RESERVE_TAILROOM to indicate it.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Cc: Christopher Chavez <chrischavez@gmx.us>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Solomon Peachy <pizza@shaftnet.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-07 14:39:32 +01:00
Johannes Berg 3a4b0c948d Merge branch 'mac80211' into mac80211-next
Merge mac80211.git to get some changes that would otherwise
cause conflicts with new changes coming here.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-07 14:39:16 +01:00
David S. Miller 44d84d7272 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-01-06 22:29:20 -05:00
Linus Torvalds bdec419638 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Just a pile of random fixes, including:

   1) Do not apply TSO limits to non-TSO packets, fix from Herbert Xu.

   2) MDI{,X} eeprom check in e100 driver is reversed, from John W.
      Linville.

   3) Missing error return assignments in several ethernet drivers, from
      Julia Lawall.

   4) Altera TSE device doesn't come back up after ifconfig down/up
      sequence, fix from Kostya Belezko.

   5) Add more cases to the check for whether the qmi_wwan device has a
      bogus MAC address and needs to be assigned a random one.  From
      Kristian Evensen.

   6) Fix interrupt hangs in CPSW, from Felipe Balbi.

   7) Implement ndo_features_check in r8152 so that the stack doesn't
      feed GSO packets which are outside of the chip's capabilities.
      From Hayes Wang"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  qla3xxx: don't allow never end busy loop
  xen-netback: fixing the propagation of the transmit shaper timeout
  r8152: support ndo_features_check
  batman-adv: fix potential TT client + orig-node memory leak
  batman-adv: fix multicast counter when purging originators
  batman-adv: fix counter for multicast supporting nodes
  batman-adv: fix lock class for decoding hash in network-coding.c
  batman-adv: fix delayed foreign originator recognition
  batman-adv: fix and simplify condition when bonding should be used
  Revert "mac80211: Fix accounting of the tailroom-needed counter"
  net: ethernet: cpsw: fix hangs with interrupts
  enic: free all rq buffs when allocation fails
  qmi_wwan: Set random MAC on devices with buggy fw
  openvswitch: Consistently include VLAN header in flow and port stats.
  tcp: Do not apply TSO segment limit to non-TSO packets
  Altera TSE: Add missing phydev
  net/mlx4_core: Fix error flow in mlx4_init_hca()
  net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow
  qlcnic: Fix return value in qlcnic_probe()
  net: axienet: fix error return code
  ...
2015-01-06 17:48:14 -08:00
Ed Swierk 2f4383667d ethtool: Extend ethtool plugin module eeprom API to phylib
This patch extends the ethtool plugin module eeprom API to support cards
whose phy support is delegated to a separate driver.

The handlers for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPROM call the
module_info and module_eeprom functions if the phy driver provides them;
otherwise the handlers call the equivalent ethtool_ops functions provided
by network drivers with built-in phy support.

Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-06 17:16:55 -05:00
Pablo Neira Ayuso a2f18db0c6 netfilter: nf_tables: fix flush ruleset chain dependencies
Jumping between chains doesn't mix well with flush ruleset. Rules
from a different chain and set elements may still refer to us.

[  353.373791] ------------[ cut here ]------------
[  353.373845] kernel BUG at net/netfilter/nf_tables_api.c:1159!
[  353.373896] invalid opcode: 0000 [#1] SMP
[  353.373942] Modules linked in: intel_powerclamp uas iwldvm iwlwifi
[  353.374017] CPU: 0 PID: 6445 Comm: 31c3.nft Not tainted 3.18.0 #98
[  353.374069] Hardware name: LENOVO 5129CTO/5129CTO, BIOS 6QET47WW (1.17 ) 07/14/2010
[...]
[  353.375018] Call Trace:
[  353.375046]  [<ffffffff81964c31>] ? nf_tables_commit+0x381/0x540
[  353.375101]  [<ffffffff81949118>] nfnetlink_rcv+0x3d8/0x4b0
[  353.375150]  [<ffffffff81943fc5>] netlink_unicast+0x105/0x1a0
[  353.375200]  [<ffffffff8194438e>] netlink_sendmsg+0x32e/0x790
[  353.375253]  [<ffffffff818f398e>] sock_sendmsg+0x8e/0xc0
[  353.375300]  [<ffffffff818f36b9>] ? move_addr_to_kernel.part.20+0x19/0x70
[  353.375357]  [<ffffffff818f44f9>] ? move_addr_to_kernel+0x19/0x30
[  353.375410]  [<ffffffff819016d2>] ? verify_iovec+0x42/0xd0
[  353.375459]  [<ffffffff818f3e10>] ___sys_sendmsg+0x3f0/0x400
[  353.375510]  [<ffffffff810615fa>] ? native_sched_clock+0x2a/0x90
[  353.375563]  [<ffffffff81176697>] ? acct_account_cputime+0x17/0x20
[  353.375616]  [<ffffffff8110dc78>] ? account_user_time+0x88/0xa0
[  353.375667]  [<ffffffff818f4bbd>] __sys_sendmsg+0x3d/0x80
[  353.375719]  [<ffffffff81b184f4>] ? int_check_syscall_exit_work+0x34/0x3d
[  353.375776]  [<ffffffff818f4c0d>] SyS_sendmsg+0xd/0x20
[  353.375823]  [<ffffffff81b1826d>] system_call_fastpath+0x16/0x1b

Release objects in this order: rules -> sets -> chains -> tables, to
make sure no references to chains are held anymore.

Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.biz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:48 +01:00
Pablo Neira Ayuso 62924af247 netfilter: nfnetlink: relax strict multicast group check from netlink_bind
Relax the checking that was introduced in 97840cb ("netfilter:
nfnetlink: fix insufficient validation in nfnetlink_bind") when the
subscription bitmask is used. Existing userspace code code may request
to listen to all of the existing netlink groups by setting an all to one
subscription group bitmask. Netlink already validates subscription via
setsockopt() for us.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:47 +01:00
Pablo Neira Ayuso 9ea2aa8b7d netfilter: nfnetlink: validate nfnetlink header from batch
Make sure there is enough room for the nfnetlink header in the
netlink messages that are part of the batch. There is a similar
check in netlink_rcv_skb().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:46 +01:00
Pablo Neira Ayuso 8ca3f5e974 netfilter: conntrack: fix race between confirmation and flush
Commit 5195c14c8b ("netfilter: conntrack: fix race in
__nf_conntrack_confirm against get_next_corpse") aimed to resolve the
race condition between the confirmation (packet path) and the flush
command (from control plane). However, it introduced a crash when
several packets race to add a new conntrack, which seems easier to
reproduce when nf_queue is in place.

Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit. In case
race occured, re-add the CT to the dying list

This patch also changes the verdict from NF_ACCEPT to NF_DROP when
we lose race. Basically, the confirmation happens for the first packet
that we see in a flow. If you just invoked conntrack -F once (which
should be the common case), then this is likely to be the first packet
of the flow (unless you already called flush anytime soon in the past).
This should be hard to trigger, but better drop this packet, otherwise
we leave things in inconsistent state since the destination will likely
reply to this packet, but it will find no conntrack, unless the origin
retransmits.

The change of the verdict has been discussed in:
https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:45 +01:00
David S. Miller 627d2cc016 Included changes:
- ensure bonding is used (if enabled) for packets coming in the soft
   interface
 - fix race condition to avoid orig_nodes to be deleted right after
   being added
 - avoid false positive lockdep splats by assigning lockclass to
   the proper hashtable lock objects
 - avoid miscounting of multicast 'disabled' nodes in the network
 - fix memory leak in the Global Translation Table in case of
   originator interval change
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUq7PtAAoJEJgn97Bh2u9ejD0P/AqFX7KAqlzEZo7SSIwduX/e
 OLUWJyplATljKAwmTM+gFxyTC6HOUhqG/0UZpUu83OA0uC5HmT2ATJXUal1vG8uN
 ewfxqhsnz8UMxmjENcjsExGdhjW0CDeD+lRdxPGbbq5vEbqe9dXXr4HVAlJFGhnX
 fzR65WB2om9BTYMgIWmZutRjP9VTPArgOCKtCgLclHut19VXmZDALPWZ5S6kPfUG
 O9fSPi0uzr7WIy6p+5XCi69pw+rlHv23A3soXK9G4Ze+ET2QcQSgkL6Q3VJ2+uMV
 wBzTahRN7sJgJ8GYBDrhXgw3yRl2FkjLNPav3+gOnbK0+PycrnGiC1YPbX8w/cpr
 EHyFB/kZCeBusPaRQAYP91YI3cYJ3Onq3kjbUPVV+ASMoJ/vdVORRjDHY8ALf12w
 uGp/+9aAEJYqAOZhVZKTofTgfT3ULQycd+Ffi/0LC8OLaKQeqdEcmGRrIKNJmsP5
 a+gsKEMe7QCbHkWAVnKLGKthLH4IRU98gVApS1nld9OE2yrPyywrA5ac353iFB1K
 Y/56RFAqU1nywv+hhArcCZymiOPxKyF+lK3VAHvnlFgMMJs5gNZzoGX7V7TODkiA
 yiZCuzOGsB4hPSdzf1KfCVR3LkFfaI2jGr7X+38OdK9WGMG2mcPEB0tyXr7uFuky
 V/VPJDENwQwmgfdR+D+3
 =30sl
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- ensure bonding is used (if enabled) for packets coming in the soft
  interface
- fix race condition to avoid orig_nodes to be deleted right after
  being added
- avoid false positive lockdep splats by assigning lockclass to
  the proper hashtable lock objects
- avoid miscounting of multicast 'disabled' nodes in the network
- fix memory leak in the Global Translation Table in case of
  originator interval change

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-06 14:24:49 -05:00
David S. Miller 15ecf7a063 Here's just a single fix - a revert of a patch that broke the
p54 and cw2100 drivers (arguably due to bad assumptions there.)
 Since this affects kernels since 3.17, I decided to revert for
 now and we'll revisit this optimisation properly for -next.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUq9EvAAoJEDBSmw7B7bqr7mMQAJRbdepVVqK5IFH1BH0NC7vi
 LkBE2lp8ZAz1Crg+OAQNUdlZUHtGyfYoXSfzezmrMG51i5xjHyOYQQikW7aJ2SQ0
 XsjJJ5TcqKe83NwoakXUrMpE7KmCt/LnbjKNXDsZIvLlUkqa7ksXaS7btK195aXy
 WlVrmUE+BqT9a16VjFLZ6wRjI43+3bGxhtFL+g1eXw6nZ4a2o4EbIXdc9SN+/bT4
 tAhWJfdAQqQc34jhesWGbMIvkXWhzy2R6Js+9gMIBNsmlAiYbFa4QZ/9tI3nBI/O
 yHSiDc7JnPNjkkC+3wTJxMl7mEd6fEKnAS1ryZ5L4XhPrQpV39iZuWSPvPGw6LLW
 kB6+wXkIyQdCSoyrQZxY75ibqOUKYYxhhkSYfMePXRKTYY6MlHYqiH8wPWFpPoqO
 iumLqx8/CtRW1q1t2EBAG6rZLRF8HqmfqtB+ptT0DWcAP8E81q8BImPoPFr+P9S2
 XfuuSw97xKCcilOcYJ0uYSBe4XNNhy1dtC/zJ8cA9nV4WNkofALga5Z/t8ARhDsM
 wvP1D2uIX3U9My17bXq+Xn/fSSS7yhpLZjEHj/JNRvpDCWGf/tQl6A3ydMy//Oqe
 lRSKfmiAGysqhXnmK12+YhfO+4ioTz8dA88tHs1AO8qasfQwx45eRsUPemWeExiL
 9Lntb0U6MhYvgiTdWqt6
 =CnbJ
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Here's just a single fix - a revert of a patch that broke the
p54 and cw2100 drivers (arguably due to bad assumptions there.)
Since this affects kernels since 3.17, I decided to revert for
now and we'll revisit this optimisation properly for -next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-06 13:29:27 -05:00
Arik Nemtsov fa44b988d2 mac80211: skip disabled channels in VHT check
The patch "40a11ca mac80211: check if channels allow 80 MHz for VHT
probe requests" considered disabled channels as VHT enabled, and
mistakenly sent out probe-requests with the VHT IE.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-06 14:59:07 +01:00
Johannes Berg 71b836eca7 nl80211: define multicast group names in header
Put the group names into the userspace API header file so that
userspace clients can use symbolic names from there instead of
hardcoding the actual names. This doesn't really change much,
but seems somewhat cleaner.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-06 12:10:26 +01:00
Gautam Kumar Shukla d75bb06b61 cfg80211: add extensible feature flag attribute
With the wiphy::features flag being used up this patch adds a
new field wiphy::ext_features. Considering extensibility this
new field is declared as a byte array. This extensible flag is
exposed to user-space by NL80211_ATTR_EXT_FEATURES.

Cc: Avinash Patil <patila@marvell.com>
Signed-off-by: Gautam (Gautam Kumar) Shukla <gautams@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-06 12:10:24 +01:00
Linus Lüssing 9d31b3ce81 batman-adv: fix potential TT client + orig-node memory leak
This patch fixes a potential memory leak which can occur once an
originator times out. On timeout the according global translation table
entry might not get purged correctly. Furthermore, the non purged TT
entry will cause its orig-node to leak, too. Which additionally can lead
to the new multicast optimization feature not kicking in because of a
therefore bogus counter.

In detail: The batadv_tt_global_entry->orig_list holds the reference to
the orig-node. Usually this reference is released after
BATADV_PURGE_TIMEOUT through: _batadv_purge_orig()->
batadv_purge_orig_node()->batadv_update_route()->_batadv_update_route()->
batadv_tt_global_del_orig() which purges this global tt entry and
releases the reference to the orig-node.

However, if between two batadv_purge_orig_node() calls the orig-node
timeout grew to 2*BATADV_PURGE_TIMEOUT then this call path isn't
reached. Instead the according orig-node is removed from the
originator hash in _batadv_purge_orig(), the batadv_update_route()
part is skipped and won't be reached anymore.

Fixing the issue by moving batadv_tt_global_del_orig() out of the rcu
callback.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06 11:07:01 +01:00
Linus Lüssing a5164886b0 batman-adv: fix multicast counter when purging originators
When purging an orig_node we should only decrease counter tracking the
number of nodes without multicast optimizations support if it was
increased through this orig_node before.

A not yet quite initialized orig_node (meaning it did not have its turn
in the mcast-tvlv handler so far) which gets purged would not adhere to
this and will lead to a counter imbalance.

Fixing this by adding a check whether the orig_node is mcast-initalized
before decreasing the counter in the mcast-orig_node-purging routine.

Introduced by 60432d756c
("batman-adv: Announce new capability via multicast TVLV")

Reported-by: Tobias Hachmer <tobias@hachmer.de>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06 11:06:04 +01:00
Linus Lüssing e8829f007e batman-adv: fix counter for multicast supporting nodes
A miscounting of nodes having multicast optimizations enabled can lead
to multicast packet loss in the following scenario:

If the first OGM a node receives from another one has no multicast
optimizations support (no multicast tvlv) then we are missing to
increase the counter. This potentially leads to the wrong assumption
that we could safely use multicast optimizations.

Fixings this by increasing the counter if the initial OGM has the
multicast TVLV unset, too.

Introduced by 60432d756c
("batman-adv: Announce new capability via multicast TVLV")

Reported-by: Tobias Hachmer <tobias@hachmer.de>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06 11:05:42 +01:00
Martin Hundebøll f44d54077a batman-adv: fix lock class for decoding hash in network-coding.c
batadv_has_set_lock_class() is called with the wrong hash table as first
argument (probably due to a copy-paste error), which leads to false
positives when running with lockdep.

Introduced-by: 612d2b4fe0
("batman-adv: network coding - save overheard and tx packets for decoding")

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06 11:05:12 +01:00
Linus Lüssing 2c667a339c batman-adv: fix delayed foreign originator recognition
Currently it can happen that the reception of an OGM from a new
originator is not being accepted. More precisely it can happen that
an originator struct gets allocated and initialized
(batadv_orig_node_new()), even the TQ gets calculated and set correctly
(batadv_iv_ogm_calc_tq()) but still the periodic orig_node purging
thread will decide to delete it if it has a chance to jump between
these two function calls.

This is because batadv_orig_node_new() initializes the last_seen value
to zero and its caller (batadv_iv_ogm_orig_get()) makes it visible to
other threads by adding it to the hash table already.
batadv_iv_ogm_calc_tq() will set the last_seen variable to the correct,
current time a few lines later but if the purging thread jumps in between
that it will think that the orig_node timed out and will wrongly
schedule it for deletion already.

If the purging interval is the same as the originator interval (which is
the default: 1 second), then this game can continue for several rounds
until the random OGM jitter added enough difference between these
two (in tests, two to about four rounds seemed common).

Fixing this by initializing the last_seen variable of an orig_node
to the current time before adding it to the hash table.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06 11:05:09 +01:00
Simon Wunderlich 329887ad13 batman-adv: fix and simplify condition when bonding should be used
The current condition actually does NOT consider bonding when the
interface the packet came in from is the soft interface, which is the
opposite of what it should do (and the comment describes). Fix that and
slightly simplify the condition.

Reported-by: Ray Gibson <booray@gmail.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06 11:05:07 +01:00
Daniel Borkmann 81164413ad net: tcp: add per route congestion control
This work adds the possibility to define a per route/destination
congestion control algorithm. Generally, this opens up the possibility
for a machine with different links to enforce specific congestion
control algorithms with optimal strategies for each of them based
on their network characteristics, even transparently for a single
application listening on all links.

For our specific use case, this additionally facilitates deployment
of DCTCP, for example, applications can easily serve internal
traffic/dsts in DCTCP and external one with CUBIC. Other scenarios
would also allow for utilizing e.g. long living, low priority
background flows for certain destinations/routes while still being
able for normal traffic to utilize the default congestion control
algorithm. We also thought about a per netns setting (where different
defaults are possible), but given its actually a link specific
property, we argue that a per route/destination setting is the most
natural and flexible.

The administrator can utilize this through ip-route(8) by appending
"congctl [lock] <name>", where <name> denotes the name of a
congestion control algorithm and the optional lock parameter allows
to enforce the given algorithm so that applications in user space
would not be allowed to overwrite that algorithm for that destination.

The dst metric lookups are being done when a dst entry is already
available in order to avoid a costly lookup and still before the
algorithms are being initialized, thus overhead is very low when the
feature is not being used. While the client side would need to drop
the current reference on the module, on server side this can actually
even be avoided as we just got a flat-copied socket clone.

Joint work with Florian Westphal.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann ea69763999 net: tcp: add RTAX_CC_ALGO fib handling
This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
control metric to be set up and dumped back to user space.

While the internal representation of RTAX_CC_ALGO is handled as a u32
key, we avoided to expose this implementation detail to user space, thus
instead, we chose the netlink attribute that is being exchanged between
user space to be the actual congestion control algorithm name, similarly
as in the setsockopt(2) API in order to allow for maximum flexibility,
even for 3rd party modules.

It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
it should have been stored in RTAX_FEATURES instead, we first thought
about reusing it for the congestion control key, but it brings more
complications and/or confusion than worth it.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann c5c6a8ab45 net: tcp: add key management to congestion control
This patch adds necessary infrastructure to the congestion control
framework for later per route congestion control support.

For a per route congestion control possibility, our aim is to store
a unique u32 key identifier into dst metrics, which can then be
mapped into a tcp_congestion_ops struct. We argue that having a
RTAX key entry is the most simple, generic and easy way to manage,
and also keeps the memory footprint of dst entries lower on 64 bit
than with storing a pointer directly, for example. Having a unique
key id also allows for decoupling actual TCP congestion control
module management from the FIB layer, i.e. we don't have to care
about expensive module refcounting inside the FIB at this point.

We first thought of using an IDR store for the realization, which
takes over dynamic assignment of unused key space and also performs
the key to pointer mapping in RCU. While doing so, we stumbled upon
the issue that due to the nature of dynamic key distribution, it
just so happens, arguably in very rare occasions, that excessive
module loads and unloads can lead to a possible reuse of previously
used key space. Thus, previously stale keys in the dst metric are
now being reassigned to a different congestion control algorithm,
which might lead to unexpected behaviour. One way to resolve this
would have been to walk FIBs on the actually rare occasion of a
module unload and reset the metric keys for each FIB in each netns,
but that's just very costly.

Therefore, we argue a better solution is to reuse the unique
congestion control algorithm name member and map that into u32 key
space through jhash. For that, we split the flags attribute (as it
currently uses 2 bits only anyway) into two u32 attributes, flags
and key, so that we can keep the cacheline boundary of 2 cachelines
on x86_64 and cache the precalculated key at registration time for
the fast path. On average we might expect 2 - 4 modules being loaded
worst case perhaps 15, so a key collision possibility is extremely
low, and guaranteed collision-free on LE/BE for all in-tree modules.
Overall this results in much simpler code, and all without the
overhead of an IDR. Due to the deterministic nature, modules can
now be unloaded, the congestion control algorithm for a specific
but unloaded key will fall back to the default one, and on module
reload time it will switch back to the expected algorithm
transparently.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann 29ba4fffd3 net: tcp: refactor reinitialization of congestion control
We can just move this to an extra function and make the code
a bit more readable, no functional change.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Florian Westphal e715b6d3a5 net: fib6: convert cfg metric to u32 outside of table write lock
Do the nla validation earlier, outside the write lock.

This is needed by followup patch which needs to be able to call
request_module (which can sleep) if needed.

Joint work with Daniel Borkmann.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann 0409c9a5a7 net: fib6: fib6_commit_metrics: fix potential NULL pointer dereference
When IPv6 host routes with metrics attached are being added, we fetch
the metrics store from the dst via COW through dst_metrics_write_ptr(),
added through commit e5fd387ad5.

One remaining problem here is that we actually call into inet_getpeer()
and may end up allocating/creating a new peer from the kmemcache, which
may fail.

Example trace from perf probe (inet_getpeer:41) where create is 1:

ip 6877 [002] 4221.391591: probe:inet_getpeer: (ffffffff8165e293)
  85e294 inet_getpeer.part.7 (<- kmem_cache_alloc())
  85e578 inet_getpeer
  8eb333 ipv6_cow_metrics
  8f10ff fib6_commit_metrics

Therefore, a check for NULL on the return of dst_metrics_write_ptr()
is necessary here.

Joint work with Florian Westphal.

Fixes: e5fd387ad5 ("ipv6: do not overwrite inetpeer metrics prematurely")
Cc: Michal Kubeček <mkubecek@suse.cz>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Hubert Sokolowski 6cb69742da net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined
Add checking whether the call to ndo_dflt_fdb_dump is needed.
It is not expected to call ndo_dflt_fdb_dump unconditionally
by some drivers (i.e. qlcnic or macvlan) that defines
own ndo_fdb_dump. Other drivers define own ndo_fdb_dump
and don't want ndo_dflt_fdb_dump to be called at all.
At the same time it is desirable to call the default dump
function on a bridge device.
Fix attributes that are passed to dev->netdev_ops->ndo_fdb_dump.
Add extra checking in br_fdb_dump to avoid duplicate entries
as now filter_dev can be NULL.

Following tests for filtering have been performed before
the change and after the patch was applied to make sure
they are the same and it doesn't break the filtering algorithm.

[root@localhost ~]# cd /root/iproute2-3.18.0/bridge
[root@localhost bridge]# modprobe dummy
[root@localhost bridge]# ./bridge fdb add f1:f2:f3:f4:f5:f6 dev dummy0
[root@localhost bridge]# brctl addbr br0
[root@localhost bridge]# brctl addif  br0 dummy0
[root@localhost bridge]# ip link set dev br0 address 02:00:00:12:01:04
[root@localhost bridge]# # show all
[root@localhost bridge]# ./bridge fdb show
33:33:00:00:00:01 dev p2p1 self permanent
01:00:5e:00:00:01 dev p2p1 self permanent
33:33:ff:ac:ce:32 dev p2p1 self permanent
33:33:00:00:02:02 dev p2p1 self permanent
01:00:5e:00:00:fb dev p2p1 self permanent
33:33:00:00:00:01 dev p7p1 self permanent
01:00:5e:00:00:01 dev p7p1 self permanent
33:33:ff:79:50:53 dev p7p1 self permanent
33:33:00:00:02:02 dev p7p1 self permanent
01:00:5e:00:00:fb dev p7p1 self permanent
f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by bridge
[root@localhost bridge]# ./bridge fdb show br br0
f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by port
[root@localhost bridge]# ./bridge fdb show brport dummy0
f2:46:50:85:6d:d9 master br0 permanent
f2:46:50:85:6d:d9 vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]# # filter by port + bridge
[root@localhost bridge]# ./bridge fdb show br br0 brport dummy0
f2:46:50:85:6d:d9 master br0 permanent
f2:46:50:85:6d:d9 vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]#

Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:52:06 -05:00
Tom Herbert ad6f939ab1 ip: Add offset parameter to ip_cmsg_recv
Add ip_cmsg_recv_offset function which takes an offset argument
that indicates the starting offset in skb where data is being received
from. This will be useful in the case of UDP and provided checksum
to user space.

ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
zero.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Tom Herbert 5961de9f19 ip: Add offset parameter to ip_cmsg_recv
Add ip_cmsg_recv_offset function which takes an offset argument
that indicates the starting offset in skb where data is being received
from. This will be useful in the case of UDP and provided checksum
to user space.

ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
zero.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Tom Herbert c44d13d6f3 ip: IP cmsg cleanup
Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that
they can be referenced in other source files.

Restructure ip_cmsg_recv to not go through flags using shift, check
for flags by 'and'. This eliminates both the shift and a conditional
per flag check.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Tom Herbert 224d019c4f ip: Move checksum convert defines to inet
Move convert_csum from udp_sock to inet_sock. This allows the
possibility that we can use convert checksum for different types
of sockets and also allows convert checksum to be enabled from
inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Gao feng b44b565cf5 netfilter: nf_ct_seqadj: print ack seq in the right host byte order
new_start_seq and new_end_seq are network byte order,
print the host byte order in debug message and print
seq number as the type of unsigned int.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
2015-01-05 13:52:20 +01:00
Chen Gang b18c5d15e8 netfilter: nfnetlink_cthelper: Remove 'const' and '&' to avoid warnings
The related code can be simplified, and also can avoid related warnings
(with allmodconfig under parisc):

    CC [M]  net/netfilter/nfnetlink_cthelper.o
  net/netfilter/nfnetlink_cthelper.c: In function ‘nfnl_cthelper_from_nlattr’:
  net/netfilter/nfnetlink_cthelper.c:97:9: warning: passing argument 1 o ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers]
    memcpy(&help->data, nla_data(attr), help->helper->data_len);
           ^
  In file included from include/linux/string.h:17:0,
                   from include/uapi/linux/uuid.h:25,
                   from include/linux/uuid.h:23,
                   from include/linux/mod_devicetable.h:12,
                   from ./arch/parisc/include/asm/hardware.h:4,
                   from ./arch/parisc/include/asm/processor.h:15,
                   from ./arch/parisc/include/asm/spinlock.h:6,
                   from ./arch/parisc/include/asm/atomic.h:21,
                   from include/linux/atomic.h:4,
                   from ./arch/parisc/include/asm/bitops.h:12,
                   from include/linux/bitops.h:36,
                   from include/linux/kernel.h:10,
                   from include/linux/list.h:8,
                   from include/linux/module.h:9,
                   from net/netfilter/nfnetlink_cthelper.c:11:
  ./arch/parisc/include/asm/string.h:8:8: note: expected ‘void *’ but argument is of type ‘const char (*)[]’
   void * memcpy(void * dest,const void *src,size_t count);
          ^

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
2015-01-05 12:42:54 +01:00
Duan Jiong 0f8162326f netfilter: nfnetlink: remove redundant variable nskb
Actually after netlink_skb_clone() is called, the nskb and
skb will point to the same thing, but they are used just like
they are different, sometimes this is confusing, so i think
there is no necessary to keep nskb anymore.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
2015-01-05 12:10:49 +01:00
Johannes Berg 1e359a5de8 Revert "mac80211: Fix accounting of the tailroom-needed counter"
This reverts commit ca34e3b5c8.

It turns out that the p54 and cw2100 drivers assume that there's
tailroom even when they don't say they really need it. However,
there's currently no way for them to explicitly say they do need
it, so for now revert this.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331.

Cc: stable@vger.kernel.org
Fixes: ca34e3b5c8 ("mac80211: Fix accounting of the tailroom-needed counter")
Reported-by: Christopher Chavez <chrischavez@gmx.us>
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Debugged-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-05 10:33:46 +01:00
Jesse Gross 46b1e4f911 geneve: Check family when reusing sockets.
When searching for an existing socket to reuse, the address family
is not taken into account - only port number. This means that an
IPv4 socket could be used for IPv6 traffic and vice versa, which
is sure to cause problems when passing packets.

It is not possible to trigger this problem currently because the
only user of Geneve creates just IPv4 sockets. However, that is
likely to change in the near future.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Jesse Gross df5dba8e52 geneve: Remove socket hash table.
The hash table for open Geneve ports is used only on creation and
deletion time. It is not performance critical and is not likely to
grow to a large number of items. Therefore, this can be changed
to use a simple linked list.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Jesse Gross 829a3ada9c geneve: Simplify locking.
The existing Geneve locking scheme was pulled over directly from
VXLAN. However, VXLAN has a number of built in mechanisms which make
the locking more complex and are unlikely to be necessary with Geneve.
This simplifies the locking to use a basic scheme of a mutex
when doing updates plus RCU on receive.

In addition to making the code easier to read, this also avoids the
possibility of a race when creating or destroying sockets since
UDP sockets and the list of Geneve sockets are protected by different
locks. After this change, the entire operation is atomic.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Jesse Gross 61f3cade76 geneve: Remove workqueue.
The work queue is used only to free the UDP socket upon destruction.
This is not necessary with Geneve and generally makes the code more
difficult to reason about. It also introduces nondeterministic
behavior such as when a socket is rapidly deleted and recreated, which
could fail as the the deletion happens asynchronously.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Marcel Holtmann 043ec9bf7b Bluetooth: Introduce HCI_QUIRK_FIXUP_INQUIRY_MODE option
The HCI_QUIRK_FIXUP_INQUIRY_MODE option allows to force Inquiry Result
with RSSI setting on controllers that do not indicate support for it,
but where it is known to be fully functional.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-03 22:31:09 +02:00
Marcel Holtmann 04422da990 Bluetooth: Remove dead code for manufacturer inquiry mode quirks
There are some old Bluetooth modules from Silicon Wave and Broadcom
which support Inquiry Result with RSSI, but do not advertise it. The
core has quirks in the code to enable that inquiry mode. However as
it stands right now, that code is not even executed since entering
the function to determine which inquiry mode requires that the device
has the feature bit for Inquiry Result with RSSI set in the first
place. So this makes this dead code that hasn't work for a long
time.

In conclusion, just remove these extra quirks and simplify the setup
of the inquiry mode to be inline and with that a lot easier to read
and understand.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-03 22:31:08 +02:00
Thomas Graf 21e4902aea netlink: Lockless lookup with RCU grace period in socket release
Defers the release of the socket reference using call_rcu() to
allow using an RCU read-side protected call to rhashtable_lookup()

This restores behaviour and performance gains as previously
introduced by e341694 ("netlink: Convert netlink_lookup() to use
RCU protected hash table") without the side effect of severely
delayed socket destruction.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
Thomas Graf 97defe1ecf rhashtable: Per bucket locks & deferred expansion/shrinking
Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.

The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.

Mutations of the bucket table pointer is protected by a new mutex, read
access is RCU protected.

In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so called future table as soon as the resize process
starts.  Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table after an RCU grace period and
initial linking of the old to the new table was performed. Optimization
of the chains to make use of the new number of buckets follows only the
new table is in use.

The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.

Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
Thomas Graf 897362e446 nft_hash: Remove rhashtable_remove_pprev()
The removal function of nft_hash currently stores a reference to the
previous element during lookup which is used to optimize removal later
on. This was possible because a lock is held throughout calling
rhashtable_lookup() and rhashtable_remove().

With the introdution of deferred table resizing in parallel to lookups
and insertions, the nftables lock will no longer synchronize all
table mutations and the stored pprev may become invalid.

Removing this optimization makes removal slightly more expensive on
average but allows taking the resize cost out of the insert and
remove path.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
Thomas Graf 88d6ed15ac rhashtable: Convert bucket iterators to take table and index
This patch is in preparation to introduce per bucket spinlocks. It
extends all iterator macros to take the bucket table and bucket
index. It also introduces a new rht_dereference_bucket() to
handle protected accesses to buckets.

It introduces a barrier() to the RCU iterators to the prevent
the compiler from caching the first element.

The lockdep verifier is introduced as stub which always succeeds
and properly implement in the next patch when the locks are
introduced.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:56 -05:00
Thomas Graf 8d24c0b431 rhashtable: Do hashing inside of rhashtable_lookup_compare()
Hash the key inside of rhashtable_lookup_compare() like
rhashtable_lookup() does. This allows to simplify the hashing
functions and keep them private.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:56 -05:00
Alexander Aring 36cf942adf mac802154: fix kbuild test robot warning
This patch fixs the following kbuild test robot warning:

coccinelle warnings: (new ones prefixed by >>)

>> net/mac802154/cfg.c:53:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03 01:51:51 +01:00
Alexander Aring 9dc52d49e2 ieee802154: handle config as menuconfig
This patch handles the IEEE802154 Kconfig entry as menuconfig.
Furthermore we move this entry out of "Network Options" and put it into
"Networking" where the other networking subsystems are. This requires a
menuconfig entry like all other networking subsystems.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03 01:49:24 +01:00
Alexander Aring 71e36b1b01 ieee802154: rename af_ieee802154.c to socket.c
This patch renames the "af_ieee802154.c" to "socket.c". This is just a
cleanup to have a short name for it which describes the implementationm
stuff more human understandable.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03 01:49:24 +01:00
Alexander Aring df2f65de4b ieee802154: socket: fix checkpatch issue
This patch solves the following checkpatch issue:

CHECK: Comparison to NULL could be written "skb"
+		if (skb != NULL) {

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03 01:49:24 +01:00
Alexander Aring 78f821b648 ieee802154: socket: put handling into one file
This patch puts all related socket handling into one file. This is just
a cleanup to do all socket handling stuff inside of one implementation
file.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03 01:49:24 +01:00
Alexander Aring 79300e3a43 ieee802154: socket: change module name
This patch changes the module name of af_802154 to ieee802154_socket.
Just for keeping the name convention according to the 6LoWPAN module
ieee802154_6lowpan.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03 01:49:24 +01:00
Alexander Aring 955d7fc93e ieee802154: handle socket functionality as module
This patch makes the ieee802154 socket handling as module. Currently
this is part of ieee802154 module. It pointed out that ieee802154 module
has also two module_init/module_exit functions. One inside of core.c and
the other in af_ieee802154.c. This patch will also solve this issue by
handle the af_802154 as separate module.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03 01:49:23 +01:00
Marcel Holtmann ec6cef9cd9 Bluetooth: Fix SMP channel registration for unconfigured controllers
When the Bluetooth controllers requires an unconfigured state (for
example when the BD_ADDR is missing), then it is important to try
to register the SMP channels when the controller transitions to the
configured state.

This also fixes an issue with the debugfs entires that are not present
for controllers that start out as unconfigured.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02 22:22:04 +01:00
Marcel Holtmann 203de21bf6 Bluetooth: Fix for a leftover debug of pairing credentials
One of the LE Secure Connections security credentials was still using
the BT_DBG instead of SMP_DBG.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02 22:22:04 +01:00
Marcel Holtmann cb0d2faeb1 Bluetooth: Fix scope of sc_only_mode debugfs entry
The sc_only_mode debugfs entry is used to read the current state of the
Secure Connections Only mode. Before Bluetooth 4.2 this mode was only
for BR/EDR controllers and with that tight to the support Secure Simple
Pairing. Since Secure Connections is now available for BR/EDR and LE
this debugfs entry is no longer correctly place.

Move it to the common section and enable it when either BR/EDR Secure
Connections feature is supported or when the controller has LE support.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02 22:22:04 +01:00
Marcel Holtmann 05b3c3e790 Bluetooth: Remove no longer needed force_sc_support debugfs option
The force_sc_support debugfs option was introduced to easily work with
pre-production Bluetooth 4.1 silicon. This option is no longer needed
since controllers supporting BR/EDR Secure Connections feature are now
available.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02 22:22:04 +01:00
Marcel Holtmann 91389af67c Bluetooth: Remove broken force_lesc_support debugfs option
The force_lesc_support debugfs option never really worked. It has a race
condition between creating the debugfs entry and registering the L2CAP
fixed channel for BR/EDR SMP support.

Also this has been replaced with a working force_bredr_smp debugfs
switch that developers can use now.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02 22:22:03 +01:00
Marcel Holtmann 300acfdec9 Bluetooth: Introduce force_bredr_smp debugfs option for testing
Testing cross-transport pairing that starts on BR/EDR is only valid when
using a controller with BR/EDR Secure Connections. Devices will indicate
this by providing BR/EDR SMP fixed channel over L2CAP. To allow testing
of this feature on Bluetooth 4.0 controller or controllers without the
BR/EDR Secure Connections features, introduce a force_bredr_smp debugfs
option that allows faking the required AES connection.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02 22:22:03 +01:00
Ben Pfaff 24cc59d1eb openvswitch: Consistently include VLAN header in flow and port stats.
Until now, when VLAN acceleration was in use, the bytes of the VLAN header
were not included in port or flow byte counters.  They were however
included when VLAN acceleration was not used.  This commit corrects the
inconsistency, by always including the VLAN header in byte counters.

Previous discussion at
http://openvswitch.org/pipermail/dev/2014-December/049521.html

Reported-by: Motonori Shindo <mshindo@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 16:14:20 -05:00
Herbert Xu 843925f33f tcp: Do not apply TSO segment limit to non-TSO packets
Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.

In fact the problem was completely unrelated to IPsec.  The bug is
also reproducible if you just disable TSO/GSO.

The problem is that when the MSS goes down, existing queued packet
on the TX queue that have not been transmitted yet all look like
TSO packets and get treated as such.

This then triggers a bug where tcp_mss_split_point tells us to
generate a zero-sized packet on the TX queue.  Once that happens
we're screwed because the zero-sized packet can never be removed
by ACKs.

Fixes: 1485348d24 ("tcp: Apply device TSO segment limit earlier")
Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 16:13:20 -05:00
Florian Westphal e8768f9715 net: skbuff: don't zero tc members when freeing skb
Not needed, only four cases:
 - kfree_skb (or one of its aliases).
   Don't need to zero, memory will be freed.
 - kfree_skb_partial and head was stolen:  memory will be freed.
 - skb_morph:  The skb header fields (including tc ones) will be
   copied over from the 'to-be-morphed' skb right after
   skb_release_head_state returns.
 - skb_segment:  Same as before, all the skb header
   fields are copied over from the original skb right away.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 16:04:29 -05:00
David S. Miller 6c032edc8a Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg say:

====================
pull request: bluetooth-next 2014-12-31

Here's the first batch of bluetooth patches for 3.20.

 - Cleanups & fixes to ieee802154  drivers
 - Fix synchronization of mgmt commands with respective HCI commands
 - Add self-tests for LE pairing crypto functionality
 - Remove 'BlueFritz!' specific handling from core using a new quirk flag
 - Public address configuration support for ath3012
 - Refactor debugfs support into a dedicated file
 - Initial support for LE Data Length Extension feature from Bluetooth 4.2

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 15:58:21 -05:00
Joe Stringer a4c9ea5e8f geneve: Add Geneve GRO support
This results in an approximately 30% increase in throughput
when handling encapsulated bulk traffic.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 15:46:41 -05:00
Jesse Gross 9b174d88c2 net: Add Transparent Ethernet Bridging GRO support.
Currently the only tunnel protocol that supports GRO with encapsulated
Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer
so that it can be used by other tunnel protocols such as GRE and Geneve.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 15:46:41 -05:00
Alexander Duyck 5405afd1a3 fib_trie: Add tracking value for suffix length
This change adds a tracking value for the maximum suffix length of all
prefixes stored in any given tnode.  With this value we can determine if we
need to backtrace or not based on if the suffix is greater than the pos
value.

By doing this we can reduce the CPU overhead for lookups in the local table
as many of the prefixes there are 32b long and have a suffix length of 0
meaning we can immediately backtrace to the root node without needing to
test any of the nodes between it and where we ended up.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:55 -05:00
Alexander Duyck 21d1f11db0 fib_trie: Remove checks for index >= tnode_child_length from tnode_get_child
For some reason the compiler doesn't seem to understand that when we are in
a loop that runs from tnode_child_length - 1 to 0 we don't expect the value
of tn->bits to change.  As such every call to tnode_get_child was rerunning
tnode_chile_length which ended up consuming quite a bit of space in the
resultant assembly code.

I have gone though and verified that in all cases where tnode_get_child
is used we are either winding though a fixed loop from tnode_child_length -
1 to 0, or are in a fastpath case where we are verifying the value by
either checking for any remaining bits after shifting index by bits and
testing for leaf, or by using tnode_child_length.

size net/ipv4/fib_trie.o
Before:
   text	   data	    bss	    dec	    hex	filename
  15506	    376	      8	  15890	   3e12	net/ipv4/fib_trie.o

After:
   text	   data	    bss	    dec	    hex	filename
  14827	    376	      8	  15211	   3b6b	net/ipv4/fib_trie.o

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:55 -05:00
Alexander Duyck 12c081a5c8 fib_trie: inflate/halve nodes in a more RCU friendly way
This change pulls the node_set_parent functionality out of put_child_reorg
and instead leaves that to the function to take care of as well.  By doing
this we can fully construct the new cluster of tnodes and all of the
pointers out of it before we start routing pointers into it.

I am suspecting this will likely fix some concurency issues though I don't
have a good test to show as such.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:55 -05:00
Alexander Duyck fc86a93b46 fib_trie: Push tnode flushing down to inflate/halve
This change pushes the tnode freeing down into the inflate and halve
functions.  It makes more sense here as we have a better grasp of what is
going on and when a given cluster of nodes is ready to be freed.

I believe this may address a bug in the freeing logic as well.  For some
reason if the freelist got to a certain size we would call
synchronize_rcu().  I'm assuming that what they meant to do is call
synchronize_rcu() after they had handed off that much memory via
call_rcu().  As such that is what I have updated the behavior to be.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:55 -05:00
Alexander Duyck ff181ed876 fib_trie: Push assignment of child to parent down into inflate/halve
This change makes it so that the assignment of the tnode to the parent is
handled directly within whatever function is currently handling the node be
it inflate, halve, or resize.  By doing this we can avoid some of the need
to set NULL pointers in the tree while we are resizing the subnodes.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:55 -05:00
Alexander Duyck f05a48198b fib_trie: Add functions should_inflate and should_halve
This change pulls the logic for if we should inflate/halve the nodes out
into separate functions.  It also addresses what I believe is a bug where 1
full node is all that is needed to keep a node from ever being halved.

Simple script to reproduce the issue:
	modprobe dummy;	ifconfig dummy0 up
	for i in `seq 0 255`; do ifconfig dummy0:$i 10.0.${i}.1/24 up; done
	ifconfig dummy0:256 10.0.255.33/16 up
	for i in `seq 0 254`; do ifconfig dummy0:$i down; done

Results from /proc/net/fib_triestat
Before:
	Local:
		Aver depth:     3.00
		Max depth:      4
		Leaves:         17
		Prefixes:       18
		Internal nodes: 11
		  1: 8  2: 2  10: 1
		Pointers: 1048
	Null ptrs: 1021
	Total size: 11  kB
After:
	Local:
		Aver depth:     3.41
		Max depth:      5
		Leaves:         17
		Prefixes:       18
		Internal nodes: 12
		  1: 8  2: 3  3: 1
		Pointers: 36
	Null ptrs: 8
	Total size: 3  kB

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck cf3637bb8f fib_trie: Move resize to after inflate/halve
This change consists of a cut/paste of resize to behind inflate and halve
so that I could remove the two function prototypes.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck 345e9b5426 fib_trie: Push rcu_read_lock/unlock to callers
This change is to start cleaning up some of the rcu_read_lock/unlock
handling.  I realized while reviewing the code there are several spots that
I don't believe are being handled correctly or are masking warnings by
locally calling rcu_read_lock/unlock instead of calling them at the correct
level.

A common example is a call to fib_get_table followed by fib_table_lookup.
The rcu_read_lock/unlock ought to wrap both but there are several spots where
they were not wrapped.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck 98293e8d2f fib_trie: Use unsigned long for anything dealing with a shift by bits
This change makes it so that anything that can be shifted by, or compared
to a value shifted by bits is updated to be an unsigned long.  This is
mostly a precaution against an insanely huge address space that somehow
starts coming close to the 2^32 root node size which would require
something like 1.5 billion addresses.

I chose unsigned long instead of unsigned long long since I do not believe
it is possible to allocate a 32 bit tnode on a 32 bit system as the memory
consumed would be 16GB + 28B which exceeds the addressible space for any
one process.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck e9b44019d4 fib_trie: Update meaning of pos to represent unchecked bits
This change moves the pos value to the other side of the "bits" field.  By
doing this it actually simplifies a significant amount of code in the trie.

For example when halving a tree we know that the bit lost exists at
oldnode->pos, and if we inflate the tree the new bit being add is at
tn->pos.  Previously to find those bits you would have to subtract pos and
bits from the keylength or start with a value of (1 << 31) and then shift
that.

There are a number of spots throughout the code that benefit from this.  In
the case of the hot-path searches the main advantage is that we can drop 2
or more operations from the search path as we no longer need to compute the
value for the index to be shifted by and can instead just use the raw pos
value.

In addition the tkey_extract_bits is now defunct and can be replaced by
get_index since the two operations were doing the same thing, but now
get_index does it much more quickly as it is only an xor and shift versus a
pair of shifts and a subtraction.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck 836a0123c9 fib_trie: Optimize fib_table_insert
This patch updates the fib_table_insert function to take advantage of the
changes made to improve the performance of fib_table_lookup.  As a result
the code should be smaller and run faster then the original.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck 939afb0657 fib_trie: Optimize fib_find_node
This patch makes use of the same features I made use of for
fib_table_lookup to streamline fib_find_node.  The resultant code should be
smaller and run faster than the original.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck 9f9e636d4f fib_trie: Optimize fib_table_lookup to avoid wasting time on loops/variables
This patch is meant to reduce the complexity of fib_table_lookup by reducing
the number of variables to the bare minimum while still keeping the same if
not improved functionality versus the original.

Most of this change was started off by the desire to rid the function of
chopped_off and current_prefix_length as they actually added very little to
the function since they only applied when computing the cindex.  I was able
to replace them mostly with just a check for the prefix match.  As long as
the prefix between the key and the node being tested was the same we know
we can search the tnode fully versus just testing cindex 0.

The second portion of the change ended up being a massive reordering.
Originally the calls to check_leaf were up near the start of the loop, and
the backtracing and descending into lower levels of tnodes was later.  This
didn't make much sense as the structure of the tree means the leaves are
always the last thing to be tested.  As such I reordered things so that we
instead have a loop that will delve into the tree and only exit when we
have either found a leaf or we have exhausted the tree.  The advantage of
rearranging things like this is that we can fully inline check_leaf since
there is now only one reference to it in the function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00
Alexander Duyck adaf981685 fib_trie: Merge leaf into tnode
This change makes it so that leaf and tnode are the same struct.  As a
result there is no need for rt_trie_node anymore since everyting can be
merged into tnode.

On 32b systems this results in the leaf being 4 bytes larger, however I
don't know if that is really an issue as this and an eariler patch that
added bits & pos have increased the size from 20 to 28.  If I am not
mistaken slub/slab allocate on power of 2 sizes so 20 was likely being
rounded up to 32 anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31 18:25:54 -05:00