Commit graph

8346 commits

Author SHA1 Message Date
Marcel Holtmann c91041dc4e Bluetooth: Add support for untrusted access to management commands
Some management commands are safe to be accessed from any user without
special permissions. First step for allowing access to any of these
commands from untrusted application is to mark them accordingly.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:57:35 +02:00
Marcel Holtmann c85be545ea Bluetooth: Add hci_sock_test_flag helper function
The management interface will need access to the socket flags and so
provide a helper function for checking them.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:57:31 +02:00
Marcel Holtmann c08b1a1dba Bluetooth: Consolidate socket channel sending function back into one
With the introduction of trusted socket flag for control and monitor
channels, it is now possible to use a single function for sending
packets to these sockets. And with that consolidate the handling.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:56:41 +02:00
Marcel Holtmann 50ebc055fa Bluetooth: Introduce trusted flag for management control sockets
Providing a global trusted flag for management control sockets provides
an easy way for identifying sockets and imposing restriction on it. For
now all management sockets are trusted since they require CAP_NET_ADMIN.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:56:00 +02:00
Marcel Holtmann 96f1474af0 Bluetooth: Add support for extended index management command
The Read Extended Contoller Index List command can be used for
retrieving the complete list of local available controllers. This
included configured, unconfigured and also AMP controllers.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:55:51 +02:00
Marcel Holtmann ced85549c3 Bluetooth: Add support for extended index management events
This introduces support for using Extended Index Added and Extended
Index Removed events. These events contain the controller type and
also the hardware bus information from the driver.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:53:08 +02:00
Marcel Holtmann f920733885 Bluetooth: Use special function to send filter management index events
For sending Index Added, Index Removed, Unconfigured Index Added and
Unconfigured Index Removed managment events the new helper functions
allows taking into account if these events are enabled for a certain
management socket or not.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:47:51 +02:00
Marcel Holtmann 17711c6291 Bluetooth: Provide hci_send_to_flagged_channel helper function
The hci_send_to_flagged_channel helper function can be used to send
packets to all channels that have a certain HCI socket flag set.

This is especially useful for managment events that are limited to
sockets that have first enabled certain functionality. This allows
for filtering of events without confusing existing users.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:46:41 +02:00
Marcel Holtmann 6befc6445f Bluetooth: Add flags field and setting function for HCI sockets
To filter out certain actions for certain HCI sockets introcuce a flags
field that allows to configure specific settings on individual sockets.

Since the hci_pinfo structure is private in hci_sock.c, provide helper
functions for setting and clearing a given flag.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-15 09:45:39 +02:00
David S. Miller 5f1764ddfe Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
Here's another set of Bluetooth & ieee802154 patches intended for 4.1:

 - Added support for QCA ROME chipset family in the btusb driver
 - at86rf230 driver fixes & cleanups
 - ieee802154 cleanups
 - Refactoring of Bluetooth mgmt API to allow new users
 - New setting for static Bluetooth address exposed to user space
 - Refactoring of hci_dev flags to remove limit of 32
 - Remove unnecessary fast-connectable setting usage restrictions
 - Fix behavior to be consistent when trying to pair already paired device
 - Service discovery corner-case fixes

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14 14:29:45 -04:00
Marcel Holtmann b7cb93e528 Bluetooth: Merge hdev->dbg_flags fields into hdev->dev_flags
With the extension of hdev->dev_flags utilizing a bitmap now, the space
is no longer restricted. Merge the hdev->dbg_flags into hdev->dev_flags
to save space on 64-bit architectures. On 32-bit architectures no size
reduction happens.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 19:28:36 +02:00
Alexey Kodanev 40fb70f3aa vxlan: fix wrong usage of VXLAN_VID_MASK
commit dfd8645ea1 wrongly assumes that VXLAN_VDI_MASK includes
eight lower order reserved bits of VNI field that are using for remote
checksum offload.

Right now, when VNI number greater then 0xffff, vxlan_udp_encap_recv()
will always return with 'bad_flag' error, reducing the usable vni range
from 0..16777215 to 0..65535. Also, it doesn't really check whether RCO
bits processed or not.

Fix it by adding new VNI mask which has all 32 bits of VNI field:
24 bits for id and 8 bits for other usage.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 13:08:07 -04:00
Marcel Holtmann eacb44dff9 Bluetooth: Use DECLARE_BITMAP for hdev->dev_flags field
The hdev->dev_flags field has outgrown itself on 32-bit systems. So
instead of hacking around it, switch to using DECLARE_BITMAP.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 18:35:45 +02:00
Marcel Holtmann 238be788fc Bluetooth: Introduce hci_dev_test_and_set_flag helper macro
Instead of manually coding test_and_set_bit on hdev->dev_flags all the
time, use hci_dev_test_and_set_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:33 +02:00
Marcel Holtmann a69d892726 Bluetooth: Introduce hci_dev_test_and_clear_flag helper macro
Instead of manually coding test_and_clear_bit on hdev->dev_flags all the
time, use hci_dev_test_and_clear_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:32 +02:00
Marcel Holtmann 516018a9c0 Bluetooth: Introduce hci_dev_test_and_change_flag helper macro
Instead of manually coding test_and_change_bit on hdev->dev_flags all the
time, use hci_dev_test_and_change_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:31 +02:00
Marcel Holtmann ce05d603af Bluetooth: Introduce hci_dev_change_flag helper macro
Instead of manually coding change_bit on hdev->dev_flags all the time,
use hci_dev_change_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:29 +02:00
Marcel Holtmann a358dc11d8 Bluetooth: Introduce hci_dev_clear_flag helper macro
Instead of manually coding clear_bit on hdev->dev_flags all the time,
use hci_dev_clear_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:27 +02:00
Marcel Holtmann a1536da255 Bluetooth: Introduce hci_dev_set_flag helper macro
Instead of manually coding set_bit on hdev->dev_flags all the time,
use hci_dev_set_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:26 +02:00
Marcel Holtmann d7a5a11d7f Bluetooth: Introduce hci_dev_test_flag helper macro
Instead of manually coding test_bit on hdev->dev_flags all the time,
use hci_dev_test_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:25 +02:00
Marcel Holtmann cc91cb042c Bluetooth: Add support connectable advertising setting
The patch adds a second advertising setting that allows switching of the
controller into connectable mode independent of the global connectable
setting.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:07:54 +02:00
Eric W. Biederman 098a697b49 tcp_metrics: Use a single hash table for all network namespaces.
Now that all of the operations are safe on a single hash table
accross network namespaces, allocate a single global hash table
and update the code to use it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 01:57:07 -04:00
Eric Dumazet 3f66b083a5 inet: introduce ireq_family
Before inserting request socks into general hash table,
fill their socket family.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet 41b822c59e inet: prepare sock_edemux() & sock_gen_put() for new SYN_RECV state
sock_edemux() & sock_gen_put() should be ready to cope with request socks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet 1e2e01172f inet: add rsk_refcnt/ireq_refcnt to request socks
When request socks will be in ehash, they'll need to be refcounted.

This patch adds rsk_refcnt/ireq_refcnt macros, and adds
reqsk_put() function, but nothing yet use them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet d34ac51b76 inet: add ireq_state field to inet_request_sock
We need to identify request sock when they'll be visible in
global ehash table.

ireq_state is an alias to req.__req_common.skc_state.

Its value is set to TCP_NEW_SYN_RECV

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Eric Dumazet 10feb428a5 inet: add TCP_NEW_SYN_RECV state
TCP_SYN_RECV state is currently used by fast open sockets.

Initial TCP requests (the pseudo sockets created when a SYN is received)
are not yet associated to a state. They are attached to their parent,
and the parent is in TCP_LISTEN state.

This commit adds TCP_NEW_SYN_RECV state, so that we can convert
TCP stack to a different schem gradually.

This state is not exported to user space.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Eric Dumazet bd337c581b ipv6: add missing ireq_net & ir_cookie initializations
I forgot to update dccp_v6_conn_request() & cookie_v6_check().
They both need to set ireq->ireq_net and ireq->ir_cookie

Lets clear ireq->ir_cookie in inet_reqsk_alloc()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33cf7c90fe ("net: add real socket cookies")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Eric W. Biederman 0c5c9fb551 net: Introduce possible_net_t
Having to say
> #ifdef CONFIG_NET_NS
> 	struct net *net;
> #endif

in structures is a little bit wordy and a little bit error prone.

Instead it is possible to say:
> typedef struct {
> #ifdef CONFIG_NET_NS
>       struct net *net;
> #endif
> } possible_net_t;

And then in a header say:

> 	possible_net_t net;

Which is cleaner and easier to use and easier to test, as the
possible_net_t is always there no matter what the compile options.

Further this allows read_pnet and write_pnet to be functions in all
cases which is better at catching typos.

This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates optional struct net * variables that
write_pnet uses on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:40 -04:00
Eric W. Biederman efd7ef1c19 net: Kill hold_net release_net
hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008.  Kill the code it is long past due.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:40 -04:00
Simon Horman d299ce149c vxlan: Correct path typo in comment
Flags are used in the return path rather than the return patch.

Fixes: af33c1adae ("vxlan: Eliminate dependency on UDP socket in transmit path")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:05:38 -04:00
Eric Dumazet 33cf7c90fe net: add real socket cookies
A long standing problem in netlink socket dumps is the use
of kernel socket addresses as cookies.

1) It is a security concern.

2) Sockets can be reused quite quickly, so there is
   no guarantee a cookie is used once and identify
   a flow.

3) request sock, establish sock, and timewait socks
   for a given flow have different cookies.

Part of our effort to bring better TCP statistics requires
to switch to a different allocator.

In this patch, I chose to use a per network namespace 64bit generator,
and to use it only in the case a socket needs to be dumped to netlink.
(This might be refined later if needed)

Note that I tried to carry cookies from request sock, to establish sock,
then timewait sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:55:28 -04:00
Alexander Duyck 0ddcf43d5d ipv4: FIB Local/MAIN table collapse
This patch is meant to collapse local and main into one by converting
tb_data from an array to a pointer.  Doing this allows us to point the
local table into the main while maintaining the same variables in the
table.

As such the tb_data was converted from an array to a pointer, and a new
array called data is added in order to still provide an object for tb_data
to point to.

In order to track the origin of the fib aliases a tb_id value was added in
a hole that existed on 64b systems.  Using this we can also reverse the
merge in the event that custom FIB rules are enabled.

With this patch I am seeing an improvement of 20ns to 30ns for routing
lookups as long as custom rules are not enabled, with custom rules enabled
we fall back to split tables and the original behavior.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:22:14 -04:00
Johan Hedberg 55e76b3898 Bluetooth: Add 'Already Paired' error for Pair Device command
To make the behavior predictable when attempting to pair with a device
for which we already have a Link Key or Long Term Key, this patch adds a
new 'Already Paired' error which gets sent in such a scenario.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-10 21:42:05 +01:00
Johan Hedberg 406ef2a67b Bluetooth: Make Fast Connectable available while powered off
To maximize the usability of the Fast Connectable feature we should make
it possible to set (or unset) it at any given moment. This means
removing the dependency on the 'connectable' setting as well as the
'powered' setting. The former makes also sense since page scan may get
enabled through add_device even if 'connectable' is false. To keep the
setting available over power cycles its flag also needs to be removed
from the flags that are cleared upon HCI_Reset.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-10 19:37:02 +01:00
David S. Miller 515fb5c317 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter fixes for net-next

The following batch contains a couple of fixes to address some fallout
from the previous pull request, they are:

1) Address link problems in the bridge code after e5de75b. Fix it by
   using rcu hook to address to avoid ifdef pollution and hard
   dependency between bridge and br_netfilter.

2) Address sparse warnings in the netfilter reject code, patch from
   Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 12:48:47 -04:00
Florian Westphal a03a8dbe20 netfilter: fix sparse warnings in reject handling
make C=1 CF=-D__CHECK_ENDIAN__ shows following:

net/bridge/netfilter/nft_reject_bridge.c:65:50: warning: incorrect type in argument 3 (different base types)
net/bridge/netfilter/nft_reject_bridge.c:65:50:    expected restricted __be16 [usertype] protocol [..]
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: cast from restricted __be16
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: incorrect type in argument 1 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:121:50: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:168:52: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:233:52: warning: incorrect type in argument 3 (different base types) [..]

Caused by two (harmless) errors:
1. htons() instead of ntohs()
2. __be16 for protocol in nf_reject_ipXhdr_put API, use u8 instead.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-10 15:01:32 +01:00
Scott Feldman f8f2147150 switchdev: add netlink flags to IPv4 FIB add op
Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op
to allow driver to 1) optimize hardware updates, 2) handle ip route prepend
and append commands correctly.

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:56:52 -04:00
Florian Fainelli 769a020289 net: dsa: utilize of_find_net_device_by_node
Using of_find_device_by_node() restricts the search to platform_device that
match the specified device_node pointer. This is not even remotely true for
network devices backed by a pci_device for instance.

of_find_net_device_by_node() allows us to do a more thorough lookup to find the
struct net_device corresponding to a particular device_node pointer.

For symetry with the non-OF code path, we hold the net_device pointer in
dsa_probe() just like what dev_to_net_dev() does when we call this
function.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:50:21 -04:00
David S. Miller 3cef5c5b0b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cadence/macb.c

Overlapping changes in macb driver, mostly fixes and cleanups
in 'net' overlapping with the integration of at91_ether into
macb in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:38:02 -04:00
Eric W. Biederman ddb3b6033c net: Remove protocol from struct dst_ops
After my change to neigh_hh_init to obtain the protocol from the
neigh_table there are no more users of protocol in struct dst_ops.
Remove the protocol field from dst_ops and all of it's initializers.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 16:06:10 -04:00
David S. Miller 5428aef811 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. Basically, improvements for the packet rejection infrastructure,
deprecation of CLUSTERIP, cleanups for nf_tables and some untangling for
br_netfilter. More specifically they are:

1) Send packet to reset flow if checksum is valid, from Florian Westphal.

2) Fix nf_tables reject bridge from the input chain, also from Florian.

3) Deprecate the CLUSTERIP target, the cluster match supersedes it in
   functionality and it's known to have problems.

4) A couple of cleanups for nf_tables rule tracing infrastructure, from
   Patrick McHardy.

5) Another cleanup to place transaction declarations at the bottom of
   nf_tables.h, also from Patrick.

6) Consolidate Kconfig dependencies wrt. NF_TABLES.

7) Limit table names to 32 bytes in nf_tables.

8) mac header copying in bridge netfilter is already required when
   calling ip_fragment(), from Florian Westphal.

9) move nf_bridge_update_protocol() to br_netfilter.c, also from
   Florian.

10) Small refactor in br_netfilter in the transmission path, again from
    Florian.

11) Move br_nf_pre_routing_finish_bridge_slow() to br_netfilter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:58:21 -04:00
Cong Wang 1e052be69d net_sched: destroy proto tp when all filters are gone
Kernel automatically creates a tp for each
(kind, protocol, priority) tuple, which has handle 0,
when we add a new filter, but it still is left there
after we remove our own, unless we don't specify the
handle (literally means all the filters under
the tuple). For example this one is left:

  # tc filter show dev eth0
  filter parent 8001: protocol arp pref 49152 basic

The user-space is hard to clean up these for kernel
because filters like u32 are organized in a complex way.
So kernel is responsible to remove it after all filters
are gone.  Each type of filter has its own way to
store the filters, so each type has to provide its
way to check if all filters are gone.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:35:55 -04:00
Eric W. Biederman b79bda3d38 neigh: Use neigh table index for neigh_packet_xmit
Remove a little bit of unnecessary work when transmitting a packet with
neigh_packet_xmit.  Use the neighbour table index not the address family
as a parameter.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-08 19:30:06 -04:00
Shani Michaeli c93682477b net/dcb: Add IEEE QCN attribute
As specified in 802.1Qau spec. Add this optional attribute to the
DCB netlink layer. To allow for application to use the new attribute,
NIC drivers should implement and register the  callbacks ieee_getqcn,
ieee_setqcn and ieee_getqcnstats.

The QCN attribute holds a set of parameters for management, and
a set of statistics to provide informative data on Congestion-Control
defined by this spec.

Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 21:50:02 -05:00
Willem de Bruijn 89650ad004 fib: make netdev_switch_fib_ipv4_abort in header file static inline
When building without CONFIG_NET_SWITCHDEV,
netdev_switch_fib_ipv4_abort is defined in the header file. It must
be static inline to avoid build failure at link time.

Fixes: 8e05fd7166 ("fib: hook IPv4 fib for hardware offload")

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 15:44:35 -05:00
Fan Du 05cbc0db03 ipv4: Create probe timer for tcp PMTU as per RFC4821
As per RFC4821 7.3.  Selecting Probe Size, a probe timer should
be armed once probing has converged. Once this timer expired,
probing again to take advantage of any path PMTU change. The
recommended probing interval is 10 minutes per RFC1981. Probing
interval could be sysctled by sysctl_tcp_probe_interval.

Eric Dumazet suggested to implement pseudo timer based on 32bits
jiffies tcp_time_stamp instead of using classic timer for such
rare event.

Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 14:57:42 -05:00
Fan Du 6b58e0a5f3 ipv4: Use binary search to choose tcp PMTU probe_size
Current probe_size is chosen by doubling mss_cache,
the probing process will end shortly with a sub-optimal
mss size, and the link mtu will not be taken full
advantage of, in return, this will make user to tweak
tcp_base_mss with care.

Use binary search to choose probe_size in a fine
granularity manner, an optimal mss will be found
to boost performance as its maxmium.

In addition, introduce a sysctl_tcp_probe_threshold
to control when probing will stop in respect to
the width of search range.

Test env:
Docker instance with vxlan encapuslation(82599EB)
iperf -c 10.0.0.24  -t 60

before this patch:
1.26 Gbits/sec

After this patch: increase 26%
1.59 Gbits/sec

Signed-off-by: Fan Du <fan.du@intel.com>
Acked-by: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 14:57:41 -05:00
Fan Du dcd8fb8533 ipv4: Raise tcp PMTU probe mss base size
Quotes from RFC4821 7.2.  Selecting Initial Values

   It is RECOMMENDED that search_low be initially set to an MTU size
   that is likely to work over a very wide range of environments.  Given
   today's technologies, a value of 1024 bytes is probably safe enough.
   The initial value for search_low SHOULD be configurable.

Moreover, set a small value will introduce extra time for the search
to converge. So set the initial probe base mss size to 1024 Bytes.

Signed-off-by: Fan Du <fan.du@intel.com>
Acked-by: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 14:57:41 -05:00
Eric W. Biederman aaa4e70404 DECnet: Only use neigh_ops for adding the link layer header
Other users users of the neighbour table use neigh->output as the method
to decided when and which link-layer header to place on a packet.
DECnet has been using neigh->output to decide which DECnet headers to
place on a packet depending which neighbour the packet is destined for.

The DECnet usage isn't totally wrong but it can run into problems if the
neighbour output function is run for a second time as the teql driver
and the bridge netfilter code can do.

Therefore to avoid pathologic problems later down the line and make the
neighbour code easier to understand by refactoring the decnet output
code to only use a neighbour method to add a link layer header to a
packet.

This is done by moving the neigbhour operations lookup from
dn_to_neigh_output to dn_neigh_output_packet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 14:54:22 -05:00
Johan Hedberg b9a245fb12 Bluetooth: Move all mgmt command quirks to handler table
In order to completely generalize the mgmt command handling we need to
move away command-specific information from mgmt_control() into the
actual command table. This patch adds a new 'flags' field to the handler
entries which can now contain the following command specific
information:

 - Command takes variable length parameters
 - Command doesn't target any specific HCI device
 - Command can be sent when the HCI device is unconfigured

After this the mgmt_control() function is completely generic and can
potentially be reused by new HCI channels.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06 20:15:21 +01:00
Johan Hedberg 6d785aa345 Bluetooth: Convert mgmt to use HCI chan registration API
This patch converts the existing mgmt code to use the newly introduced
generic API for registering HCI channels with mgmt-like semantics.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06 20:15:21 +01:00
Johan Hedberg 801c1e8da5 Bluetooth: Add mgmt HCI channel registration API
This patch adds an API for registering HCI channels with mgmt-like
semantics. For now the only user will be HCI_CHANNEL_CONTROL, but e.g.
6lowpan is intended to use this as well in the future.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06 20:15:21 +01:00
Marcel Holtmann 93690c227a Bluetooth: Introduce controller setting information for static address
Currently it is not possible to determine if the static address is used
by the controller. It is also not possible to determine if using a
static on a dual-mode controller with disabled BR/EDR is possible or
not.

To address this issue, introduce a new setting called static-address. If
support for this setting is signaled that means that the kernel supports
using static addresses. And if used on dual-mode controllers with BR/EDR
disabled it means that a configured static address can be used.

In addition utilize the same setting for the list of current active
settings that indicates if a static address is configured and if that
address will be actually used.

With this in mind the existing Set Static Address management command
has been extended to return the current settings. That way the caller
of that command can easily determine if the programmed address will
be used or if extra steps are required.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-06 20:43:07 +02:00
Scott Feldman 8e05fd7166 fib: hook IPv4 fib for hardware offload
Call into the switchdev driver any time an IPv4 fib entry is
added/modified/deleted from the kernel's FIB.  The switchdev driver may or
may not install the route to the offload device.  In the case where the
driver tries to install the route and something goes wrong (device's routing
table is full, etc), then all of the offloaded routes will be flushed from the
device, route forwarding falls back to the kernel, and no more routes are
offloading.

We can refine this logic later.  For now, use the simplist model of offloading
routes up to the point of failure, and then on failure, undo everything and
mark IPv4 offloading disabled.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:24:58 -05:00
Scott Feldman 448b128a14 ipv4: add net bool fib_offload_disabled
If something goes wrong with IPv4 FIB offload, mark entire net offload
disabled.  This is brute force policy to basically shut down IPv4 FIB offload
permanently if there is a problem offloading any route to an external device.
We can refine the policy in the future, to handle failures on a per-device or
per-route basis, but for now, this policy is per-net.

What we're trying to avoid is an inconsistent split between the kernel's FIB
and the offload device's FIB.  We don't want the device to fwd a pkt
inconsitent with what the kernel would do.  An example of a split is if device
has 10.0.0.0/16 and kernel has 10.0.0.0/16 and 10.0.0.0/24, the device wouldn't
see the longest prefix 10.0.0.0/24 and potentially forward pkts incorrectly.

Limited capacity or limited capability are two ways a route may fail to install
to the offload device.  We'll not differentiate between failures at this time,
and treat any failure as fatal and mark the net as fib_offload_disabled.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:24:58 -05:00
Scott Feldman 104616e74e switchdev: don't support custom ip rules, for now
Keep switchdev FIB offload model simple for now and don't allow custom ip
rules.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:24:58 -05:00
Scott Feldman 5e8d90497d switchdev: add IPv4 fib ndo ops wrappers
Add IPv4 fib ndo wrapper funcs and stub them out for now.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:24:58 -05:00
Florian Fainelli 5929903103 net: dsa: let switches specify their tagging protocol
In order to support the new DSA device driver model, a dsa_switch should
be able to advertise the type of tagging protocol supported by the
underlying switch device. This also removes constraints on how tagging
can be stacked to each other.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 00:18:20 -05:00
Pablo Neira Ayuso 1cae565e8b netfilter: nf_tables: limit maximum table name length to 32 bytes
Set the same as we use for chain names, it should be enough.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:21 +01:00
Patrick McHardy 1a1e1a1219 netfilter: nf_tables: cleanup nf_tables.h
The transaction related definitions are squeezed in between the rule
and expression definitions, which are closely related and should be
next to each other. The transaction definitions actually don't belong
into that file at all since it defines the global objects and API and
transactions are internal to nf_tables_api, but for now simply move
them to a seperate section.

Similar, the chain types are in between a set of registration functions,
they belong to the chain section.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:13 +01:00
Pablo Neira Ayuso 43270b1bc5 netfilter: ipt_CLUSTERIP: deprecate it in favour of xt_cluster
xt_cluster supersedes ipt_CLUSTERIP since it can be also used in
gateway configurations (not only from the backend side).

ipt_CLUSTER is also known to leak the netdev that it uses on
device removal, which requires a rather large fix to workaround
the problem: http://patchwork.ozlabs.org/patch/358629/

So let's deprecate this so we can probably kill code this in the
future.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:05 +01:00
Jakub Pawlowski 82f8b651a9 Bluetooth: fix service discovery behaviour for empty uuids filter
This patch fixes service discovery behaviour, when provided uuid filter
is empty and HCI_QUIRK_STRICT_DUPLICATE_FILTER is set. Before this
patch, empty uuid filter was unable to trigger scan restart, and that
caused inconsistent behaviour in applications.

Example: two DBus clients call BlueZ, one to find all devices with
service abcd, second to find all devices with rssi smaller than -90.
Sum of those filters, that is passed to mgmt_service_scan is empty
filter, with no rssi or uuids set.
That caused kernel not to restart scan when quirk was set.
That was inconsistent with what happen when there's only one of those
two filters set (scan is restarted and reports devices).

To fix that, new variable hdev->discovery.result_filtering was
introduced. It can indicate that filtered scan is running, no matter
what uuid or rssi filter is set.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05 09:50:50 +02:00
Alexander Duyck a7e5353123 fib_trie: Make fib_table rcu safe
The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock however after looking over the code I found
several spots where the tables were being accessed as just standard
pointers without any protections.  This change fixes that so that all of
the proper protections are in place when accessing the table to take RCU
replacement or removal of the table into account.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:18 -05:00
Patrick McHardy 86f1ec3231 netfilter: nf_tables: fix userdata length overflow
The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8
to store its size. Introduce a struct nft_userdata which contains a
length field and indicate its presence using a single bit in the rule.

The length field of struct nft_userdata is also a u8, however we don't
store zero sized data, so the actual length is udata->len + 1.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:06 +01:00
SenthilKumar Jegadeesan 64a8cef41a mac80211: provide station PMF configuration to driver
Some device drivers offload part of aggregation including AddBA/DelBA
negotiations to firmware. In such scenario, the PMF configuration of
the station needs to be provided to driver to enable encryption of
AddBA/DelBA action frames.

Signed-off-by: SenthilKumar Jegadeesan <sjegadee@qti.qualcomm.com>
[fix commit log, documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-04 10:34:12 +01:00
Arik Nemtsov 3384d757d4 mac80211: allow iterating inactive interfaces
Sometimes the driver might want to modify private data in interfaces
that are down. One possible use-case is cleaning up interface state
after HW recovery. Some interfaces that were up before the recovery took
place might be down now, but they might still be "dirty".

Introduce a new iterate_interfaces() API and a new ACTIVE iterator flag.
This way the internal implementation of the both active and inactive
APIs remains the same.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-04 10:34:12 +01:00
Eric W. Biederman 7720c01f3f mpls: Add a sysctl to control the size of the mpls label table
This sysctl gives two benefits.  By defaulting the table size to 0
mpls even when compiled in and enabled defaults to not forwarding
any packets.  This prevents unpleasant surprises for users.

The other benefit is that as mpls labels are allocated locally a dense
table a small dense label table may be used which saves memory and
is extremely simple and efficient to implement.

This sysctl allows userspace to choose the restrictions on the label
table size userspace applications need to cope with.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman 0189197f44 mpls: Basic routing support
This change adds a new Kconfig option MPLS_ROUTING.

The core of this change is the code to look at an mpls packet received
from another machine.  Look that packet up in a routing table and
forward the packet on.

Support of MPLS over ATM is not considered or attempted here.  This
implemntation follows RFC3032 and implements the MPLS shim header that
can pass over essentially any network.

What RFC3021 refers to as the as the Incoming Label Map (ILM) I call
net->mpls.platform_label[].  What RFC3031 refers to as the Next Label
Hop Forwarding Entry (NHLFE) I call mpls_route.  Though calling it the
label fordwarding information base (lfib) might also be valid.

Further the implemntation forwards packets as described in RFC3032.
There is no need and given the original motivation for MPLS a strong
discincentive to have a flexible label forwarding path.  In essence
the logic is the topmost label is read, looked up, removed, and
replaced by 0 or more new lables and the sent out the specified
interface to it's next hop.

Quite a few optional features are not implemented here.  Among them
are generation of ICMP errors when the TTL is exceeded or the packet
is larger than the next hop MTU (those conditions are detected and the
packets are dropped instead of generating an icmp error).  The traffic
class field is always set to 0.  The implementation focuses on IP over
MPLS and does not handle egress of other kinds of protocols.

Instead of implementing coordination with the neighbour table and
sorting out how to input next hops in a different address family (for
which there is value).  I was lazy and implemented a next hop mac
address instead.  The code is simpler and there are flavor of MPLS
such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
appropriate so a next hop by mac address would need to be implemented
at some point.

Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.

Decoding the mpls header must be done by first byeswapping a 32bit bit
endian word into the local cpu endian and then bit shifting to extract
the pieces.  There is no C bit-field that can represent a wire format
mpls header on a little endian machine as the low bits of the 20bit
label wind up in the wrong half of third byte.  Therefore internally
everything is deal with in cpu native byte order except when writing
to and reading from a packet.

For management simplicity if a label is configured to forward out
an interface that is down the packet is dropped early.  Similarly
if an network interface is removed rt_dev is updated to NULL
(so no reference is preserved) and any packets for that label
are dropped.  Keeping the label entries in the kernel allows
the kernel label table to function as the definitive source
of which labels are allocated and which are not.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman 4fd3d7d9e8 neigh: Add helper function neigh_xmit
For MPLS I am building the code so that either the neighbour mac
address can be specified or we can have a next hop in ipv4 or ipv6.

The kind of next hop we have is indicated by the neighbour table
pointer.  A neighbour table pointer of NULL is a link layer address.
A non-NULL neighbour table pointer indicates which neighbour table and
thus which address family the next hop address is in that we need to
look up.

The code either sends a packet directly or looks up the appropriate
neighbour table entry and sends the packet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:23:23 -05:00
Eric W. Biederman 60395a20ff neigh: Factor out ___neigh_lookup_noref
While looking at the mpls code I found myself writing yet another
version of neigh_lookup_noref.  We currently have __ipv4_lookup_noref
and __ipv6_lookup_noref.

So to make my work a little easier and to make it a smidge easier to
verify/maintain the mpls code in the future I stopped and wrote
___neigh_lookup_noref.  Then I rewote __ipv4_lookup_noref and
__ipv6_lookup_noref in terms of this new function.  I tested my new
version by verifying that the same code is generated in
ip_finish_output2 and ip6_finish_output2 where these functions are
inlined.

To get to ___neigh_lookup_noref I added a new neighbour cache table
function key_eq.  So that the static size of the key would be
available.

I also added __neigh_lookup_noref for people who want to to lookup
a neighbour table entry quickly but don't know which neibhgour table
they are going to look up.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:23:23 -05:00
David S. Miller 71a83a6db6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 21:16:48 -05:00
Eric W. Biederman 1d5da757da ax25: Stop using magic neighbour cache operations.
Before the ax25 stack calls dev_queue_xmit it always calls
ax25_type_trans which sets skb->protocol to ETH_P_AX25.

Which means that by looking at the protocol type it is possible to
detect IP packets that have not been munged by the ax25 stack in
ndo_start_xmit and call a function to munge them.

Rename ax25_neigh_xmit to ax25_ip_xmit and tweak the return type and
value to be appropriate for an ndo_start_xmit function.

Update all of the ax25 devices to test the protocol type for ETH_P_IP
and return ax25_ip_xmit as the first thing they do.  This preserves
the existing semantics of IP packet processing, but the timing will be
a little different as the IP packets now pass through the qdisc layer
before reaching the ax25 ip packet processing.

Remove the now unnecessary ax25 neighbour table operations.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 14:44:41 -05:00
Alexander Bondar 2ecc3905e6 mac80211: Update beacon's timing and DTIM count on every beacon
Beacon's timestamp, device system time associated with this beacon and
DTIM count parameters are not updated in the associated vif context
if the latest beacon's content is identical to the previously received.
It make sense to update these changing parameters on every beacon so the
driver can get most updated values. This may be necessary, for example,
to avoid either beacons' drift effect or device time stamp overrun.
IMPORTANT: Three sync_* parameters - sync_ts, sync_device_ts and
sync_dtim_count would possibly be out of sync by the time the driver will
use them. The synchronized view is currently guaranteed only in certain
callbacks.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-03 15:56:06 +01:00
Ahmad Kholaif 6c09e791b2 cfg80211: Allow NL80211_ATTR_IFINDEX to be added to vendor events
This modifies cfg80211_vendor_event_alloc() with an additional argument
struct wireless_dev *wdev. __cfg80211_alloc_event_skb() is modified to
take in *wdev argument, if wdev != NULL, both the NL80211_ATTR_IFINDEX
and wdev identifier are added to the vendor event.

These changes make it easier for drivers to add ifindex indication in
vendor events cleanly.

This also updates all existing users of cfg80211_vendor_event_alloc()
and __cfg80211_alloc_event_skb() in the kernel tree.

Signed-off-by: Ahmad Kholaif <akholaif@qca.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-03 15:56:05 +01:00
Dedy Lansky 6eb1813764 cfg80211: add bss_type and privacy arguments in cfg80211_get_bss()
802.11ad adds new a network type (PBSS) and changes the capability
field interpretation for the DMG (60G) band.
The same 2 bits that were interpreted as "ESS" and "IBSS" before are
re-used as a 2-bit field with 3 valid values (and 1 reserved). Valid
values are: "IBSS", "PBSS" (new) and "AP".

In order to get the BSS struct for the new PBSS networks, change the
cfg80211_get_bss() function to take a new enum ieee80211_bss_type
argument with the valid network types, as "capa_mask" and "capa_val"
no longer work correctly (the search must be band-aware now.)

The remaining bits in "capa_mask" and "capa_val" are used only for
privacy matching so replace those two with a privacy enum as well.

Signed-off-by: Dedy Lansky <dlansky@codeaurora.org>
[rewrite commit log, tiny fixes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-03-03 15:56:01 +01:00
Florian Westphal ee586bbc28 netfilter: reject: don't send icmp error if csum is invalid
tcp resets are never emitted if the packet that triggers the
reject/reset has an invalid checksum.

For icmp error responses there was no such check.
It allows to distinguish icmp response generated via

iptables -I INPUT -p udp --dport 42 -j REJECT

and those emitted by network stack (won't respond if csum is invalid,
REJECT does).

Arguably its possible to avoid this by using conntrack and only
using REJECT with -m conntrack NEW/RELATED.

However, this doesn't work when connection tracking is not in use
or when using nf_conntrack_checksum=0.

Furthermore, sending errors in response to invalid csums doesn't make
much sense so just add similar test as in nf_send_reset.

Validate csum if needed and only send the response if it is ok.

Reference: http://bugzilla.redhat.com/show_bug.cgi?id=1169829
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-03 02:10:35 +01:00
Eric W. Biederman bdf53c5849 neigh: Don't require dst in neigh_hh_init
- Add protocol to neigh_tbl so that dst->ops->protocol is not needed
- Acquire the device from neigh->dev

This results in a neigh_hh_init that will cache the samve values
regardless of the packets flowing through it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:41 -05:00
Eric W. Biederman 59b2af26b9 arp: Kill arp_find
There are no more callers so kill this function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:41 -05:00
Eric W. Biederman def6775369 neigh: Move neigh_compat_output into ax25_ip.c
The only caller is now is ax25_neigh_construct so move
neigh_compat_output into ax25_ip.c make it static and rename it
ax25_neigh_output.

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:40 -05:00
Eric W. Biederman 3b6a94bed0 ax25: Refactor to use private neighbour operations.
AX25 already has it's own private arp cache operations to isolate
it's abuse of dev_rebuild_header to transmit packets.  Add a function
ax25_neigh_construct that will allow all of the ax25 devices to
force using these operations, so that the generic arp code does
not need to.

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:40 -05:00
Eric W. Biederman 46d4e47abe ax25: Make ax25_header and ax25_rebuild_header static
The only user is in ax25_ip.c so stop exporting these functions.

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:40 -05:00
David S. Miller 77f0379fa8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

A small batch with accumulated updates in nf-next, mostly IPVS updates,
they are:

1) Add 64-bits stats counters to IPVS, from Julian Anastasov.

2) Move NETFILTER_XT_MATCH_ADDRTYPE out of NETFILTER_ADVANCED as docker
seem to require this, from Anton Blanchard.

3) Use boolean instead of numeric value in set_match_v*(), from
coccinelle via Fengguang Wu.

4) Allows rescheduling of new connections in IPVS when port reuse is
detected, from Marcelo Ricardo Leitner.

5) Add missing bits to support arptables extensions from nft_compat,
from Arturo Borrero.

Patrick is preparing a large batch to enhance the set infrastructure,
named expressions among other things, that should follow up soon after
this batch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 14:55:05 -05:00
David S. Miller 70c836a4d1 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-03-02

Here's the first bluetooth-next pull request targeting the 4.1 kernel:

 - ieee802154/6lowpan cleanups
 - SCO routing to host interface support for the btmrvl driver
 - AMP code cleanups
 - Fixes to AMP HCI init sequence
 - Refactoring of the HCI callback mechanism
 - Added shutdown routine for Intel controllers in the btusb driver
 - New config option to enable/disable Bluetooth debugfs information
 - Fix for early data reception on L2CAP fixed channels

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 14:47:12 -05:00
Ying Xue 1b78414047 net: Remove iocb argument from sendmsg and recvmsg
After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 13:06:31 -05:00
Eyal Birger 744d5a3e9f net: move skb->dropcount to skb->cb[]
Commit 977750076d ("af_packet: add interframe drop cmsg (v6)")
unionized skb->mark and skb->dropcount in order to allow recording
of the socket drop count while maintaining struct sk_buff size.

skb->dropcount was introduced since there was no available room
in skb->cb[] in packet sockets. However, its introduction led to
the inability to export skb->mark, or any other aliased field to
userspace if so desired.

Moving the dropcount metric to skb->cb[] eliminates this problem
at the expense of 4 bytes less in skb->cb[] for protocol families
using it.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:30 -05:00
Eyal Birger 3bc3b96f3b net: add common accessor for setting dropcount on packets
As part of an effort to move skb->dropcount to skb->cb[], use
a common function in order to set dropcount in struct sk_buff.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:30 -05:00
Eyal Birger b4772ef879 net: use common macro for assering skb->cb[] available size in protocol families
As part of an effort to move skb->dropcount to skb->cb[] use a common
macro in protocol families using skb->cb[] for ancillary data to
validate available room in skb->cb[].

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:30 -05:00
Eyal Birger 6368c23577 net: bluetooth: compact struct bt_skb_cb by converting boolean fields to bit fields
Convert boolean fields incoming and req_start to bit fields and move
force_active in order save space in bt_skb_cb in an effort to use
a portion of skb->cb[] for storing skb->dropcount.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:29 -05:00
Eyal Birger 49a6fe0557 net: bluetooth: compact struct bt_skb_cb by inlining struct hci_req_ctrl
struct hci_req_ctrl is never used outside of struct bt_skb_cb;
Inlining it frees 8 bytes on a 64 bit system in skb->cb[] allowing
the addition of more ancillary data.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:29 -05:00
Johannes Berg 36ef906ee8 wext: add checked wrappers for adding events/points to streams
These checked wrappers are necessary for the next patch, which
will use them to avoid sending out partial scan results.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-28 21:31:12 +01:00
Alexander Duyck 56315f9e6e fib_trie: Convert fib_alias to hlist from list
There isn't any advantage to having it as a list and by making it an hlist
we make the fib_alias more compatible with the list_info in terms of the
type of list used.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:37:06 -05:00
Madhu Challa 93a714d6b5 multicast: Extend ip address command to enable multicast group join/leave on
Joining multicast group on ethernet level via "ip maddr" command would
not work if we have an Ethernet switch that does igmp snooping since
the switch would not replicate multicast packets on ports that did not
have IGMP reports for the multicast addresses.

Linux vxlan interfaces created via "ip link add vxlan" have the group option
that enables then to do the required join.

By extending ip address command with option "autojoin" we can get similar
functionality for openvswitch vxlan interfaces as well as other tunneling
mechanisms that need to receive multicast traffic. The kernel code is
structured similar to how the vxlan driver does a group join / leave.

example:
ip address add 224.1.1.10/24 dev eth5 autojoin
ip address del 224.1.1.10/24 dev eth5

Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:25:25 -05:00
Madhu Challa 46a4dee074 igmp v6: add __ipv6_sock_mc_join and __ipv6_sock_mc_drop
Based on the igmp v4 changes from Eric Dumazet.
959d10f6bbf6("igmp: add __ip_mc_{join|leave}_group()")

These changes are needed to perform igmp v6 join/leave while
RTNL is held.

Make ipv6_sock_mc_join and ipv6_sock_mc_drop wrappers around
__ipv6_sock_mc_join and  __ipv6_sock_mc_drop to avoid
proliferation of work queues.

Signed-off-by: Madhu Challa <challa@noironetworks.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:25:24 -05:00
Tom Herbert 723b8e460d udp: In udp_flow_src_port use random hash value if skb_get_hash fails
In the unlikely event that skb_get_hash is unable to deduce a hash
in udp_flow_src_port we use a consistent random value instead.
This is specified in GRE/UDP draft section 3.2.1:
https://tools.ietf.org/html/draft-ietf-tsvwg-gre-in-udp-encap-04

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 16:00:01 -05:00
Johan Hedberg 4cd3928a8b Bluetooth: Update New CSRK event to match latest specification
The 'master' parameter of the New CSRK event was recently renamed to
'type', with the old values kept for backwards compatibility as
unauthenticated local/remote keys. This patch updates the code to take
into account the two new (authenticated) values and ensures they get
used based on the security level of the connection that the respective
keys get distributed over.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-27 18:25:48 +01:00
Guenter Roeck d79d210736 net: dsa: Introduce dsa_is_port_initialized
To avoid race conditions when using the ds->ports[] array,
we need to check if the accessed port has been initialized.
Introduce and use helper function dsa_is_port_initialized
for that purpose and use it where needed.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-25 17:57:48 -05:00
Florian Fainelli b73adef677 net: dsa: integrate with SWITCHDEV for HW bridging
In order to support bridging offloads in DSA switch drivers, select
NET_SWITCHDEV to get access to the port_stp_update and parent_get_id
NDOs that we are required to implement.

To facilitate the integratation at the DSA driver level, we implement 3
types of operations:

- port_join_bridge
- port_leave_bridge
- port_stp_update

DSA will resolve which switch ports that are currently bridge port
members as some Switch hardware/drivers need to know about that to limit
the register programming to just the relevant registers (especially for
slow MDIO buses).

We also take care of setting the correct STP state when slave network
devices are brought up/down while being bridge members.

Finally, when a port is leaving the bridge, we make sure we set in
BR_STATE_FORWARDING state, otherwise the bridge layer would leave it
disabled as a result of having left the bridge.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-25 17:03:38 -05:00
Marcelo Ricardo Leitner d752c36457 ipvs: allow rescheduling of new connections when port reuse is detected
Currently, when TCP/SCTP port reusing happens, IPVS will find the old
entry and use it for the new one, behaving like a forced persistence.
But if you consider a cluster with a heavy load of small connections,
such reuse will happen often and may lead to a not optimal load
balancing and might prevent a new node from getting a fair load.

This patch introduces a new sysctl, conn_reuse_mode, that allows
controlling how to proceed when port reuse is detected. The default
value will allow rescheduling of new connections only if the old entry
was in TIME_WAIT state for TCP or CLOSED for SCTP.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-25 13:46:35 +09:00
Mahesh Bandewar 14c9551a32 bonding: Implement port churn-machine (AD standard 43.4.17).
The Churn Detection machines detect the situation where a port is operable,
but the Actor and Partner have not attached the link to an Aggregator and
brought the link into operation within a bound time period. Under normal
operation of the LACP, agreement between Actor and Partner should be reached
very rapidly. Continued failure to reach agreement can be symptomatic of
device failure.

Actor-churn-detection state-machine
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>

===================================

BEGIN=True + PortEnable=False
           |
           v
 +------------------------+   ActorPort.Sync=True  +------------------+
 |   ACTOR_CHURN_MONITOR  | ---------------------> |  NO_ACTOR_CHURN  |
 |========================|                        |==================|
 |    ActorChurn=False    |  ActorPort.Sync=False  | ActorChurn=False |
 | ActorChurn.Timer=Start | <--------------------- |                  |
 +------------------------+                        +------------------+
           |                                                ^
           |                                                |
  ActorChurn.Timer=Expired                                  |
           |                                       ActorPort.Sync=True
           |                                                |
           |                +-----------------+             |
           |                |   ACTOR_CHURN   |             |
           |                |=================|             |
           +--------------> | ActorChurn=True | ------------+
                            |                 |
                            +-----------------+

Similar for the Partner-churn-detection.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-24 16:05:48 -05:00
Dan Carpenter 278f7b4fff caif: fix a signedness bug in cfpkt_iterate()
The cfpkt_iterate() function can return -EPROTO on error, but the
function is a u16 so the negative value gets truncated to a positive
unsigned short.  This causes a static checker warning.

The only caller which might care is cffrml_receive(), when it's checking
the frame checksum.  I modified cffrml_receive() so that it never says
-EPROTO is a valid checksum.

Also this isn't ever going to be inlined so I removed the "inline".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 17:35:14 -05:00
Johan Hedberg 7129069e84 Bluetooth: Rename hci_send_to_control to hci_send_to_channel
The hci_send_to_control() can be made more general purpose with a small
change of passing the desired HCI channel as a parameter to it. This
allows using it for the monitor channel as well as e.g. 6lowpan in the
future.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-20 18:20:17 +01:00
Johan Hedberg 3a6d576be9 Bluetooth: Convert disconn_cfm to be triggered through hci_cb
This patch moves all the disconn_cfm callbacks to be based on the hci_cb
list. This means making l2cap_disconn_cfm private to l2cap_core.c and
sco_conn_cb private to sco.c respectively. Since the hci_conn type
filtering isn't done any more on the wrapper level the callbacks
themselves need to check that they were passed a relevant type of
connection.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-19 08:44:29 +01:00
Johan Hedberg 539c496d88 Bluetooth: Convert connect_cfm to be triggered through hci_cb
This patch moves all the connect_cfm callbacks to be based on the hci_cb
list. This means making l2cap_connect_cfm private to l2cap_core.c and
sco_connect_cb private to sco.c respectively. Since the hci_conn type
filtering isn't done any more on the wrapper level the callbacks
themselves need to check that they were passed a relevant type of
connection.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-19 08:44:29 +01:00
Johan Hedberg 354fe804ed Bluetooth: Convert L2CAP security callback to use hci_cb
There's no reason to have the custom hci_proto_auth/encrypt_cfm helpers
when the hci_cb list works equally well. This patch adds L2CAP to the
hci_cb list and makes l2cap_security_cfm a private function of
l2cap_core.c.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-19 08:44:28 +01:00
Johan Hedberg fba7ecf09b Bluetooth: Convert hci_cb_list_lock to a mutex
We'll soon need to be able to sleep inside the loops that iterate the
hci_cb list, so neither a spinlock, rwlock or rcu are usable. This patch
changes the lock to a mutex which permits sleeping while holding the
lock.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-19 08:44:28 +01:00
Linus Torvalds f5af19d10d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Missing netlink attribute validation in nft_lookup, from Patrick
    McHardy.

 2) Restrict ipv6 partial checksum handling to UDP, since that's the
    only case it works for.  From Vlad Yasevich.

 3) Clear out silly device table sentinal macros used by SSB and BCMA
    drivers.  From Joe Perches.

 4) Make sure the remote checksum code never creates a situation where
    the remote checksum is applied yet the tunneling metadata describing
    the remote checksum transformation is still present.  Otherwise an
    external entity might see this and apply the checksum again.  From
    Tom Herbert.

 5) Use msecs_to_jiffies() where applicable, from Nicholas Mc Guire.

 6) Don't explicitly initialize timer struct fields, use setup_timer()
    and mod_timer() instead.  From Vaishali Thakkar.

 7) Don't invoke tg3_halt() without the tp->lock held, from Jun'ichi
    Nomura.

 8) Missing __percpu annotation in ipvlan driver, from Eric Dumazet.

 9) Don't potentially perform skb_get() on shared skbs, also from Eric
    Dumazet.

10) Fix COW'ing of metrics for non-DST_HOST routes in ipv6, from Martin
    KaFai Lau.

11) Fix merge resolution error between the iov_iter changes in vhost and
    some bug fixes that occurred at the same time.  From Jason Wang.

12) If rtnl_configure_link() fails we have to perform a call to
    ->dellink() before unregistering the device.  From WANG Cong.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
  net: dsa: Set valid phy interface type
  rtnetlink: call ->dellink on failure when ->newlink exists
  com20020-pci: add support for eae single card
  vhost_net: fix wrong iter offset when setting number of buffers
  net: spelling fixes
  net/core: Fix warning while make xmldocs caused by dev.c
  net: phy: micrel: disable NAND-tree for KSZ8021, KSZ8031, KSZ8051, KSZ8081
  ipv6: fix ipv6_cow_metrics for non DST_HOST case
  openvswitch: Fix key serialization.
  r8152: restore hw settings
  hso: fix rx parsing logic when skb allocation fails
  tcp: make sure skb is not shared before using skb_get()
  bridge: netfilter: Move sysctl-specific error code inside #ifdef
  ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gc
  ipvlan: add a missing __percpu pcpu_stats
  tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()
  bgmac: fix device initialization on Northstar SoCs (condition typo)
  qlcnic: Delete existing multicast MAC list before adding new
  net/mlx5_core: Fix configuration of log_uar_page_sz
  sunvnet: don't change gso data on clones
  ...
2015-02-17 17:41:19 -08:00
Tedd Ho-Jeong An a44fecbd52 Bluetooth: Add shutdown callback before closing the device
This callback allows a vendor to send the vendor specific commands
before cloing the hci interface.

Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-15 00:37:52 +01:00
Alexander Aring b976796950 ieee802154: cleanup ieee802154_le64_to_be64
This patch cleanups the ieee802154_le64_to_be64 function. This patch
removes an unnecessary temporary variable.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-14 05:19:58 +01:00
Alexander Aring ba5bf17e83 ieee802154: cleanup ieee802154_be64_to_le64
This patch cleanups the ieee802154_be64_to_le64 function. This patch
removes an unnecessary temporary variable.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-14 05:19:58 +01:00
Vladimir Davydov f48b80a5e2 memcg: cleanup static keys decrement
Move memcg_socket_limit_enabled decrement to tcp_destroy_cgroup (called
from memcg_destroy_kmem -> mem_cgroup_sockets_destroy) and zap a bunch of
wrapper functions.

Although this patch moves static keys decrement from __mem_cgroup_free to
mem_cgroup_css_free, it does not introduce any functional changes, because
the keys are incremented on setting the limit (tcp or kmem), which can
only happen after successful mem_cgroup_css_online.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:10 -08:00
huaibin Wang ac37e2515c xfrm: release dst_orig in case of error in xfrm_lookup()
dst_orig should be released on error. Function like __xfrm_route_forward()
expects that behavior.
Since a recent commit, xfrm_lookup() may also be called by xfrm_lookup_route(),
which expects the opposite.
Let's introduce a new flag (XFRM_LOOKUP_KEEP_DST_REF) to tell what should be
done in case of error.

Fixes: f92ee61982d("xfrm: Generate blackhole routes only from route lookup functions")
Signed-off-by: huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-02-12 07:10:56 +01:00
Linus Torvalds 8cc748aa76 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
 "Highlights:

   - Smack adds secmark support for Netfilter
   - /proc/keys is now mandatory if CONFIG_KEYS=y
   - TPM gets its own device class
   - Added TPM 2.0 support
   - Smack file hook rework (all Smack users should review this!)"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (64 commits)
  cipso: don't use IPCB() to locate the CIPSO IP option
  SELinux: fix error code in policydb_init()
  selinux: add security in-core xattr support for pstore and debugfs
  selinux: quiet the filesystem labeling behavior message
  selinux: Remove unused function avc_sidcmp()
  ima: /proc/keys is now mandatory
  Smack: Repair netfilter dependency
  X.509: silence asn1 compiler debug output
  X.509: shut up about included cert for silent build
  KEYS: Make /proc/keys unconditional if CONFIG_KEYS=y
  MAINTAINERS: email update
  tpm/tpm_tis: Add missing ifdef CONFIG_ACPI for pnp_acpi_device
  smack: fix possible use after frees in task_security() callers
  smack: Add missing logging in bidirectional UDS connect check
  Smack: secmark support for netfilter
  Smack: Rework file hooks
  tpm: fix format string error in tpm-chip.c
  char/tpm/tpm_crb: fix build error
  smack: Fix a bidirectional UDS connect check typo
  smack: introduce a special case for tmpfs in smack_d_instantiate()
  ...
2015-02-11 20:25:11 -08:00
Tom Herbert 0ace2ca89c vxlan: Use checksum partial with remote checksum offload
Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:13 -08:00
Tom Herbert 26c4f7da3e net: Fix remcsum in GRO path to not change packet
Remote checksum offload processing is currently the same for both
the GRO and non-GRO path. When the remote checksum offload option
is encountered, the checksum field referred to is modified in
the packet. So in the GRO case, the packet is modified in the
GRO path and then the operation is skipped when the packet goes
through the normal path based on skb->remcsum_offload. There is
a problem in that the packet may be modified in the GRO path, but
then forwarded off host still containing the remote checksum option.
A remote host will again perform RCO but now the checksum verification
will fail since GRO RCO already modified the checksum.

To fix this, we ensure that GRO restores a packet to it's original
state before returning. In this model, when GRO processes a remote
checksum option it still changes the checksum per the algorithm
but on return from lower layer processing the checksum is restored
to its original value.

In this patch we add define gro_remcsum structure which is passed
to skb_gro_remcsum_process to save offset and delta for the checksum
being changed. After lower layer processing, skb_gro_remcsum_cleanup
is called to restore the checksum before returning from GRO.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:09 -08:00
Paul Moore 04f81f0154 cipso: don't use IPCB() to locate the CIPSO IP option
Using the IPCB() macro to get the IPv4 options is convenient, but
unfortunately NetLabel often needs to examine the CIPSO option outside
of the scope of the IP layer in the stack.  While historically IPCB()
worked above the IP layer, due to the inclusion of the inet_skb_param
struct at the head of the {tcp,udp}_skb_cb structs, recent commit
971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
reordered the tcp_skb_cb struct and invalidated this IPCB() trick.

This patch fixes the problem by creating a new function,
cipso_v4_optptr(), which locates the CIPSO option inside the IP header
without calling IPCB().  Unfortunately, this isn't as fast as a simple
lookup so some additional tweaks were made to limit the use of this
new function.

Cc: <stable@vger.kernel.org> # 3.18
Reported-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2015-02-11 14:46:37 -05:00
Fan Du b0f9ca53cb ipv4: Namespecify TCP PMTU mechanism
Packetization Layer Path MTU Discovery works separately beside
Path MTU Discovery at IP level, different net namespace has
various requirements on which one to chose, e.g., a virutalized
container instance would require TCP PMTU to probe an usable
effective mtu for underlying tunnel, while the host would
employ classical ICMP based PMTU to function.

Hence making TCP PMTU mechanism per net namespace to decouple
two functionality. Furthermore the probe base MSS should also
be configured separately for each namespace.

Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 18:45:00 -08:00
David S. Miller 2573beec56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 14:35:57 -08:00
Vlad Yasevich 8381eacf5c ipv6: Make __ipv6_select_ident static
Make __ipv6_select_ident() static as it isn't used outside
the file.

Fixes: 0508c07f5e (ipv6: Select fragment id during UFO segmentation if not set.)
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 14:21:03 -08:00
Moni Shoua 92e584fe44 net/bonding: Fix potential bad memory access during bonding events
When queuing work to send the NETDEV_BONDING_INFO netdev event, it's
possible that when the work is executed, the pointer to the slave
becomes invalid. This can happen if between queuing the event and the
execution of the work, the net-device was un-ensvaled and re-enslaved.

Fix that by queuing a work with the data of the slave instead of the
slave structure.

Fixes: 69e6113343 ('net/bonding: Notify state change on slaves')
Reported-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 14:03:53 -08:00
Julian Anastasov cd67cd5eb2 ipvs: use 64-bit rates in stats
IPVS stats are limited to 2^(32-10) conns/s and packets/s,
2^(32-5) bytes/s. It is time to use 64 bits:

* Change all conn/packet kernel counters to 64-bit and update
them in u64_stats_update_{begin,end} section

* In kernel use struct ip_vs_kstats instead of the user-space
struct ip_vs_stats_user and use new func ip_vs_export_stats_user
to export it to sockopt users to preserve compatibility with
32-bit values

* Rename cpu counters "ustats" to "cnt"

* To netlink users provide additionally 64-bit stats:
IPVS_SVC_ATTR_STATS64 and IPVS_DEST_ATTR_STATS64. Old stats
remain for old binaries.

* We can use ip_vs_copy_stats in ip_vs_stats_percpu_show

Thanks to Chris Caputo for providing initial patch for ip_vs_est.c

Signed-off-by: Chris Caputo <ccaputo@alt.net>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-09 16:59:03 +09:00
Eric Dumazet 567e4b7973 net: rfs: add hash collision detection
Receive Flow Steering is a nice solution but suffers from
hash collisions when a mix of connected and unconnected traffic
is received on the host, when flow hash table is populated.

Also, clearing flow in inet_release() makes RFS not very good
for short lived flows, as many packets can follow close().
(FIN , ACK packets, ...)

This patch extends the information stored into global hash table
to not only include cpu number, but upper part of the hash value.

I use a 32bit value, and dynamically split it in two parts.

For host with less than 64 possible cpus, this gives 6 bits for the
cpu number, and 26 (32-6) bits for the upper part of the hash.

Since hash bucket selection use low order bits of the hash, we have
a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
enough.

If the hash found in flow table does not match, we fallback to RPS (if
it is enabled for the rxqueue).

This means that a packet for an non connected flow can avoid the
IPI through a unrelated/victim CPU.

This also means we no longer have to clear the table at socket
close time, and this helps short lived flows performance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 16:53:57 -08:00
Neal Cardwell a9b2c06dbe tcp: mitigate ACK loops for connections as tcp_request_sock
In the SYN_RECV state, where the TCP connection is represented by
tcp_request_sock, we now rate-limit SYNACKs in response to a client's
retransmitted SYNs: we do not send a SYNACK in response to client SYN
if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms)
since we last sent a SYNACK in response to a client's retransmitted
SYN.

This allows the vast majority of legitimate client connections to
proceed unimpeded, even for the most aggressive platforms, iOS and
MacOS, which actually retransmit SYNs 1-second intervals for several
times in a row. They use SYN RTO timeouts following the progression:
1,1,1,1,1,2,4,8,16,32.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 01:03:12 -08:00
Neal Cardwell 032ee42369 tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks
Helpers for mitigating ACK loops by rate-limiting dupacks sent in
response to incoming out-of-window packets.

This patch includes:

- rate-limiting logic
- sysctl to control how often we allow dupacks to out-of-window packets
- SNMP counter for cases where we rate-limited our dupack sending

The rate-limiting logic in this patch decides to not send dupacks in
response to out-of-window segments if (a) they are SYNs or pure ACKs
and (b) the remote endpoint is sending them faster than the configured
rate limit.

We rate-limit our responses rather than blocking them entirely or
resetting the connection, because legitimate connections can rely on
dupacks in response to some out-of-window segments. For example, zero
window probes are typically sent with a sequence number that is below
the current window, and ZWPs thus expect to thus elicit a dupack in
response.

We allow dupacks in response to TCP segments with data, because these
may be spurious retransmissions for which the remote endpoint wants to
receive DSACKs. This is safe because segments with data can't
realistically be part of ACK loops, which by their nature consist of
each side sending pure/data-less ACKs to each other.

The dupack interval is controlled by a new sysctl knob,
tcp_invalid_ratelimit, given in milliseconds, in case an administrator
needs to dial this upward in the face of a high-rate DoS attack. The
name and units are chosen to be analogous to the existing analogous
knob for ICMP, icmp_ratelimit.

The default value for tcp_invalid_ratelimit is 500ms, which allows at
most one such dupack per 500ms. This is chosen to be 2x faster than
the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
2.4). We allow the extra 2x factor because network delay variations
can cause packets sent at 1 second intervals to be compressed and
arrive much closer.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 01:03:12 -08:00
David S. Miller 3c09e92fb6 NFC: 3.20 second pull request
This is the second NFC pull request for 3.20.
 
 It brings:
 
 - NCI NFCEE (NFC Execution Environment, typically an embedded or
   external secure element) discovery and enabling/disabling support.
   In order to communicate with an NFCEE, we also added NCI's logical
   connections support to the NCI stack.
 
 - HCI over NCI protocol support. Some secure elements only understand
   HCI and thus we need to send them HCI frames when they're part of
   an NCI chipset.
 
 - NFC_EVT_TRANSACTION userspace API addition. Whenever an application
   running on a secure element needs to notify its host counterpart,
   we send an NFC_EVENT_SE_TRANSACTION event to userspace through the
   NFC netlink socket.
 
 - Secure element and HCI transaction event support for the st21nfcb
   chipset.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU06ozAAoJEIqAPN1PVmxKdtkP/2CeQ0+xJnJWgyh4xjkVxfPb
 wPYrz3cHeHdnVX36mnIdRoQZRjxh/Ef/vgK8RsXohcldzHA9D8GjxRpj0GrkVGQ9
 xoDkiHsFDdsZHeezSrlnXNnObvZrkeMsmnUDJxfwLtcX1nzFKzBgW2GaDiNjUazC
 r5Ip4YIfoCaVLq5MSqYRqJ6uhplXd/ePdtPoo54L/E81y4ptQi8pUsQBy4K3GrFn
 FbXoemcymhi9AVBGl+OlppZ93t6bdOV1D5tcOwpytAbfa3jp2oUnJPZay7EoWruR
 aj8l5v3P+SSphAJwaGSbny0L83tTS7sq+N4QG+VxxdMkw2YjpmxgN3AGguNBtPGG
 SUThpZV6QhrkAOnwzP2BJnh68WSU1eJrGvk8zUWW0SanA8k2jaHO/VzXCA3/ZkC4
 IFcXl4Jr6R3iUxDFgS/6MB5E9EGedzFTacsWoHVyZPtaDAQp4WVdAIRQ4QKgJCLw
 1yRmCeVunTsrs6O0aenU1xHinPlnx+tkuiNh4H64APQYTz54WwXH1/S04OL1SkqI
 mTzUvFgWqJvSIOuU19g0HBw+xZ/9vvX8LLkm9kSTbe7tcmFH5CI7ZCN/hNDHvMSd
 sjNYOyag/SAHJ01tTKcSPAgWO49bHWPodI04zP0lK0gMLjKvRknVMdTgIDfu49HN
 2da9E23iBmNNgG79iVHN
 =8caD
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

NFC: 3.20 second pull request

This is the second NFC pull request for 3.20.

It brings:

- NCI NFCEE (NFC Execution Environment, typically an embedded or
  external secure element) discovery and enabling/disabling support.
  In order to communicate with an NFCEE, we also added NCI's logical
  connections support to the NCI stack.

- HCI over NCI protocol support. Some secure elements only understand
  HCI and thus we need to send them HCI frames when they're part of
  an NCI chipset.

- NFC_EVT_TRANSACTION userspace API addition. Whenever an application
  running on a secure element needs to notify its host counterpart,
  we send an NFC_EVENT_SE_TRANSACTION event to userspace through the
  NFC netlink socket.

- Secure element and HCI transaction event support for the st21nfcb
  chipset.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:22:25 -08:00
Erik Kline c58da4c659 net: ipv6: allow explicitly choosing optimistic addresses
RFC 4429 ("Optimistic DAD") states that optimistic addresses
should be treated as deprecated addresses.  From section 2.1:

   Unless noted otherwise, components of the IPv6 protocol stack
   should treat addresses in the Optimistic state equivalently to
   those in the Deprecated state, indicating that the address is
   available for use but should not be used if another suitable
   address is available.

Optimistic addresses are indeed avoided when other addresses are
available (i.e. at source address selection time), but they have
not heretofore been available for things like explicit bind() and
sendmsg() with struct in6_pktinfo, etc.

This change makes optimistic addresses treated more like
deprecated addresses than tentative ones.

Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 15:37:41 -08:00
David S. Miller 6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
Eric Dumazet 677651462c ipv6: fix sparse errors in ip6_make_flowlabel()
include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
include/net/ipv6.h:713:22:    expected restricted __be32 [usertype] hash
include/net/ipv6.h:713:22:    got unsigned int
include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
include/net/ipv6.h:719:22: warning: invalid assignment: ^=
include/net/ipv6.h:719:22:    left side has type restricted __be32
include/net/ipv6.h:719:22:    right side has type unsigned int

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 00:42:28 -08:00
Eric Dumazet f4575d3534 flow_keys: n_proto type should be __be16
(struct flow_keys)->n_proto is in network order, use
proper type for this.

Fixes following sparse errors :

net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:139:39:    expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:139:39:    got restricted __be16 [assigned] [usertype] proto
net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:237:23:    expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:237:23:    got restricted __be16 [assigned] [usertype] proto

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e0f31d8498 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 00:40:22 -08:00
David S. Miller f2683b743f Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
More iov_iter work from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:46:55 -08:00
Eric Dumazet 9878196578 tcp: do not pace pure ack packets
When we added pacing to TCP, we decided to let sch_fq take care
of actual pacing.

All TCP had to do was to compute sk->pacing_rate using simple formula:

sk->pacing_rate = 2 * cwnd * mss / rtt

It works well for senders (bulk flows), but not very well for receivers
or even RPC :

cwnd on the receiver can be less than 10, rtt can be around 100ms, so we
can end up pacing ACK packets, slowing down the sender.

Really, only the sender should pace, according to its own logic.

Instead of adding a new bit in skb, or call yet another flow
dissection, we tweak skb->truesize to a small value (2), and
we instruct sch_fq to use new helper and not pace pure ack.

Note this also helps TCP small queue, as ack packets present
in qdisc/NIC do not prevent sending a data packet (RPC workload)

This helps to reduce tx completion overhead, ack packets can use regular
sock_wfree() instead of tcp_wfree() which is a bit more expensive.

This has no impact in the case packets are sent to loopback interface,
as we do not coalesce ack packets (were we would detect skb->truesize
lie)

In case netem (with a delay) is used, skb_orphan_partial() also sets
skb->truesize to 1.

This patch is a combination of two patches we used for about one year at
Google.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:36:31 -08:00
Moni Shoua 69e6113343 net/bonding: Notify state change on slaves
Use notifier chain to dispatch an event upon a change in slave state.
Event is dispatched with slave specific info.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:24 -08:00
Moni Shoua 69a2338e05 net/bonding: Move slave state changes to a helper function
Move slave state changes to a helper function, this is a pre-step for adding
functionality of dispatching an event when this helper is called.

This commit doesn't add new functionality.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:24 -08:00
David S. Miller 940288b6a5 Last round of updates for net-next:
* revert a patch that caused a regression with mesh userspace (Bob)
  * fix a number of suspend/resume related races
    (from Emmanuel, Luca and myself - we'll look at backporting later)
  * add software implementations for new ciphers (Jouni)
  * add a new ACPI ID for Broadcom's rfkill (Mika)
  * allow using netns FD for wireless (Vadim)
  * some other cleanups (various)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJU0NEaAAoJEDBSmw7B7bqr2DAP/3nbk6WB2Lz4Vwi8fh9C4y6X
 4ZzN2v7NHaimC+Wpxg3wP2wCdX2VG3ZWwF3yLm6qgVHGnE35RFLxURen1eqNeW77
 Yf5gvZ066nCGEQ8l8J6YK9vrLX4qp5c4lyE00bbxpZTA4Qq71SgTg+rmGC1be8uX
 vacfaLwfDWffuOOnjBAPfanj7f4AQaUEY2uN1WkBFC7iEeOtPcWVkHAFVIeGjJfQ
 vfgQJcwOjgWjYbwZdQQi7Aj+k1Yzda4pg1yEWn3CkZ8zyeCYGs1gk2ovoBgPgXBP
 yT9ypIdFsN242VJvy7nkFnCKA8mhKyltMQ1Xjs0Q9lAxWdaq9U+iqt5cUn4jxVIG
 T9Vi3PCbx/nVOqcfR81dBTQ3uDU5AyosPsmh2YTxi5lpRBrsjNY2FtaKE0sm2Om3
 wRiSPOdPrXBeEnU0KssI9e6euXgS4JQV78Naq85OWZDd2yZ1fT5U2fi8y4drRGlz
 rucbbobdVhQch5L4FStPz1uW5pNuJrhekXeZIE8MruTNg2A2oBAK3ApO7hxn68sE
 RnbAnkxVLwgedC9042JF5eiS1PDIU46w4e782j/+XskVKMEakqd23iJycx3tmgHZ
 cxDi/qKZ2RCE74YsT61o/9ErSVPvCfNPL3+CVov918jidQQg2WHfKm/jFlIxHIxk
 4wBP7p2VGlENMgw/R8GJ
 =wOaR
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Last round of updates for net-next:
 * revert a patch that caused a regression with mesh userspace (Bob)
 * fix a number of suspend/resume related races
   (from Emmanuel, Luca and myself - we'll look at backporting later)
 * add software implementations for new ciphers (Jouni)
 * add a new ACPI ID for Broadcom's rfkill (Mika)
 * allow using netns FD for wireless (Vadim)
 * some other cleanups (various)

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 14:57:45 -08:00
David S. Miller 45e826fd57 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-02-03

Here's what's likely the last bluetooth-next pull request for 3.20.
Notable changes include:

 - xHCI workaround + a new id for the ath3k driver
 - Several new ids for the btusb driver
 - Support for new Intel Bluetooth controllers
 - Minor cleanups to ieee802154 code
 - Nested sleep warning fix in socket accept() code path
 - Fixes for Out of Band pairing handling
 - Support for LE scan restarting for HCI_QUIRK_STRICT_DUPLICATE_FILTER
 - Improvements to data we expose through debugfs
 - Proper handling of Hardware Error HCI events

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 13:56:37 -08:00
Christophe Ricard 15d4a8da0e NFC: nci: Move logical connection structure allocation
conn_info is currently allocated only after nfcee_discovery_ntf
which is not generic enough for logical connection other than
NFCEE. The corresponding conn_info is now created in
nci_core_conn_create_rsp().

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04 09:14:09 +01:00
Christophe Ricard 3ba5c8466b NFC: nci: Change credits field to credits_cnt
For consistency sake change nci_core_conn_create_rsp structure
credits field to credits_cnt.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04 09:13:15 +01:00
Christophe Ricard b16ae7160a NFC: nci: Support all destinations type when creating a connection
The current implementation limits nci_core_conn_create_req()
to only manage NCI_DESTINATION_NFCEE.
Add new parameters to nci_core_conn_create() to support all
destination types described in the NCI specification.
Because there are some parameters with variable size dynamic
buffer allocation is needed.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04 09:10:50 +01:00
Christophe Ricard 12bdf27d46 NFC: nci: Add reference to the RF logical connection
The NCI_STATIC_RF_CONN_ID logical connection is the most used
connection. Keeping it directly accessible in the nci_dev
structure will simplify and optimize the access.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04 09:09:53 +01:00
Vlad Yasevich 0508c07f5e ipv6: Select fragment id during UFO segmentation if not set.
If the IPv6 fragment id has not been set and we perform
fragmentation due to UFO, select a new fragment id.
We now consider a fragment id of 0 as unset and if id selection
process returns 0 (after all the pertrubations), we set it to
0x80000000, thus giving us ample space not to create collisions
with the next packet we may have to fragment.

When doing UFO integrity checking, we also select the
fragment id if it has not be set yet.   This is stored into
the skb_shinfo() thus allowing UFO to function correclty.

This patch also removes duplicate fragment id generation code
and moves ipv6_select_ident() into the header as it may be
used during GSO.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03 23:06:43 -08:00
Al Viro 21226abb4e net: switch memcpy_fromiovec()/memcpy_fromiovecend() users to copy_from_iter()
That takes care of the majority of ->sendmsg() instances - most of them
via memcpy_to_msg() or assorted getfrag() callbacks.  One place where we
still keep memcpy_fromiovecend() is tipc - there we potentially read the
same data over and over; separate patch, that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:15 -05:00
Al Viro 57be5bdad7 ip: convert tcp_sendmsg() to iov_iter primitives
patch is actually smaller than it seems to be - most of it is unindenting
the inner loop body in tcp_sendmsg() itself...

the bit in tcp_input.c is going to get reverted very soon - that's what
memcpy_from_msg() will become, but not in this commit; let's keep it
reasonably contained...

There's one potentially subtle change here: in case of short copy from
userland, mainline tcp_send_syn_data() discards the skb it has allocated
and falls back to normal path, where we'll send as much as possible after
rereading the same data again.  This patch trims SYN+data skb instead -
that way we don't need to copy from the same place twice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:14 -05:00
Al Viro cacdc7d2f9 ip: stash a pointer to msghdr in struct ping_fakehdr
... instead of storing its ->mgs_iter.iov there

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:14 -05:00
David S. Miller 3ae55826ae Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Validate hooks for nf_tables NAT expressions, otherwise users can
   crash the kernel when using them from the wrong hook. We already
   got one user trapped on this when configuring masquerading.

2) Fix a BUG splat in nf_tables with CONFIG_DEBUG_PREEMPT=y. Reported
   by Andreas Schultz.

3) Avoid unnecessary reroute of traffic in the local input path
   in IPVS that triggers a crash in in xfrm. Reported by Florian
   Wiessner and fixes by Julian Anastasov.

4) Fix memory and module refcount leak from the error path of
   nf_tables_newchain().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:30:53 -08:00
Vlad Yasevich 6422398c2a ipv6: introduce ipv6_make_skb
This commit is very similar to
commit 1c32c5ad6f
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Mar 1 02:36:47 2011 +0000

    inet: Add ip_make_skb and ip_finish_skb

It adds IPv6 version of the helpers ip6_make_skb and ip6_finish_skb.

The job of ip6_make_skb is to collect messages into an ipv6 packet
and poplulate ipv6 eader.  The job of ip6_finish_skb is to transmit
the generated skb.  Together they replicated the job of
ip6_push_pending_frames() while also provide the capability to be
called independently.  This will be needed to add lockless UDP sendmsg
support.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:28:04 -08:00
Willem de Bruijn b245be1f4d net-timestamp: no-payload only sysctl
Tx timestamps are looped onto the error queue on top of an skb. This
mechanism leaks packet headers to processes unless the no-payload
options SOF_TIMESTAMPING_OPT_TSONLY is set.

Add a sysctl that optionally drops looped timestamp with data. This
only affects processes without CAP_NET_RAW.

The policy is checked when timestamps are generated in the stack.
It is possible for timestamps with data to be reported after the
sysctl is set, if these were queued internally earlier.

No vulnerability is immediately known that exploits knowledge
gleaned from packet headers, but it may still be preferable to allow
administrators to lock down this path at the cost of possible
breakage of legacy applications.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  (v1 -> v2)
  - test socket CAP_NET_RAW instead of capable(CAP_NET_RAW)
  (rfc -> v1)
  - document the sysctl in Documentation/sysctl/net.txt
  - fix access control race: read .._OPT_TSONLY only once,
        use same value for permission check and skb generation.
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 18:46:51 -08:00
Christophe Ricard a41bb8448e NFC: nci: Add RF NFCEE action notification support
The NFCC sends an NCI_OP_RF_NFCEE_ACTION_NTF notification
to the host (DH) to let it know that for example an RF
transaction with a payment reader is done.
For now the notification handler is empty.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:41 +01:00
Christophe Ricard 447b27c4f2 NFC: Forward NFC_EVT_TRANSACTION to user space
NFC_EVT_TRANSACTION is sent through netlink in order for a
specific application running on a secure element to notify
userspace of an event. Typically the secure element application
counterpart on the host could interpret that event and act
upon it.

Forwarded information contains:
- SE host generating the event
- Application IDentifier doing the operation
- Applications parameters

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:40 +01:00
Christophe Ricard 11f54f2286 NFC: nci: Add HCI over NCI protocol support
According to the NCI specification, one can use HCI over NCI
to talk with specific NFCEE. The HCI network is viewed as one
logical NFCEE.
This is needed to support secure element running HCI only
firmwares embedded on an NCI capable chipset, like e.g. the
st21nfcb.
There is some duplication between this piece of code and the
HCI core code, but the latter would need to be abstracted even
more to be able to use NCI as a logical transport for HCP packets.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:40 +01:00
Christophe Ricard 736bb95774 NFC: nci: Support logical connections management
In order to communicate with an NFCEE, we need to open a logical
connection to it, by sending the NCI_OP_CORE_CONN_CREATE_CMD
command to the NFCC. It's left up to the drivers to decide when
to close an already opened logical connection.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:39 +01:00