Commit graph

7177 commits

Author SHA1 Message Date
Marcel Holtmann eb9a8f3fb6 Bluetooth: Track Secure Connections support of remote devices
It is important to know if Secure Connections support has been enabled
for a given remote device. The information is provided in the remote
host features page. So track this information and provide a simple
helper function to extract the status.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:35 +02:00
Marcel Holtmann ec1091131f Bluetooth: Add support for remote OOB input of P-256 data
The current management interface only allows to provide the remote
OOB input of P-192 data. This extends the command to also accept
P-256 data as well. To make this backwards compatible, the userspace
can decide to only provide P-192 data or the combined P-192 and P-256
data. It is also allowed to leave the P-192 data empty if userspace
only has the remote P-256 data.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:34 +02:00
Marcel Holtmann 0798872ef1 Bluetooth: Add internal function for storing P-192 and P-256 data
Add function to allow adding P-192 and P-256 data to the internal
storage. This also fixes a few coding style issues from the previous
helper functions for the out-of-band credentials storage.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:34 +02:00
Marcel Holtmann 519ca9d017 Bluetooth: Provide remote OOB data for Secure Connections
When Secure Connections has been enabled it is possible to provide P-192
and/or P-256 data during the pairing process. The internal out-of-band
credentials storage has been extended to also hold P-256 data.

Initially the P-256 data will be empty and with Secure Connections enabled
no P-256 data will be provided. This is according to the specification
since it might be possible that the remote side did not provide either
of the out-of-band credentials.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:33 +02:00
Marcel Holtmann 5afeac149e Bluetooth: Add debugfs quirk for forcing Secure Connections support
The Bluetooth 4.1 specification with Secure Connections support has
just been released and controllers with this feature are still in
an early stage.

A handful of controllers have already support for it, but they do
not always identify this feature correctly. This debugfs entry
allows to tell the kernel that the controller can be treated as
it would fully support Secure Connections.

Using debugfs to force Secure Connections support of course does
not make this feature magically appear in all controllers. This
is a debug functionality for early adopters. Once the majority
of controllers matures this quirk will be removed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:33 +02:00
Marcel Holtmann 4d2d279626 Bluetooth: Add support for local OOB data with Secure Connections
For Secure Connections support and the usage of out-of-band pairing,
it is needed to read the P-256 hash and randomizer or P-192 hash and
randomizer. This change will read P-192 data when Secure Connections
is disabled and P-192 and P-256 data when it is enabled.

The difference is between using HCI Read Local OOB Data and using the
new HCI Read Local OOB Extended Data command. The first one has been
introduced with Bluetooth 2.1 and returns only the P-192 data.

< HCI Command: Read Local OOB Data (0x03|0x0057) plen 0
> HCI Event: Command Complete (0x0e) plen 36
      Read Local OOB Data (0x03|0x0057) ncmd 1
        Status: Success (0x00)
        Hash C from P-192: 975a59baa1c4eee391477cb410b23e6d
        Randomizer R with P-192: 9ee63b7dec411d3b467c5ae446df7f7d

The second command has been introduced with Bluetooth 4.1 and will
return P-192 and P-256 data.

< HCI Command: Read Local OOB Extended Data (0x03|0x007d) plen 0
> HCI Event: Command Complete (0x0e) plen 68
      Read Local OOB Extended Data (0x03|0x007d) ncmd 1
        Status: Success (0x00)
        Hash C from P-192: 6489731804b156fa6355efb8124a1389
        Randomizer R with P-192: 4781d5352fb215b2958222b3937b6026
        Hash C from P-256: 69ef8a928b9d07fc149e630e74ecb991
        Randomizer R with P-256: 4781d5352fb215b2958222b3937b6026

The change for the management interface is transparent and no change
is required for existing userspace. The Secure Connections feature
needs to be manually enabled. When it is disabled, then userspace
only gets the P-192 returned and with Secure Connections enabled,
userspace gets P-192 and P-256 in an extended structure.

It is also acceptable to just ignore the P-256 data since it is not
required to support them. The pairing with out-of-band credentials
will still succeed. However then of course no Secure Connection will
b established.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:33 +02:00
Marcel Holtmann eac83dc632 Bluetooth: Add management command for enabling Secure Connections
The support for Secure Connections need to be explicitly enabled by
userspace. This is required since only userspace that can handle the
new link key types should enable support for Secure Connections.

This command handling is similar to how Secure Simple Pairing enabling
is done. It also tracks the case when Secure Connections support is
enabled via raw HCI commands. This makes sure that the host features
page is updated as well.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:32 +02:00
Marcel Holtmann e98d2ce293 Bluetooth: Add flags and setting for Secure Connections support
The MGMT_SETTING_SECURE_CONN setting is used to track the support and
status for Secure Connections from the management interface. For HCI
based tracking HCI_SC_ENABLED flag is used.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:32 +02:00
Marcel Holtmann 11015c7903 Bluetooth: Add definitions for new link key types
With the introduction of Secure Connections, the list of link key types
got extended by P-256 versions of authenticated and unauthenticated
link keys.

To avoid any confusion the previous authenticated and unauthenticated
link key types got ammended with a P912 postfix. And the two new keys
have a P256 postfix now. Existing code using the previous definitions
has been adjusted.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:31 +02:00
Marcel Holtmann e2f9913157 Bluetooth: Add HCI command definition for extended OOB data
The Secure Connections feature introduces the support for P-256 strength
pairings (compared to P-192 with Secure Simple Pairing). This however
means that for out-of-band pairing the hash and randomizer needs to be
differentiated. Two new commands are introduced to handle the possible
combinations of P-192 and P-256. This add the HCI command definition
for both.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:31 +02:00
Marcel Holtmann eb4b95c627 Bluetooth: Add HCI command definition for Secure Connections enabling
The Secure Connections feature is optional and host stacks have to
manually enable it. This add the HCI command definiton for reading
and writing this setting.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:31 +02:00
Marcel Holtmann d5991585d0 Bluetooth: Add LMP feature definitions for Secure Connections support
The support for Secure Connections introduces two new controller
features and one new host feature.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-02-13 09:51:31 +02:00
Horia Geanta 0f24558e91 xfrm: avoid creating temporary SA when there are no listeners
In the case when KMs have no listeners, km_query() will fail and
temporary SAs are garbage collected immediately after their allocation.
This causes strain on memory allocation, leading even to OOM since
temporary SA alloc/free cycle is performed for every packet
and garbage collection does not keep up the pace.

The sane thing to do is to make sure we have audience before
temporary SA allocation.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-13 07:40:30 +01:00
WANG Cong 55334a5db5 net_sched: act: refuse to remove bound action outside
When an action is bonnd to a filter, there is no point to
remove it outside. Currently we just silently decrease the refcnt,
we should reject this explicitly with EPERM.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 19:23:32 -05:00
WANG Cong 4f1e9d8949 net_sched: act: move tcf_hashinfo_init() into tcf_register_action()
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 19:23:32 -05:00
WANG Cong a5b5c958ff net_sched: act: refactor cleanup ops
For bindcnt and refcnt etc., they are common for all actions,
not need to repeat such operations for their own, they can be unified
now. Actions just need to do its specific cleanup if needed.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 19:23:32 -05:00
WANG Cong 86062033fe net_sched: act: hide struct tcf_common from API
Now we can totally hide it from modules. tcf_hash_*() API's
will operate on struct tc_action, modules don't need to care about
the details.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 19:23:32 -05:00
Johannes Berg 06d181a8fd mac80211: add NAPI support back
NAPI was originally added to mac80211 a long time ago (by John in
commit 4e6cbfd09c in July 2010), but then removed years later
(by Stanislaw in commit 30c97120c6 in February 2013). No driver
ever used it, so that was fine.

Now I'm adding support for NAPI to our driver, so add some code
to mac80211 again  to support NAPI. John was originally wrapping
some (but not nearly all NAPI-related functions), but that doesn't
scale very well with the number of functions that are there, some
of which are even only inlines. Thus, instead of doing that, let
the drivers manage the NAPI struct, except for napi_add() which is
needed so mac80211 knows how to call napi_gro_receive().

Also remove some no longer needed definitions that were left when
NAPI support was removed.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-12 21:37:27 +01:00
Fan Du ca925cf153 flowcache: Make flow cache name space aware
Inserting a entry into flowcache, or flushing flowcache should be based
on per net scope. The reason to do so is flushing operation from fat
netns crammed with flow entries will also making the slim netns with only
a few flow cache entries go away in original implementation.

Since flowcache is tightly coupled with IPsec, so it would be easier to
put flow cache global parameters into xfrm namespace part. And one last
thing needs to do is bumping flow cache genid, and flush flow cache should
also be made in per net style.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-12 07:02:11 +01:00
Eliad Peller 448cd2e248 mac80211: reset probe_send_count also in HW_CONNECTION_MONITOR case
In case of beacon_loss with IEEE80211_HW_CONNECTION_MONITOR
device, mac80211 probes the ap (and disconnects on timeout)
but ignores the ack.

If we already got an ack, there's no reason to continue
disconnecting. this can help devices that supports
IEEE80211_HW_CONNECTION_MONITOR only partially (e.g. take
care of keep alives, but does not probe the ap.

In case the device wants to disconnect without probing,
it can just call ieee80211_connection_loss.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-11 12:58:32 +01:00
Rashika Kheria 535d3ae9c8 net: Move prototype declaration to header file include/net/net_namespace.h from net/ipx/af_ipx.c
Move prototype declaration of function to header file
include/net/net_namespace.h from net/ipx/af_ipx.c because they are used
by more than one file.

This eliminates the following warning in net/ipx/sysctl_net_ipx.c:
net/ipx/sysctl_net_ipx.c:33:6: warning: no previous prototype for ‘ipx_register_sysctl’ [-Wmissing-prototypes]
net/ipx/sysctl_net_ipx.c:38:6: warning: no previous prototype for ‘ipx_unregister_sysctl’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria 7780d8ae4a net: Move prototype declaration to header file include/net/datalink.h from net/ipx/af_ipx.c
Move prototype declarations of function to header file
include/net/datalink.h from net/ipx/af_ipx.c because they are used by
more than one file.

This eliminates the following warning in net/ipx/pe2.c:
net/ipx/pe2.c:20:24: warning: no previous prototype for ‘make_EII_client’ [-Wmissing-prototypes]
net/ipx/pe2.c:32:6: warning: no previous prototype for ‘destroy_EII_client’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria 578efbc19f net: Move prototype declaration to header file include/net/ipx.h from net/ipx/af_ipx.c
Move prototype declaration of functions to header file include/net/ipx.h
from net/ipx/af_ipx.c because they are used by more than one file.

This eliminates the following warning in
net/ipx/ipx_route.c:33:19: warning: no previous prototype for ‘ipxrtr_lookup’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:52:5: warning: no previous prototype for ‘ipxrtr_add_route’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:94:6: warning: no previous prototype for ‘ipxrtr_del_routes’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:149:5: warning: no previous prototype for ‘ipxrtr_route_skb’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:171:5: warning: no previous prototype for ‘ipxrtr_route_packet’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:261:5: warning: no previous prototype for ‘ipxrtr_ioctl’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:50 -08:00
Rashika Kheria 493cc5e5ba net: Move prototype declaration to include/net/ipx.h from net/ipx/ipx_route.c
Move prototype definition of function to header file include/net/ipx.h
from net/ipx/ipx_route.c because they are used by more than one file.

This eliminates the following warning from net/ipx/af_ipx.c:
net/ipx/af_ipx.c:193:23: warning: no previous prototype for ‘ipxitf_find_using_net’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:577:5: warning: no previous prototype for ‘ipxitf_send’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:1219:8: warning: no previous prototype for ‘ipx_cksum’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:49 -08:00
Rashika Kheria ab3301bd96 net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c
Move prototype declaration of functions to header file include/net/dn.h
from net/decnet/af_decnet.c because they are used by more than one file.

This eliminates the following warning in net/decnet/af_decnet.c:
net/decnet/sysctl_net_decnet.c:354:6: warning: no previous prototype for ‘dn_register_sysctl’ [-Wmissing-prototypes]
net/decnet/sysctl_net_decnet.c:359:6: warning: no previous prototype for ‘dn_unregister_sysctl’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:49 -08:00
Rashika Kheria f56b8bf6e4 net: Move prototype declaration to appropriate header file from decnet/af_decnet.c
Move prototype declaration of functions to header file include/net/dn_route.h
from net/decnet/af_decnet.c because it is used by more than one file.

This eliminates the following warning in net/decnet/dn_route.c:
net/decnet/dn_route.c:629:5: warning: no previous prototype for ‘dn_route_rcv’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 17:32:49 -08:00
David S. Miller f41f031960 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/nftables/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes, mostly nftables
fixes, most relevantly they are:

* Fix a crash in the h323 conntrack NAT helper due to expectation list
  corruption, from Alexey Dobriyan.

* A couple of RCU race fixes for conntrack, one manifests by hitting BUG_ON
  in nf_nat_setup_info() and the destroy path, patches from Andrey Vagin and
  me.

* Dump direction attribute in nft_ct only if it is set, from Arturo
  Borrero.

* Fix IPVS bug in its own connection tracking system that may lead to
  copying only 4 bytes of the IPv6 address when initializing the
  ip_vs_conn object, from Michal Kubecek.

* Fix -EBUSY errors in nftables when deleting the rules, chain and tables
  in a row due mixture of asynchronous and synchronous object releasing,
  from me.

* Three fixes for the nf_tables set infrastructure when using intervals and
  mappings, from me.

* Four patches to fixing the nf_tables log, reject and ct expressions from
  the new inet table, from Patrick McHardy.

* Fix memory overrun in the map that is used to dynamically allocate names
  from anonymous sets, also from Patrick.

* Fix a potential oops if you dump a set with NFPROTO_UNSPEC and a table
  name, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-09 14:20:00 -08:00
Tejun Heo 073219e995 cgroup: clean up cgroup_subsys names and initialization
cgroup_subsys is a bit messier than it needs to be.

* The name of a subsys can be different from its internal identifier
  defined in cgroup_subsys.h.  Most subsystems use the matching name
  but three - cpu, memory and perf_event - use different ones.

* cgroup_subsys_id enums are postfixed with _subsys_id and each
  cgroup_subsys is postfixed with _subsys.  cgroup.h is widely
  included throughout various subsystems, it doesn't and shouldn't
  have claim on such generic names which don't have any qualifier
  indicating that they belong to cgroup.

* cgroup_subsys->subsys_id should always equal the matching
  cgroup_subsys_id enum; however, we require each controller to
  initialize it and then BUG if they don't match, which is a bit
  silly.

This patch cleans up cgroup_subsys names and initialization by doing
the followings.

* cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
  cgroup_subsys with _cgrp_subsys.

* With the above, renaming subsys identifiers to match the userland
  visible names doesn't cause any naming conflicts.  All non-matching
  identifiers are renamed to match the official names.

  cpu_cgroup -> cpu
  mem_cgroup -> memory
  perf -> perf_event

* controllers no longer need to initialize ->subsys_id and ->name.
  They're generated in cgroup core and set automatically during boot.

* Redundant cgroup_subsys declarations removed.

* While updating BUG_ON()s in cgroup_init_early(), convert them to
  WARN()s.  BUGging that early during boot is stupid - the kernel
  can't print anything, even through serial console and the trap
  handler doesn't even link stack frame properly for back-tracing.

This patch doesn't introduce any behavior changes.

v2: Rebased on top of fe1217c4f3 ("net: net_cls: move cgroupfs
    classid handling into core").

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
2014-02-08 10:36:58 -05:00
Tejun Heo af6363374c cgroup: make CONFIG_CGROUP_NET_PRIO bool and drop unnecessary init_netclassid_cgroup()
net_prio is the only cgroup which is allowed to be built as a module.
The savings from allowing one controller to be built as a module are
tiny especially given that cgroup module support itself adds quite a
bit of complexity.

Given that none of other controllers has much chance of being made a
module and that we're unlikely to add new modular controllers, the
added complexity is simply not justifiable.

As a first step to drop cgroup module support, this patch changes the
config option to bool from tristate and drops module related code from
it.

Also, while an earlier commit fe1217c4f3 ("net: net_cls: move
cgroupfs classid handling into core") dropped module support from
net_cls cgroup, it retained a call to cgroup_load_subsys(), which is
noop for built-in controllers.  Drop it along with
init_netclassid_cgroup().

v2: Removed modular version of task_netprioidx() in
    include/net/netprio_cgroup.h as suggested by Li Zefan.

v3: Rebased on top of fe1217c4f3 ("net: net_cls: move cgroupfs
    classid handling into core").  net_cls cgroup part is mostly
    dropped except for removal of init_netclassid_cgroup().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
2014-02-08 10:36:58 -05:00
Pablo Neira Ayuso 0165d9325d netfilter: nf_tables: fix racy rule deletion
We may lost race if we flush the rule-set (which happens asynchronously
via call_rcu) and we try to remove the table (that userspace assumes
to be empty).

Fix this by recovering synchronous rule and chain deletion. This was
introduced time ago before we had no batch support, and synchronous
rule deletion performance was not good. Now that we have the batch
support, we can just postpone the purge of old rule in a second step
in the commit phase. All object deletions are synchronous after this
patch.

As a side effect, we save memory as we don't need rcu_head per rule
anymore.

Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06 11:46:06 +01:00
Patrick McHardy 05513e9e33 netfilter: nf_tables: add reject module for NFPROTO_INET
Add a reject module for NFPROTO_INET. It does nothing but dispatch
to the AF-specific modules based on the hook family.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06 09:44:18 +01:00
Patrick McHardy cc4723ca31 netfilter: nft_reject: split up reject module into IPv4 and IPv6 specifc parts
Currently the nft_reject module depends on symbols from ipv6. This is
wrong since no generic module should force IPv6 support to be loaded.
Split up the module into AF-specific and a generic part.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06 09:44:10 +01:00
Emmanuel Grumbach 63c361f511 mac80211: propagate STBC / LDPC flags to radiotap
This capabilities weren't propagated to the radiotap header.
We don't set here the VHT_KNOWN / MCS_HAVE flag because not
all the low level drivers will know how to properly flag
the frames, hence the low level driver will be in charge
of setting IEEE80211_RADIOTAP_MCS_HAVE_FEC,
IEEE80211_RADIOTAP_MCS_HAVE_STBC and / or
IEEE80211_RADIOTAP_VHT_KNOWN_STBC according to its
capabilities.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06 09:34:58 +01:00
Emmanuel Grumbach 1b8d242adb mac80211: move VHT related RX_FLAG to another variable
ieee80211_rx_status.flags is full. Define a new vht_flag
variable to be able to set more VHT related flags and make
room in flags.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Acked-by: Kalle Valo <kvalo@qca.qualcomm.com> [ath10k]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06 09:34:10 +01:00
Emmanuel Grumbach 0059b2b142 mac80211: remove unused radiotap vendor fields in ieee80211_rx_status
The purpose of this housekeeping is to make some room for
VHT flags. The radiotap vendor fields weren't in use.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06 09:33:46 +01:00
Patrick McHardy 64d46806b6 netfilter: nf_tables: add AF specific expression support
For the reject module, we need to add AF-specific implementations to
get rid of incorrect module dependencies. Try to load an AF-specific
module first and fall back to generic modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06 00:05:36 +01:00
Pablo Neira Ayuso e53376bef2 netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.

Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().

A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.

CPU1                                    CPU2
____nf_conntrack_find()
                                        nf_ct_put()
                                         destroy_conntrack()
                                        ...
                                        init_conntrack
                                         __nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&ct->use) (use = 2)
                                         if (!l4proto->new(ct, skb, dataoff, timeouts))
                                          nf_conntrack_free(ct); (use = 2 !!!)
                                        ...
                                        __nf_conntrack_alloc (set use = 1)
 if (!nf_ct_key_equal(h, tuple, zone))
  nf_ct_put(ct); (use = 0)
   destroy_conntrack()
                                        /* continue to work with CT */

After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():

<4>[67096.759334] ------------[ cut here ]------------
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] Call Trace:
<4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5

I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Vagin <avagin@parallels.com>
Cc: Florian Westphal <fw@strlen.de>
Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-02-05 17:46:06 +01:00
Max Filippov a13aff0641 net: ethoc: set up MII management bus clock
MII management bus clock is derived from the MAC clock by dividing it by
MIIMODER register CLKDIV field value. This value may need to be set up
in case it is undefined or its default value is too high (and
communication with PHY is too slow) or too low (and communication with
PHY is impossible). The value of CLKDIV is not specified directly, but
is derived from the MAC clock for the default MII management bus frequency
of 2.5MHz. The MAC clock may be specified in the platform data, or in
the 'clocks' device tree attribute.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-04 20:19:51 -08:00
Michal Kazior 9e0e29615a cfg80211: consider existing DFS interfaces
It was possible to break interface combinations in
the following way:

 combo 1: iftype = AP, num_ifaces = 2, num_chans = 2,
 combo 2: iftype = AP, num_ifaces = 1, num_chans = 1, radar = HT20

With the above interface combinations it was
possible to:

 step 1. start AP on DFS channel by matching combo 2
 step 2. start AP on non-DFS channel by matching combo 1

This was possible beacuse (step 2) did not consider
if other interfaces require radar detection.

The patch changes how cfg80211 tracks channels -
instead of channel itself now a complete chandef
is stored.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:58:17 +01:00
Antonio Quartulli fe94f3a4ff cfg80211: fix channel configuration in IBSS join
When receiving an IBSS_JOINED event select the BSS object
based on the {bssid, channel} couple rather than the bssid
only.
With the current approach if another cell having the same
BSSID (but using a different channel) exists then cfg80211
picks up the wrong BSS object.
The result is a mismatching channel configuration between
cfg80211 and the driver, that can lead to any sort of
problem.

The issue can be triggered by having an IBSS sitting on
given channel and then asking the driver to create a new
cell using the same BSSID but with a different frequency.
By passing the channel to cfg80211_get_bss() we can solve
this ambiguity and retrieve/create the correct BSS object.
All the users of cfg80211_ibss_joined() have been changed
accordingly.

Moreover WARN when cfg80211_ibss_joined() gets a NULL
channel as argument and remove a bogus call of the same
function in ath6kl (it does not make sense to call
cfg80211_ibss_joined() with a zero BSSID on ibss-leave).

Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Arend van Spriel <arend@broadcom.com>
Cc: Bing Zhao <bzhao@marvell.com>
Cc: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Cc: libertas-dev@lists.infradead.org
Acked-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
[minor code cleanup in ath6kl]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:58:16 +01:00
Johannes Berg ea73cbce4e nl80211: fix scheduled scan RSSI matchset attribute confusion
The scheduled scan matchsets were intended to be a list of filters,
with the found BSS having to pass at least one of them to be passed
to the host. When the RSSI attribute was added, however, this was
broken and currently wpa_supplicant adds that attribute in its own
matchset; however, it doesn't intend that to mean that anything
that passes the RSSI filter should be passed to the host, instead
it wants it to mean that everything needs to also have higher RSSI.

This is semantically problematic because we have a list of filters
like [ SSID1, SSID2, SSID3, RSSI ] with no real indication which
one should be OR'ed and which one AND'ed.

To fix this, move the RSSI filter attribute into each matchset. As
we need to stay backward compatible, treat a matchset with only the
RSSI attribute as a "default RSSI filter" for all other matchsets,
but only if there are other matchsets (an RSSI-only matchset by
itself is still desirable.)

To make driver implementation easier, keep a global min_rssi_thold
for the entire request as well. The only affected driver is ath6kl.

I found this when I looked into the code after Raja Mani submitted
a patch fixing the n_match_sets calculation to disregard the RSSI,
but that patch didn't address the semantic issue.

Reported-by: Raja Mani <rmani@qti.qualcomm.com>
Acked-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:58:12 +01:00
Johannes Berg cc01f9b55f mac80211: remove module handling from rate control ops
There's not a single rate control algorithm actually in
a separate module where the module refcount would be
required. Similarly, there's no specific rate control
module.

Therefore, all the module handling code in rate control
is really just dead code, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:26 +01:00
Johannes Berg 631ad703ba mac80211: make rate control ops const
Change the code to allow making all the rate control ops
const, nothing ever needs to change them. Also change all
drivers to make use of this and mark the ops const.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:21 +01:00
Janusz Dziedzic 0b9323f600 nl80211: add Guard Interval support for set_bitrate_mask
Allow to force SGI, LGI.
Mainly for test purpose.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:11 +01:00
Johannes Berg 4b5800fec6 cfg80211: make connect ie param const
This required liberally sprinkling 'const' over brcmfmac
and mwifiex but seems like a useful thing to do since the
pointer can't really be written.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:10 +01:00
Jouni Malinen 664834dee6 cfg80211: Clean up connect params and channel fetching
Addition of the frequency hints showed up couple of places in cfg80211
where pointers could be marked const and a shared function could be used
to fetch a valid channel.

Signed-off-by: Jouni Malinen <j@w1.fi>
[fix mwifiex]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:09 +01:00
Jouni Malinen b43504cf75 cfg80211: Advertise maximum associated STAs in AP mode
This allows drivers to advertise the maximum number of associated
stations they support in AP mode (including P2P GO). User space
applications can use this for cleaner way of handling the limit (e.g.,
hostapd rejecting IEEE 802.11 authentication without manual
configuration of the limit) or to figure out what type of use cases can
be executed with multiple devices before trying and failing.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:08 +01:00
Jouni Malinen 1df4a51082 cfg80211: Allow BSS hint to be provided for connect
This clarifies the expected driver behavior on the older
NL80211_ATTR_MAC and NL80211_ATTR_WIPHY_FREQ attributes and adds a new
set of similar attributes with _HINT postfix to enable use of a
recommendation of the initial BSS to choose. This can be helpful for
some drivers that can avoid an additional full scan on connection
request if the information is provided to them (user space tools like
wpa_supplicant already has that information available based on earlier
scans).

In addition, this can be used to get more expected behavior for cases
where a specific BSS should be picked first based on operations like
Interworking network selection or WPS. These cases were already easily
addressed with drivers that leave BSS selection to user space, but there
was no convenient way to do this with drivers that take care of BSS
selection internally without using the NL80211_ATTR_MAC which is not
really desired since it is needed for other purposes to force the
association to remain with the same BSS.

Signed-off-by: Jouni Malinen <j@w1.fi>
[add const, fix policy]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:07 +01:00
Luciano Coelho 66e01cf99e mac80211: only set CSA beacon when at least one beacon must be transmitted
A beacon should never have a Channel Switch Announcement information
element with a count of 0, because a count of 1 means switch just
before the next beacon.  So, if a count of 0 was valid in a beacon, it
would have been transmitted in the next channel already, which is
useless.  A CSA count equal to zero is only meaningful in action
frames or probe_responses.

Fix the ieee80211_csa_is_complete() and ieee80211_update_csa()
functions accordingly.

With a CSA count of 0, we won't transmit any CSA beacons, because the
switch will happen before the next TBTT.  To avoid extra work and
potential confusion in the drivers, complete the CSA immediately,
instead of waiting for the driver to call ieee80211_csa_finish().

To keep things simpler, we also switch immediately when the CSA count
is 1, while in theory we should delay the switch until just before the
next TBTT.

Additionally, move the ieee80211_csa_finish() function to cfg.c,
where it makes more sense.

Tested-by: Simon Wunderlich <sw@simonwunderlich.de>
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-04 21:48:06 +01:00
Linus Torvalds 4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Linus Torvalds 6dd9158ae8 Merge git://git.infradead.org/users/eparis/audit
Pull audit update from Eric Paris:
 "Again we stayed pretty well contained inside the audit system.
  Venturing out was fixing a couple of function prototypes which were
  inconsistent (didn't hurt anything, but we used the same value as an
  int, uint, u32, and I think even a long in a couple of places).

  We also made a couple of minor changes to when a couple of LSMs called
  the audit system.  We hoped to add aarch64 audit support this go
  round, but it wasn't ready.

  I'm disappearing on vacation on Thursday.  I should have internet
  access, but it'll be spotty.  If anything goes wrong please be sure to
  cc rgb@redhat.com.  He'll make fixing things his top priority"

* git://git.infradead.org/users/eparis/audit: (50 commits)
  audit: whitespace fix in kernel-parameters.txt
  audit: fix location of __net_initdata for audit_net_ops
  audit: remove pr_info for every network namespace
  audit: Modify a set of system calls in audit class definitions
  audit: Convert int limit uses to u32
  audit: Use more current logging style
  audit: Use hex_byte_pack_upper
  audit: correct a type mismatch in audit_syscall_exit()
  audit: reorder AUDIT_TTY_SET arguments
  audit: rework AUDIT_TTY_SET to only grab spin_lock once
  audit: remove needless switch in AUDIT_SET
  audit: use define's for audit version
  audit: documentation of audit= kernel parameter
  audit: wait_for_auditd rework for readability
  audit: update MAINTAINERS
  audit: log task info on feature change
  audit: fix incorrect set of audit_sock
  audit: print error message when fail to create audit socket
  audit: fix dangling keywords in audit_log_set_loginuid() output
  audit: log on errors from filter user rules
  ...
2014-01-23 18:08:10 -08:00
Jiri Pirko ba7d49b1f0 rtnetlink: provide api for getting and setting slave info
Recent patch
bonding: add netlink attributes to slave link dev (1d3ee88ae0)

Introduced yet another device specific way to access slave information
over rtnetlink. There is one already there for bridge.

This patch introduces generic way to do this, for getting and setting
info as well by extending link_ops. Later on, this new interface will
be used for bridge ports as well.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:57:05 -08:00
FX Le Bail 7c90cc2d40 ipv6: enable anycast addresses as source addresses for datagrams
This change allows to consider an anycast address valid as source address
when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item.
So, when sending a datagram with ancillary data, the unicast and anycast
addresses are handled in the same way.

- Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local
  on given interface or is global.
- Uses it in ip6_datagram_send_ctl().

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:57:05 -08:00
Hannes Frederic Sowa 809fa972fd reciprocal_divide: update/correction of the algorithm
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c480
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1<<32)*((1<<l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')>>32
  q = (t+((n-t)>>sh_1))>>sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
Daniel Borkmann 89770b0a69 net: introduce reciprocal_scale helper and convert users
As David Laight suggests, we shouldn't necessarily call this
reciprocal_divide() when users didn't requested a reciprocal_value();
lets keep the basic idea and call it reciprocal_scale(). More
background information on this topic can be found in [1].

Joint work with Hannes Frederic Sowa.

  [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html

Suggested-by: David Laight <david.laight@aculab.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
wangweidong 5bc1d1b4a2 sctp: remove macros sctp_bh_[un]lock_sock
Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user
space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:41:36 -08:00
wangweidong 048ed4b626 sctp: remove macros sctp_{lock|release}_sock
Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:41:36 -08:00
wangweidong 1b0de194f1 sctp: remove macros sctp_read_[un]lock
Redefined read_[un]lock to sctp_read_[un]lock for user space
friendly code which we haven't use in years, and the macros
we never used, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:41 -08:00
wangweidong 387602dfdc sctp: remove macros sctp_write_[un]_lock
Redefined write_[un]lock to sctp_write_[un]lock for user space
friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:41 -08:00
wangweidong 3c8e43ba9f sctp: remove macros sctp_spin_[un]lock
Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:41 -08:00
wangweidong 79b91130a2 sctp: remove macros sctp_local_bh_{disable|enable}
Redefined local_bh_{disable|enable} to sctp_local_bh_{disable|enable}
for user space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:40 -08:00
wangweidong 940287ee10 sctp: remove macros sctp_spin_[un]lock_irqrestore
Redefined spin_[un]lock_irqstore to sctp_spin_[un]lock_irqrestore for user
space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:40:40 -08:00
Or Gerlitz dc01e7d344 net: Add GRO support for vxlan traffic
Add GRO handlers for vxlann, by using the UDP GRO infrastructure.

For single TCP session that goes through vxlan tunneling I got nice
improvement from 6.8Gbs to 11.5Gbs

--> UDP/VXLAN GRO disabled
$ netperf  -H 192.168.52.147 -c -C

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      6799.75   12.54    24.79    0.604   1.195

--> UDP/VXLAN GRO enabled

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      11562.72   24.90    20.34    0.706   0.577

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:05:04 -08:00
Or Gerlitz b582ef0990 net: Add GRO support for UDP encapsulating protocols
Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive
code twice on a packet. This solves the problem of udp encapsulated packets whose
inner VM packet is udp and happen to carry a port which has registered offloads.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:05:04 -08:00
Hannes Frederic Sowa 82b276cd2b ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts
Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.

Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:59:19 -08:00
WANG Cong 6e6a50c254 net_sched: act: export tcf_hash_search() instead of tcf_hash_lookup()
So that we will not expose struct tcf_common to modules.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
WANG Cong c779f7af99 net_sched: act: fetch hinfo from a->ops->hinfo
Every action ops has a pointer to hash info, so we don't need to
hard-code it in each module.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
Linus Torvalds a0fa1dd3cd Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:

 - Add the initial implementation of SCHED_DEADLINE support: a real-time
   scheduling policy where tasks that meet their deadlines and
   periodically execute their instances in less than their runtime quota
   see real-time scheduling and won't miss any of their deadlines.
   Tasks that go over their quota get delayed (Available to privileged
   users for now)

 - Clean up and fix preempt_enable_no_resched() abuse all around the
   tree

 - Do sched_clock() performance optimizations on x86 and elsewhere

 - Fix and improve auto-NUMA balancing

 - Fix and clean up the idle loop

 - Apply various cleanups and fixes

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  sched: Fix __sched_setscheduler() nice test
  sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
  sched: Fix up attr::sched_priority warning
  sched: Fix up scheduler syscall LTP fails
  sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
  sched/core: Fix htmldocs warnings
  sched/deadline: No need to check p if dl_se is valid
  sched/deadline: Remove unused variables
  sched/deadline: Fix sparse static warnings
  m68k: Fix build warning in mac_via.h
  sched, thermal: Clean up preempt_enable_no_resched() abuse
  sched, net: Fixup busy_loop_us_clock()
  sched, net: Clean up preempt_enable_no_resched() abuse
  sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
  sched/preempt, locking: Rework local_bh_{dis,en}able()
  sched/clock, x86: Avoid a runtime condition in native_sched_clock()
  sched/clock: Fix up clear_sched_clock_stable()
  sched/clock, x86: Use a static_key for sched_clock_stable
  sched/clock: Remove local_irq_disable() from the clocks
  sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
  ...
2014-01-20 10:42:08 -08:00
WANG Cong 671314a5ab net_sched: act: remove capab from struct tc_action_ops
It is not actually implemented.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:58:07 -08:00
Hannes Frederic Sowa 4b261c75a9 ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.

Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:53:18 -08:00
Florent Fourcot 6444f72b4b ipv6: add flowlabel_consistency sysctl
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.

Changelog of V3:
 * rename ip6_flowlabel_consistency to flowlabel_consistency
 * use net_info_ratelimited()
 * checkpatch cleanups

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Florent Fourcot 46e5f40176 ipv6: add a flag to get the flow label used remotly
This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
David S. Miller 4180442058 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 00:55:41 -08:00
Eric Dumazet 6c7e7610ff ipv4: fix a dst leak in tunnels
This patch :

1) Remove a dst leak if DST_NOCACHE was set on dst
   Fix this by holding a reference only if dst really cached.

2) Remove a lockdep warning in __tunnel_dst_set()
    This was reported by Cong Wang.

3) Remove usage of a spinlock where xchg() is enough

4) Remove some spurious inline keywords.
   Let compiler decide for us.

Fixes: 7d442fab0a ("ipv4: Cache dst in tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:36:39 -08:00
Hannes Frederic Sowa 11ffff752c ipv6: simplify detection of first operational link-local address on interface
In commit 1ec047eb47 ("ipv6: introduce per-interface counter for
dad-completed ipv6 addresses") I build the detection of the first
operational link-local address much to complex. Additionally this code
now has a race condition.

Replace it with a much simpler variant, which just scans the address
list when duplicate address detection completes, to check if this is
the first valid link local address and send RS and MLD reports then.

Fixes: 1ec047eb47 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses")
Reported-by: Jiri Pirko <jiri@resnulli.us>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:10:01 -08:00
Florent Fourcot 1d13a96c74 ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT
This patch is following the commit b903d324be (ipv6: tcp: fix TCLASS
value in ACK messages sent from TIME_WAIT).

For the same reason than tclass, we have to store the flow label in the
inet_timewait_sock to provide consistency of flow label on the last ACK.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 17:56:33 -08:00
John W. Linville 7916a07557 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-17 14:43:17 -05:00
Jiri Pirko 89740ca74f neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions.
When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is
not initialized yet. This might cause confusion for the people who use
NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for
usage in ndo_neigh_setup.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 11:31:58 -08:00
Li RongQing d76ed22b22 ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helper
Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h,
and use this macro as possible. And define ip6_tclass helper to return
tclass

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:53:18 -08:00
FX Le Bail ec35b61ea5 IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6
This change move anycast_src_echo_reply sysctl with other ipv6 sysctls.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
Aruna-Hewapathirane 63862b5bef net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:15:25 -08:00
WANG Cong 72c1d3bdd5 ipv4: register igmp_notifier even when !CONFIG_PROC_FS
We still need this notifier even when we don't config
PROC_FS.

It should be rare to have a kernel without PROC_FS,
so just for completeness.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:03:33 -08:00
David S. Miller aef2b45fe4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	net/xfrm/xfrm_policy.c

Steffen Klassert says:

====================
This pull request has a merge conflict between commits be7928d20b
("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
da7c224b1b ("net: xfrm: xfrm_policy: silence compiler warning") from
the net-next tree and commit 2f3ea9a95c ("xfrm: checkpatch erros with
inline keyword position") from the ipsec-next tree.

The version from net-next can be used, like it is done in linux-next.

1) Checkpatch cleanups, from Weilong Chen.

2) Fix lockdep complaints when pktgen is used with IPsec,
   from Fan Du.

3) Update pktgen to allow any combination of IPsec transport/tunnel mode
   and AH/ESP/IPcomp type, from Fan Du.

4) Make pktgen_dst_metrics static, Fengguang Wu.

5) Compile fix for pktgen when CONFIG_XFRM is not set,
   from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:14:25 -08:00
Eric Paris 4440e85481 audit: convert all sessionid declaration to unsigned int
Right now the sessionid value in the kernel is a combination of u32,
int, and unsigned int.  Just use unsigned int throughout.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:31:46 -05:00
stephen hemminger 6daaf0de2f sctp: make sctp_addto_chunk_fixed local
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 14:42:30 -08:00
WANG Cong 7eb8896df0 net_sched: act: remove struct tcf_act_hdr
It is not necessary at all.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:15 -08:00
WANG Cong 2519a602c2 net_sched: optimize tcf_match_indev()
tcf_match_indev() is called in fast path, it is not wise to
search for a netdev by ifindex and then compare by its name,
just compare the ifindex.

Also, dev->name could be changed by user-space, therefore
the match would be always fail, but dev->ifindex could
be consistent.

BTW, this will also save some bytes from the core struct of u32.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:15 -08:00
WANG Cong 832d1d5bfa net_sched: add struct net pointer to tcf_proto_ops->dump
It will be needed by the next patch.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:14 -08:00
WANG Cong ddafd34f41 net_sched: act: move idx_gen into struct tcf_hashinfo
There is no need to store the index separatedly
since tcf_hashinfo is allocated statically too.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:14 -08:00
Luis R. Rodriguez 4f7b91404c cfg80211: make regulatory_hint() remove REGULATORY_CUSTOM_REG
The REGULATORY_CUSTOM_REG can be used during early init with
the goal of overriding the wiphy's default regulatory settings
in case the alpha2 of the device is not known. In the case that
the alpha2 becomes known lets avoid having drivers having to
clear the REGULATORY_CUSTOM_REG flag by doing it for them
when regulatory_hint() is used.

Cc: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-01-13 14:46:58 -05:00
John W. Linville f13352519e Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-01-13 14:40:59 -05:00
John W. Linville ec665facde This is the first NFC pull request for 3.14
It includes:
 
 * A new NFC driver for Marvell's 8897, and a few NCI fixes and
   improvements needed to support this chipset.
 
 * An LLCP fix for how we were setting the default MIU on a p2p link. If
   there is no explicit MIU extension announced at connection time, we
   must use the default one and not the one announced at LLCP link
   establishement time.
 
 * A pn544 EEPROM config update. Some of the currently EEPROM configured
   values are overwriting the firmware ones while other should not be set
   by the driver itself.
 
 * Some NFC digital stack fixes and improvements. Asynchronous functions
   are better documented, RF technologies and CRC functions are set upon
   PSL_REQ reception, and a few minor bugs are fixed.
 
 * Minor and miscelaneous pn533, mei_phy and port100 fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSzfOKAAoJEIqAPN1PVmxKODEP/i1tmx6bwSjuR0gMyvIkqcBJ
 1mM7BwdXlTKTvS/HaKTqaftS5S9Kj/IYSsHPjqRAJp3ipZdc39D8rR3jiyhWzKyD
 A/o6whTBTnyAgt8/enNp+h8S/Iq+E3itL/51KUOeeFIKSpGqqfcssZ1/3qhvoYZQ
 75zck2OPiEs8KBl1bCrrzK1kP4s8aEH6PepmXd7WS8njKe+dcyl3erw0IVN4WPfP
 FKFemvL/HP8+cUyshdiQGRiSw+TyD1VLaZinhyoeJxVRcXUjcodLwtCIATwqvu54
 2fMk1ccFineAQZGFfZGbtMAjHQLUeOpHHxFfdkW1g7P9IBp4zjtEiNOhNvPnKlR2
 p4g4R/vPdXxbQWjIoWzXI8qw/eFq8xIVC0ap37W/Y65532ParnXESAwk29BJ6770
 kqpHTjfZTUmW2POuvqhEKUKPPVp5nt0ArgfnjvHOS1wxcT885vWeu/YOxpOm9VdU
 rjFSBBaBDC43vGkCHn5szU9sEwu4O1/JFHElSToXsu+bRtS0tA3O62Kv732RZmbm
 1SCbZ63o1ivZr8Q37bY1NDW1/YdwUJMNbEb/t/wDLBqYx0vQcD0aDUWCwoACi2Du
 FbUElc975E1ChvM7VfV7uqFN0Pc++M1IEHLcM2BiXmSIGjziFY02FFDkqHiea/tC
 xlYfYrrUoIj0v/u7q1t4
 =ydBy
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"This is the first NFC pull request for 3.14

It includes:

* A new NFC driver for Marvell's 8897, and a few NCI fixes and
  improvements needed to support this chipset.

* An LLCP fix for how we were setting the default MIU on a p2p link. If
  there is no explicit MIU extension announced at connection time, we
  must use the default one and not the one announced at LLCP link
  establishement time.

* A pn544 EEPROM config update. Some of the currently EEPROM configured
  values are overwriting the firmware ones while other should not be set
  by the driver itself.

* Some NFC digital stack fixes and improvements. Asynchronous functions
  are better documented, RF technologies and CRC functions are set upon
  PSL_REQ reception, and a few minor bugs are fixed.

* Minor and miscelaneous pn533, mei_phy and port100 fixes."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-01-13 14:36:42 -05:00
Hannes Frederic Sowa 8ed1dc44d3 ipv4: introduce hardened ip_no_pmtu_disc mode
This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors
to be honored by protocols which do more stringent validation on the
ICMP's packet payload. This knob is useful for people who e.g. want to
run an unmodified DNS server in a namespace where they need to use pmtu
for TCP connections (as they are used for zone transfers or fallback
for requests) but don't want to use possibly spoofed UDP pmtu information.

Currently the whitelisted protocols are TCP, SCTP and DCCP as they check
if the returned packet is in the window or if the association is valid.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:22:55 -08:00
Hannes Frederic Sowa f87c10a8aa ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing
While forwarding we should not use the protocol path mtu to calculate
the mtu for a forwarded packet but instead use the interface mtu.

We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
introduced for multicast forwarding. But as it does not conflict with
our usage in unicast code path it is perfect for reuse.

I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
dependencies because of IPSKB_FORWARDED.

Because someone might have written a software which does probe
destinations manually and expects the kernel to honour those path mtus
I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone
can disable this new behaviour. We also still use mtus which are locked on a
route for forwarding.

The reason for this change is, that path mtus information can be injected
into the kernel via e.g. icmp_err protocol handler without verification
of local sockets. As such, this could cause the IPv4 forwarding path to
wrongfully emit fragmentation needed notifications or start to fragment
packets along a path.

Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
won't be set and further fragmentation logic will use the path mtu to
determine the fragmentation size. They also recheck packet size with
help of path mtu discovery and report appropriate errors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:22:54 -08:00
Peter Zijlstra 3708983452 sched, net: Fixup busy_loop_us_clock()
The only valid use of preempt_enable_no_resched() is if the very next
line is schedule() or if we know preemption cannot actually be enabled
by that statement due to known more preempt_count 'refs'.

This busy_poll stuff looks to be completely and utterly broken,
sched_clock() can return utter garbage with interrupts enabled (rare
but still) and it can drift unbounded between CPUs.

This means that if you get preempted/migrated and your new CPU is
years behind on the previous CPU we get to busy spin for a _very_ long
time.

There is a _REASON_ sched_clock() warns about preemptability -
papering over it with a preempt_disable()/preempt_enable_no_resched()
is just terminal brain damage on so many levels.

Replace sched_clock() usage with local_clock() which has a bounded
drift between CPUs (<2 jiffies).

There is a further problem with the entire busy wait poll thing in
that the spin time is additive to the syscall timeout, not inclusive.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: rui.zhang@intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 17:39:11 +01:00
John W. Linville 235f939228 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	net/ieee802154/6lowpan.c
2014-01-10 10:59:40 -05:00
Johannes Berg b77cf4f8e1 mac80211: handle MMPDUs at EOSP correctly
If a uAPSD service period ends with an MMPDU, we currently just
send that MMPDU, but it obviously won't get the EOSP bit set as
it doesn't have a QoS header. This contradicts the standard, so
add a QoS-nulldata frame after the MMPDU to properly terminate
the service period with a frame that has EOSP set.

Also fix a bug wrt. the TID for the MMPDU, it shouldn't be set
to 0 unconditionally but use the actual TID that was assigned.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-10 09:50:02 +01:00
Patrick McHardy 3876d22dba netfilter: nf_tables: rename nft_do_chain_pktinfo() to nft_do_chain()
We don't encode argument types into function names and since besides
nft_do_chain() there are only AF-specific versions, there is no risk
of confusion.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:16 +01:00
Patrick McHardy fa2c1de0bb netfilter: nf_tables: minor nf_chain_type cleanups
Minor nf_chain_type cleanups:

- reorder struct to plug a hoe
- rename struct module member to "owner" for consistency
- rename nf_hookfn array to "hooks" for consistency
- reorder initializers for better readability

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:15 +01:00
Patrick McHardy 2a37d755b8 netfilter: nf_tables: constify chain type definitions and pointers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:15 +01:00
Patrick McHardy baae3e62f3 netfilter: nf_tables: fix chain type module reference handling
The chain type module reference handling makes no sense at all: we take
a reference immediately when the module is registered, preventing the
module from ever being unloaded.

Fix by taking a reference when we're actually creating a chain of the
chain type and release the reference when destroying the chain.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:14 +01:00
Ilan Peer bdfbec2d2d cfg80211: Add a function to get the number of supported channels
Add a utility function to get the number of channels supported by
the device, and update the places in the code that need this data.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
[replace another occurrence in libertas, fix kernel-doc, fix bugs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-09 14:24:24 +01:00
John W. Linville 300e5fd160 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-01-08 13:44:29 -05:00
Patrick McHardy 4566bf2706 netfilter: nft_meta: add l4proto support
For L3-proto independant rules we need to get at the L4 protocol value
directly. Add it to the nft_pktinfo struct and use the meta expression
to retrieve it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:31 +01:00
Patrick McHardy 1d49144c0a netfilter: nf_tables: add "inet" table for IPv4/IPv6
This patch adds a new table family and a new filter chain that you can
use to attach IPv4 and IPv6 rules. This should help to simplify
rule-set maintainance in dual-stack setups.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:25 +01:00
Patrick McHardy 115a60b173 netfilter: nf_tables: add support for multi family tables
Add support to register chains to multiple hooks for different address
families for mixed IPv4/IPv6 tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2014-01-07 23:55:46 +01:00
Patrick McHardy c9484874e7 netfilter: nf_tables: add hook ops to struct nft_pktinfo
Multi-family tables need the AF from the hook ops. Add a pointer to the
hook ops and replace usage of the hooknum member in struct nft_pktinfo.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:50:43 +01:00
Johannes Berg 685328b296 mac80211: remove channel_change_time
This value is no longer used by mac80211, and practically no
driver ever set it to a correct value anyway, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-07 23:16:39 +01:00
FX Le Bail 509aba3b0d IPv6: add the option to use anycast addresses as source addresses in echo reply
This change allows to follow a recommandation of RFC4942.

- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
  as source addresses for ICMPv6 echo reply. This sysctl is false by default
  to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().

Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
   (http://tools.ietf.org/html/rfc4942#section-2.1.6)

2.1.6. Anycast Traffic Identification and Security

   [...]
   To avoid exposing knowledge about the internal structure of the
   network, it is recommended that anycast servers now take advantage of
   the ability to return responses with the anycast address as the
   source address if possible.

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 15:51:39 -05:00
Eric Dumazet 438e38fadc gre_offload: statically build GRE offloading support
GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.

This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel setup.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 20:28:34 -05:00
David S. Miller 39b6b2992f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
[GIT net-next] Open vSwitch

Open vSwitch changes for net-next/3.14. Highlights are:
 * Performance improvements in the mechanism to get packets to userspace
   using memory mapped netlink and skb zero copy where appropriate.
 * Per-cpu flow stats in situations where flows are likely to be shared
   across CPUs. Standard flow stats are used in other situations to save
   memory and allocation time.
 * A handful of code cleanups and rationalization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 19:48:38 -05:00
Amitkumar Karwar 22c15bf30b NFC: NCI: Add set_config API
This API can be used by drivers to send their custom
configuration using SET_CONFIG NCI command to the device.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-07 01:32:40 +01:00
Amitkumar Karwar 86e8586ed5 NFC: NCI: Add setup handler
Some drivers require special configuration while initializing.
This patch adds setup handler for this custom configuration.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-07 01:32:40 +01:00
Thomas Graf bb9b18fb55 genl: Add genlmsg_new_unicast() for unicast message allocation
Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.

Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:51:53 -08:00
David S. Miller 56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
David S. Miller 9aa28f2b71 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says: <pablo@netfilter.org>

====================
nftables updates for net-next

The following patchset contains nftables updates for your net-next tree,
they are:

* Add set operation to the meta expression by means of the select_ops()
  infrastructure, this allows us to set the packet mark among other things.
  From Arturo Borrero Gonzalez.

* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
  Borkmann.

* Add new queue expression to nf_tables. These comes with two previous patches
  to prepare this new feature, one to add mask in nf_tables_core to
  evaluate the queue verdict appropriately and another to refactor common
  code with xt_NFQUEUE, from Eric Leblond.

* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
  Eric Leblond.

* Add the reject expression to nf_tables, this adds the missing TCP RST
  support. It comes with an initial patch to refactor common code with
  xt_NFQUEUE, again from Eric Leblond.

* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
  Nazarewicz.

* Remove the nft_meta_target code, now that Arturo added the set operation
  to the meta expression, from me.

* Add help information for nf_tables to Kconfig, also from me.

* Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
  available to other nf_tables objects, requested by Arturo, from me.

* Expose the table usage counter, so we can know how many chains are using
  this table without dumping the list of chains, from Tomasz Bursztyka.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 13:29:30 -05:00
David S. Miller 855404efae Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next tree,
they are:

* Add full port randomization support. Some crazy researchers found a way
  to reconstruct the secure ephemeral ports that are allocated in random mode
  by sending off-path bursts of UDP packets to overrun the socket buffer of
  the DNS resolver to trigger retransmissions, then if the timing for the
  DNS resolution done by a client is larger than usual, then they conclude
  that the port that received the burst of UDP packets is the one that was
  opened. It seems a bit aggressive method to me but it seems to work for
  them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
  new NAT mode to fully randomize ports using prandom.

* Add a new classifier to x_tables based on the socket net_cls set via
  cgroups. These includes two patches to prepare the field as requested by
  Zefan Li. Also from Daniel Borkmann.

* Use prandom instead of get_random_bytes in several locations of the
  netfilter code, from Florian Westphal.

* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
  mark, also from Florian Westphal.

* Fix compilation warning due to unused variable in IPVS, from Geert
  Uytterhoeven.

* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

* Add IPComp extension to x_tables, from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:18:50 -05:00
Li RongQing 8f84985fec net: unify the pcpu_tstats and br_cpu_netstats as one
They are same, so unify them as one, pcpu_sw_netstats.

Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:10:24 -05:00
Marcel Holtmann f9f462faa0 Bluetooth: Add quirk for disabling Delete Stored Link Key command
Some controller pretend they support the Delete Stored Link Key command,
but in reality they really don't support it.

  < HCI Command: Delete Stored Link Key (0x03|0x0012) plen 7
      bdaddr 00:00:00:00:00:00 all 1
  > HCI Event: Command Complete (0x0e) plen 4
      Delete Stored Link Key (0x03|0x0012) ncmd 1
      status 0x11 deleted 0
      Error: Unsupported Feature or Parameter Value

Not correctly supporting this command causes the controller setup to
fail and will make a device not work. However sending the command for
controller that handle stored link keys is important. This quirk
allows a driver to disable the command if it knows that this command
handling is broken.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-01-04 20:10:40 +02:00
Thierry Escande 444fb98eed NFC: digital: Add a note about asynchronous functions
This explains how and why the timeout parameter must be handled by the
driver implementation.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-04 03:35:34 +01:00
stephen hemminger 5e419e68a6 llc: make lock static
The llc_sap_list_lock does not need to be global, only acquired
in core.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:56:48 -05:00
stephen hemminger 8f09898bf0 socket: cleanups
Namespace related cleaning

 * make cred_to_ucred static
 * remove unused sock_rmalloc function

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:55:58 -05:00
Tom Herbert 9a4aa9af44 ipv4: Use percpu Cache route in IP tunnels
percpu route cache eliminates share of dst refcnt between CPUs.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 19:40:57 -05:00
Tom Herbert 7d442fab0a ipv4: Cache dst in tunnels
Avoid doing a route lookup on every packet being tunneled.

In ip_tunnel.c cache the route returned from ip_route_output if
the tunnel is "connected" so that all the rouitng parameters are
taken from tunnel parms for a packet. Specifically, not NBMA tunnel
and tos is from tunnel parms (not inner packet).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 19:38:45 -05:00
Daniel Borkmann 86f8515f97 net: netprio: rename config to be more consistent with cgroup configs
While we're at it and introduced CGROUP_NET_CLASSID, lets also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_<SUBSYS>.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:42 +01:00
Daniel Borkmann fe1217c4f3 net: net_cls: move cgroupfs classid handling into core
Zefan Li requested [1] to perform the following cleanup/refactoring:

- Split cgroupfs classid handling into net core to better express a
  possible more generic use.

- Disable module support for cgroupfs bits as the majority of other
  cgroupfs subsystems do not have that, and seems to be not wished
  from cgroup side. Zefan probably might want to follow-up for netprio
  later on.

- By this, code can be further reduced which previously took care of
  functionality built when compiled as module.

cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
that we are consistent with {netclassid,netprio}_cgroup naming that is
under net/core/ as suggested by Zefan.

No change in functionality, but only code refactoring that is being
done here.

 [1] http://patchwork.ozlabs.org/patch/304825/

Suggested-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:41 +01:00
stephen hemminger dcd93ed4cd netfilter: nf_conntrack: remove dead code
The following code is not used in current upstream code.
Some of this seems to be old hooks, other might be used by some
out of tree module (which I don't care about breaking), and
the need_ipv4_conntrack was used by old NAT code but no longer
called.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:37 +01:00
Fan Du c454997e68 {pktgen, xfrm} Introduce xfrm_state_lookup_byspi for pktgen
Introduce xfrm_state_lookup_byspi to find user specified by custom
from "pgset spi xxx". Using this scheme, any flow regardless its
saddr/daddr could be transform by SA specified with configurable
spi.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-03 07:29:12 +01:00
Vlad Yasevich 619a60ee04 sctp: Remove outqueue empty state
The SCTP outqueue structure maintains a data chunks
that are pending transmission, the list of chunks that
are pending a retransmission and a length of data in
flight.  It also tries to keep the emtpy state so that
it can performe shutdown sequence or notify user.

The problem is that the empy state is inconsistently
tracked.  It is possible to completely drain the queue
without sending anything when using PR-SCTP.  In this
case, the empty state will not be correctly state as
report by Jamal Hadi Salim <jhs@mojatatu.com>.  This
can cause an association to be perminantly stuck in the
SHUTDOWN_PENDING state.

Additionally, SCTP is incredibly inefficient when setting
the empty state.  Even though all the data is availaible
in the outqueue structure, we ignore it and walk a list
of trasnports.

In the end, we can completely remove the extra empty
state and figure out if the queue is empty by looking
at 3 things:  length of pending data, length of in-flight
data, and exisiting of retransmit data.  All of these
are already in the strucutre.

Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 17:22:48 -05:00
stephen hemminger 9c75f4029c sched action: make local function static
No need to export functions only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:36 -05:00
Li RongQing 0c3584d589 ipv6: remove prune parameter for fib6_clean_all
since the prune parameter for fib6_clean_all always is 0, remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:35 -05:00
stephen hemminger e82435341f ipv6: namespace cleanups
Running 'make namespacecheck' shows:
  net/ipv6/route.o
    ipv6_route_table_template
    rt6_bind_peer
  net/ipv6/icmp.o
    icmpv6_route_lookup
    ipv6_icmp_table_template

This addresses some of those warnings by:
 * make icmpv6_route_lookup static
 * move inline's out of ip6_route.h since only used into route.c
 * move rt6_bind_peer into route.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:46:09 -05:00
stephen hemminger 3678a9d863 netlink: cleanup rntl_af_register
The function __rtnl_af_register is never called outside this
code, and the return value is always 0.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:42:19 -05:00
Daniel Borkmann 7e0309631e net: llc: fix order of evaluation in llc_conn_ac_inc_vr_by_1
Function llc_conn_ac_inc_vr_by_1() evaluates via macro
PDU_GET_NEXT_Vr() into ...

  llc_sk(sk)->vR = ++llc_sk(sk)->vR & 0xffffffffffffff7f

... but the order in which the side effects take place is
undefined because there is no intervening sequence point.

As llc_sk(sk)->vR is written in llc_sk(sk)->vR (assignment
left-hand side) and written in ++llc_sk(sk)->vR & 0xffffffffffffff7f
this might possibly yield undefined behavior.

The final value of llc_sk(sk)->vR is ambiguous, because,
depending on the order of expression evaluation, the
increment may occur before, after, or interleaved with
the assignment. In C, evaluating such an expression yields
undefined behavior.

Since we're doing the increment via PDU_GET_NEXT_Vr() macro
and the only place it is being used is from
llc_conn_ac_inc_vr_by_1(), in order to increment vR by 1
with a follow-up optimized modulo, rewrite the expression
into ((vR + 1) & CONST) in order to fix this.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 22:22:43 -05:00
John W. Linville ad86c55bac Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-01 15:39:56 -05:00
Zhi Yong Wu c9d8ca0454 net, rps: fix build failure when CONFIG_RPS isn't set
In file included from net/socket.c:99:0:
include/net/sock.h: In function ‘sock_rps_record_flow’:
include/net/sock.h:849:30: error: ‘const struct sock’ has no member named ‘sk_rxhash’
include/net/sock.h: In function ‘sock_rps_reset_flow’:
include/net/sock.h:854:29: error: ‘const struct sock’ has no member named ‘sk_rxhash’

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 15:59:27 -05:00
Tom Herbert fe47755852 net: Allow setting sock flow hash without a sock
This patch adds sock_rps_record_flow_hash and sock_rps_reset_flow_hash
which take a hash value as an argument and sets the sock_flow_table
accordingly.  This allows the table to be populated in cases where flow
is being tracked outside of a sock structure.

sock_rps_record_flow and sock_rps_reset_flow call this function
where the hash is taken from sk_rxhash.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 13:31:34 -05:00
Eric Leblond cc70d069e2 netfilter: REJECT: separate reusable code
This patch prepares the addition of TCP reset support in
the nft_reject module by moving reusable code into a header
file.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-30 15:04:41 +01:00
stephen hemminger f7e56a76ac tcp: make local functions static
The following are only used in one file:
  tcp_connect_init
  tcp_set_rto

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29 16:34:24 -05:00
Stephen Hemminger ea074b3495 ipv4: ping make local stuff static
Don't export ping_table or ping_v4_sendmsg. Both are only used
inside ping code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 17:05:45 -05:00
Stephen Hemminger 068a6e1834 ipv4: remove unused function
inetpeer_invalidate_family defined but never used

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 17:03:20 -05:00
Stephen Hemminger 7195cf7221 arp: make arp_invalidate static
Don't export arp_invalidate, only used in arp.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 17:02:46 -05:00
Rashika Kheria b6835f9c5d drivers: net: Include new header file in sbni.c
Create a new header file include/net/Space.h which contains
prototype declaration of sbni_probe().

Include the new header file in drivers/net/Space.c and
drivers/net/wan/sbni.c because they use this function.

This eliminates the following warning in wan/sbni.c:
drivers/net/wan/sbni.c:224:12: warning: no previous prototype for ‘sbni_probe’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 18:51:20 -05:00
David S. Miller 1669cb9855 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2013-12-19

1) Use the user supplied policy index instead of a generated one
   if present. From Fan Du.

2) Make xfrm migration namespace aware. From Fan Du.

3) Make the xfrm state and policy locks namespace aware. From Fan Du.

4) Remove ancient sleeping when the SA is in acquire state,
   we now queue packets to the policy instead. This replaces the
   sleeping code.

5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
   posibility to sleep. The sleeping code is gone, so remove it.

6) Check user specified spi for IPComp. Thr spi for IPcomp is only
   16 bit wide, so check for a valid value. From Fan Du.

7) Export verify_userspi_info to check for valid user supplied spi ranges
   with pfkey and netlink. From Fan Du.

8) RFC3173 states that if the total size of a compressed payload and the IPComp
   header is not smaller than the size of the original payload, the IP datagram
   must be sent in the original non-compressed form. These packets are dropped
   by the inbound policy check because they are not transformed. Document the need
   to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 18:37:49 -05:00
Kyeyoon Park fa9ffc7456 cfg80211: Add support for QoS mapping
This allows QoS mapping from external networks to be implemented as
defined in IEEE Std 802.11-2012, 10.24.9. APs can use this to advertise
DSCP ranges and exceptions for mapping frames to a specific UP over
Wi-Fi.

The payload of the QoS Map Set element (IEEE Std 802.11-2012, 8.4.2.97)
is sent to the driver through the new NL80211_ATTR_QOS_MAP attribute to
configure the local behavior either on the AP (based on local
configuration) or on a station (based on information received from the
AP).

Signed-off-by: Kyeyoon Park <kyeyoonp@qca.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-19 16:29:22 +01:00
Johannes Berg 567ffc3509 nl80211: support vendor-specific events
In addition to vendor-specific commands, also support vendor-specific
events. These must be registered with cfg80211 before they can be used.
They're also advertised in nl80211 in the wiphy information so that
userspace knows can be expected. The events themselves are sent on a
new multicast group called "vendor".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-19 13:40:31 +01:00
Felix Fietkau a7022e65c6 mac80211: add helper functions for tracking P2P NoA state
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-19 13:37:46 +01:00
Hannes Frederic Sowa 93b36cf342 ipv6: support IPV6_PMTU_INTERFACE on sockets
IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it
nontheless for symmetry with IPv4 sockets. Also drop incoming MTU
information if this mode is enabled.

The additional bit in ipv6_pinfo just eats in the padding behind the
bitfield. There are no changes to the layout of the struct at all.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 17:37:05 -05:00
Hannes Frederic Sowa 974eda11c5 inet: make no_pmtu_disc per namespace and kill ipv4_config
The other field in ipv4_config, log_martians, was converted to a
per-interface setting, so we can just remove the whole structure.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:58:20 -05:00
David S. Miller 143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
WANG Cong 3627287463 net_sched: convert tcf_proto_ops to use struct list_head
We don't need to maintain our own singly linked list code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:08 -05:00
WANG Cong 1f747c26c4 net_sched: convert tc_action_ops to use struct list_head
We don't need to maintain our own singly linked list code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong 89819dc01f net_sched: convert tcf_hashinfo to hlist and use spinlock
So that we don't need to play with singly linked list,
and since the code is not on hot path, we can use spinlock
instead of rwlock.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong 369ba56787 net_sched: init struct tcf_hashinfo at register time
It looks weird to store the lock out of the struct but
still points to a static variable. Just move them into the struct.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong 5da57f422d net_sched: cls: refactor out struct tcf_ext_map
These information can be saved in tcf_exts, and this will
simplify the code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong 33be627159 net_sched: act: use standard struct list_head
Currently actions are chained by a singly linked list,
therefore it is a bit hard to add and remove a specific
entry. Convert it to struct list_head so that in the
latter patch we can remove an action without finding
its head.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong d84231d3a2 net_sched: remove get_stats from tc_action_ops
It is not used.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
Tom Herbert 7539fadcb8 net: Add utility functions to clear rxhash
In several places 'skb->rxhash = 0' is being done to clear the
rxhash value in an skb.  This does not clear l4_rxhash which could
still be set so that the rxhash wouldn't be recalculated on subsequent
call to skb_get_rxhash.  This patch adds an explict function to clear
all the rxhash related information in the skb properly.

skb_clear_hash_if_not_l4 clears the rxhash only if it is not marked as
l4_rxhash.

Fixed up places where 'skb->rxhash = 0' was being called.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:36:21 -05:00
Eric Dumazet d4589926d7 tcp: refine TSO splits
While investigating performance problems on small RPC workloads,
I noticed linux TCP stack was always splitting the last TSO skb
into two parts (skbs). One being a multiple of MSS, and a small one
with the Push flag. This split is done even if TCP_NODELAY is set,
or if no small packet is in flight.

Example with request/response of 4K/4K

IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>

We can avoid this split by including the Nagle tests at the right place.

Note : If some NIC had trouble sending TSO packets with a partial
last segment, we would have hit the problem in GRO/forwarding workload already.

tcp_minshall_update() is moved to tcp_output.c and is updated as we might
feed a TSO packet with a partial last segment.

This patch tremendously improves performance, as the traffic now looks
like :

IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>

Before :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280774

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     205719.049006 task-clock                #    9.278 CPUs utilized
         8,449,968 context-switches          #    0.041 M/sec
         1,935,997 CPU-migrations            #    0.009 M/sec
           160,541 page-faults               #    0.780 K/sec
   548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
   455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
   272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
   166,091,460,030 instructions              #    0.30  insns per cycle
                                             #    2.74  stalled cycles per insn [83.39%]
    29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
     1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]

      22.173517844 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   16851063           0.0
IpExtOutOctets                  23878580777        0.0

After patch :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280877

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     107496.071918 task-clock                #    4.847 CPUs utilized
         5,635,458 context-switches          #    0.052 M/sec
         1,374,707 CPU-migrations            #    0.013 M/sec
           160,920 page-faults               #    0.001 M/sec
   281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
   228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
   142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
    95,227,712,566 instructions              #    0.34  insns per cycle
                                             #    2.40  stalled cycles per insn [83.43%]
    16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
       874,252,952 branch-misses             #    5.39% of all branches         [83.37%]

      22.175821286 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   11239428           0.0
IpExtOutOctets                  23595191035        0.0

Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet
scheduler.

Many thanks to Neal for review and ideas.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Van Jacobson <vanj@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:15:25 -05:00
David S. Miller 6ea09d8a09 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of updates for the 3.14 stream...

For the Bluetooth bits, Gustavo says:

"This is the first batch of patches intended for 3.14. There is
nothing big here.  Most of the code are refactors, clean up, small
fixes, plus some new device id support."

And...

"More patches to 3.14. Here we have the support for Low Energy
Connection Oriented Channels (LE CoC). Basically, as the name says,
this adds supports for connection oriented channels in the same way
we already have them for BR/EDR connections so profiles/protocols
that work on top of BR/EDR can now work on LE plus a plenty of new
possibilities for LE."

For the ath10k bits, Kalle says:

"Janusz and Marek implemented DFS support to ath10k, but the code is
not enabled yet due to missing cfg80211/mac80211 patches (it will be
enabled in the next pull request). Michal did some device reset fixes
and made it possible for ath10k to share an interrupt with another
device. And lots of smaller fixes from different people."

For the iwlwifi bits, Emmanuel says:

"I have here a big rework of the rate control by Eyal. This is obviously
the biggest part of this batch.
I also have enhancement of protection flags by Avri and a few bits for
WoWLAN by Eliad and Luca. Johannes cleans up the debugfs plus a few
fixes. I provided a few things for Bluetooth coexistence.
Besides this we have an implementation for low priority scan."

Along with all that, there are big batches of updates to mwifiex and
ath9k, Jeff Kirsher's FSF address fix patches, and a handful of other
bits here and there.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:08:17 -05:00
wangweidong be78cfcb25 sctp: Reorder 'struc association' members to reduce its size
Members of 'struct association' are not in appropriate order to
reuse compiler added padding on 64bit architectures. In this patch
we reorder those struct members and help reduce the size of the
structure from 2776 bytes to 2720 bytes on 64 bit architectures.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:32:43 -05:00
Janusz Dziedzic 204e35a91c nl80211: add VHT support for set_bitrate_mask
Add VHT MCS/NSS set support for nl80211_set_tx_bitrate_mask().
This should be used mainly for test purpose, to check
different MCS/NSS VHT combinations.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-16 16:05:17 +01:00
Fan Du 776e9dd90c xfrm: export verify_userspi_info for pkfey and netlink interface
In order to check against valid IPcomp spi range, export verify_userspi_info
for both pfkey and netlink interface.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-16 12:54:02 +01:00
Felix Fietkau 70dabeb74e mac80211: let the driver reserve extra tailroom in beacons
Can be used to add extra IEs (such as P2P NoA) without having to
reallocate the buffer.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-16 12:14:07 +01:00
Johannes Berg 6a9d1b91f3 mac80211: add pre-RCU-sync sta removal driver operation
Currently, mac80211 allows drivers to keep RCU-protected station
references that are cleared when the station is removed from the
driver and consequently needs to synchronize twice, once before
removing the station from the driver (so it can guarantee that
the station is no longer used in TX towards the driver) and once
after the station is removed from the driver.

Add a new pre-RCU-synchronisation station removal operation to
the API to allow drivers to clear/invalidate their RCU-protected
station pointers before the RCU synchronisation.

This will allow removing the second synchronisation by changing
the driver API so that the driver may no longer assume a valid
RCU-protected pointer after sta_remove/sta_state returns.

The alternative to this would be to synchronize_rcu() in all the
drivers that currently rely on this behaviour (only iwlmvm) but
that would defeat the purpose.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-16 11:29:44 +01:00
Johannes Berg c4de673b77 Merge remote-tracking branch 'wireless-next/master' into mac80211-next 2013-12-16 11:23:45 +01:00
John W. Linville f647a52e15 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-12-13 13:14:28 -05:00
Jesper Dangaard Brouer 8cf4d6a224 net: reorder struct netns_ct for better cache-line usage
Reorder struct netns_ct so that atomic_t "count" changes don't
slowdown users of read mostly fields.

This is based on Eric Dumazet's proposed patch:
 "netfilter: conntrack: remove the central spinlock"
 http://thread.gmane.org/gmane.linux.network/268758/focus=47306

The tricky part of cache-aligning this structure, that it is getting
inlined in struct net (include/net/net_namespace.h), thus changes to
other netns_xxx structures affects our alignment.

Eric's original patch contained an ambiguity on 32-bit regarding
alignment in struct net.  This patch also takes 32-bit into account,
and in case of changed (struct net) alignment sysctl_xxx entries have
been ordered according to how often they are accessed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-13 12:55:55 +01:00
Jiri Benc 7e98056964 ipv6: router reachability probing
RFC 4191 states in 3.5:

   When a host avoids using any non-reachable router X and instead sends
   a data packet to another router Y, and the host would have used
   router X if router X were reachable, then the host SHOULD probe each
   such router X's reachability by sending a single Neighbor
   Solicitation to that router's address.  A host MUST NOT probe a
   router's reachability in the absence of useful traffic that the host
   would have sent to the router if it were reachable.  In any case,
   these probes MUST be rate-limited to no more than one per minute per
   router.

Currently, when the neighbour corresponding to a router falls into
NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
should be probed with a single NS. The probe is ratelimited by the existing
code. To better distinguish meanings of the failure values, rename
RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 16:02:58 -05:00
Jukka Rissanen 18722c2470 Bluetooth: Enable 6LoWPAN support for BT LE devices
This is initial version of
http://tools.ietf.org/html/draft-ietf-6lo-btle-00

By default the 6LoWPAN support is not activated and user
needs to tweak /sys/kernel/debug/bluetooth/hci0/6lowpan
file.

The kernel needs IPv6 support before 6LoWPAN is usable.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-11 12:57:55 -08:00
Neil Horman 9f70f46bd4 sctp: properly latch and use autoclose value from sock to association
Currently, sctp associations latch a sockets autoclose value to an association
at association init time, subject to capping constraints from the max_autoclose
sysctl value.  This leads to an odd situation where an application may set a
socket level autoclose timeout, but sliently sctp will limit the autoclose
timeout to something less than that.

Fix this by modifying the autoclose setsockopt function to check the limit, cap
it and warn the user via syslog that the timeout is capped.  This will allow
getsockopt to return valid autoclose timeout values that reflect what subsequent
associations actually use.

While were at it, also elimintate the assoc->autoclose variable, it duplicates
whats in the timeout array, which leads to multiple sources for the same
information, that may differ (as the former isn't subject to any capping).  This
gives us the timeout information in a canonical place and saves some space in
the association structure as well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
CC: Wang Weidong <wangweidong1@huawei.com>
CC: David Miller <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:41:26 -05:00
Jiri Pirko 9a32b86043 dn_dev: add support for IFA_FLAGS nl attribute
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 21:50:00 -05:00
Paul Moore 10ae76faa9 cipso: cleanup cipso_v4_translate() when !CONFIG_NETLABEL
Don't needlessly recompute 'opt[opt_iter + 1]' as we already have it
stored in 'tag_len'.

Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 17:56:54 -05:00
Florent Fourcot 3308de2b84 ipv6: add ip6_flowlabel helper
And use it if possible.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Florent Fourcot 37cfee909c ipv6: move IPV6_TCLASS_MASK definition in ipv6.h
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Jiri Pirko bba24896f0 neigh: ipv6: respect default values set before an address is assigned to device
Make the behaviour similar to ipv4. This will allow user to set sysctl
default neigh param values and these values will be respected even by
devices registered before (that ones what do not have address set yet).

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko 1d4c8c2984 neigh: restore old behaviour of default parms values
Previously inet devices were only constructed when addresses are added.
Therefore the default neigh parms values they get are the ones at the
time of these operations.

Now that we're creating inet devices earlier, this changes the behaviour
of default neigh parms values in an incompatible way (see bug #8519).

This patch creates a compromise by setting the default values at the
same point as before but only for those that have not been explicitly
set by the user since the inet device's creation.

Introduced by:
commit 8030f54499
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Feb 22 01:53:47 2007 +0900

    [IPV4] devinet: Register inetdev earlier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko 73af614aed neigh: use tbl->family to distinguish ipv4 from ipv6
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko cb5b09c17f neigh: wrap proc dointvec functions
This will be needed later on to provide better management of default values.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko 1f9248e560 neigh: convert parms to an array
This patch converts the neigh param members to an array. This allows easier
manipulation which will be needed later on to provide better management of
default values.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
David S. Miller 34f9f43710 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
Daniel's direct transmit changes depend upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:20:14 -05:00
Eric Dumazet 95dc19299f pkt_sched: give visibility to mq slave qdiscs
Commit 6da7c8fcbc ("qdisc: allow setting default queuing discipline")
added the ability to change default qdisc from pfifo_fast to say fq

But as most modern ethernet devices are multiqueue, we cant really
see all the statistics from "tc -s qdisc show", as the default root
qdisc is mq.

This patch adds the calls to qdisc_list_add() to mq and mqprio

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 19:54:47 -05:00
Marcel Holtmann 53b834d233 Bluetooth: Use macros for connectionless slave broadcast features
Add the LMP feature constants for connectionless slave broadcast
and use them for capability testing.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-12-09 00:02:41 +04:00
Eric Leblond 97a2d41c47 netfilter: xt_NFQUEUE: separate reusable code
This patch prepares the addition of nft_queue module by moving
reusable code into a header file.

This patch also converts NFQUEUE to use prandom_u32 to initialize
the random jhash seed as suggested by Florian Westphal.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-07 23:20:45 +01:00
Jiri Pirko 479840ffdb ipv6 addrconf: extend ifa_flags to u32
There is no more space in u8 ifa_flags. So do what davem suffested and
add another netlink attr called IFA_FLAGS for carry more flags.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 16:34:43 -05:00
David S. Miller f1abb346d8 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of updates intended for the 3.14 stream...

For the mac80211 bits, Johannes says:

"I have various improvements/cleanups/fixes all over, but the shortlog
shows that Luis's regulatory work and mesh work from the cozybit folks
are the biggest ones, along with the CSA fixes."

Along with that, we have big batches of updates to brcmfmac, rtlwifi,
and ath9k.  There are updates to wcn36xx, rt2x00, and a handful of
others as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 14:25:23 -05:00
Eric Dumazet f54b311142 tcp: auto corking
With the introduction of TCP Small Queues, TSO auto sizing, and TCP
pacing, we can implement Automatic Corking in the kernel, to help
applications doing small write()/sendmsg() to TCP sockets.

Idea is to change tcp_push() to check if the current skb payload is
under skb optimal size (a multiple of MSS bytes)

If under 'size_goal', and at least one packet is still in Qdisc or
NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
will be delayed up to TX completion time.

This delay might allow the application to coalesce more bytes
in the skb in following write()/sendmsg()/sendfile() system calls.

The exact duration of the delay is depending on the dynamics
of the system, and might be zero if no packet for this flow
is actually held in Qdisc or NIC TX ring.

Using FQ/pacing is a way to increase the probability of
autocorking being triggered.

Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
this feature and default it to 1 (enabled)

Add a new SNMP counter : nstat -a | grep TcpExtTCPAutoCorking
This counter is incremented every time we detected skb was under used
and its flush was deferred.

Tested:

Interesting effects when using line buffered commands under ssh.

Excellent performance results in term of cpu usage and total throughput.

lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
9410.39

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

      35209.439626 task-clock                #    2.901 CPUs utilized
             2,294 context-switches          #    0.065 K/sec
               101 CPU-migrations            #    0.003 K/sec
             4,079 page-faults               #    0.116 K/sec
    97,923,241,298 cycles                    #    2.781 GHz                     [83.31%]
    51,832,908,236 stalled-cycles-frontend   #   52.93% frontend cycles idle    [83.30%]
    25,697,986,603 stalled-cycles-backend    #   26.24% backend  cycles idle    [66.70%]
   102,225,978,536 instructions              #    1.04  insns per cycle
                                             #    0.51  stalled cycles per insn [83.38%]
    18,657,696,819 branches                  #  529.906 M/sec                   [83.29%]
        91,679,646 branch-misses             #    0.49% of all branches         [83.40%]

      12.136204899 seconds time elapsed

lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
6624.89

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
      40045.864494 task-clock                #    3.301 CPUs utilized
               171 context-switches          #    0.004 K/sec
                53 CPU-migrations            #    0.001 K/sec
             4,080 page-faults               #    0.102 K/sec
   111,340,458,645 cycles                    #    2.780 GHz                     [83.34%]
    61,778,039,277 stalled-cycles-frontend   #   55.49% frontend cycles idle    [83.31%]
    29,295,522,759 stalled-cycles-backend    #   26.31% backend  cycles idle    [66.67%]
   108,654,349,355 instructions              #    0.98  insns per cycle
                                             #    0.57  stalled cycles per insn [83.34%]
    19,552,170,748 branches                  #  488.244 M/sec                   [83.34%]
       157,875,417 branch-misses             #    0.81% of all branches         [83.34%]

      12.130267788 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:51:41 -05:00
Jeff Kirsher a6227e26d9 include/net/: Fix FSF address in file headers
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:37:56 -05:00
Jeff Kirsher 4b2f13a251 sctp: Fix FSF address in file headers
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.

CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:37:56 -05:00
John W. Linville d86804cb70 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-12-06 10:37:24 -05:00
Steffen Klassert 0e0d44ab42 net: Remove FLOWI_FLAG_CAN_SLEEP
FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility
to sleep until the needed states are resolved. This code is gone,
so FLOWI_FLAG_CAN_SLEEP is not needed anymore.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-06 07:24:39 +01:00
Steffen Klassert 5b8ef3415a xfrm: Remove ancient sleeping when the SA is in acquire state
We now queue packets to the policy if the states are not yet resolved,
this replaces the ancient sleeping code. Also the sleeping can cause
indefinite task hangs if the needed state does not get resolved.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-06 07:24:31 +01:00
Fan Du 283bc9f35b xfrm: Namespacify xfrm state/policy locks
By semantics, xfrm layer is fully name space aware,
so will the locks, e.g. xfrm_state/pocliy_lock.
Ensure exclusive access into state/policy link list
for different name space with one global lock is not
right in terms of semantics aspect at first place,
as they are indeed mutually independent with each
other, but also more seriously causes scalability
problem.

One practical scenario is on a Open Network Stack,
more than hundreds of lxc tenants acts as routers
within one host, a global xfrm_state/policy_lock
becomes the bottleneck. But onces those locks are
decoupled in a per-namespace fashion, locks contend
is just with in specific name space scope, without
causing additional SPD/SAD access delay for other
name space.

Also this patch improve scalability while as without
changing original xfrm behavior.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-06 06:45:06 +01:00
Fan Du 8d549c4f5d xfrm: Using the right namespace to migrate key info
because the home agent could surely be run on a different
net namespace other than init_net. The original behavior
could lead into inconsistent of key info.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-06 06:45:05 +01:00
Eric W. Biederman 7f2cbdc28c tcp_memcontrol: Cleanup/fix cg_proto->memory_pressure handling.
kill memcg_tcp_enter_memory_pressure.  The only function of
memcg_tcp_enter_memory_pressure was to reduce deal with the
unnecessary abstraction that was tcp_memcontrol.  Now that struct
tcp_memcontrol is gone remove this unnecessary function, the
unnecessary function pointer, and modify sk_enter_memory_pressure to
set this field directly, just as sk_leave_memory_pressure cleas this
field directly.

This fixes a small bug I intruduced when killing struct tcp_memcontrol
that caused memcg_tcp_enter_memory_pressure to never be called and
thus failed to ever set cg_proto->memory_pressure.

Remove the cg_proto enter_memory_pressure function as it now serves
no useful purpose.

Don't test cg_proto->memory_presser in sk_leave_memory_pressure before
clearing it.  The test was originally there to ensure that the pointer
was non-NULL.  Now that cg_proto is not a pointer the pointer does not
matter.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-05 21:01:01 -05:00
Paul Durrant 1431fb31ec xen-netback: fix fragment detection in checksum setup
The code to detect fragments in checksum_setup() was missing for IPv4 and
too eager for IPv6. (It transpires that Windows seems to send IPv6 packets
with a fragment header even if they are not a fragment - i.e. offset is zero,
and M bit is not set).

This patch also incorporates a fix to callers of maybe_pull_tail() where
skb->network_header was being erroneously added to the length argument.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
cc: David Miller <davem@davemloft.net>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-05 20:31:40 -05:00
Janusz Dziedzic d1e33e654e cfg80211: in bitrate_mask, rename mcs to ht_mcs
Rename NL80211_TXRATE_MCS to NL80211_TXRATE_HT and also
rename mcs to ht_mcs in struct cfg80211_bitrate_mask.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-05 16:39:07 +01:00
Johan Hedberg 0ce43ce60d Bluetooth: Simplify l2cap_chan initialization for LE CoC
The values in l2cap_chan that are used for actually transmitting data
only need to be initialized right after we've received an L2CAP Connect
Request or just before we send one. The only thing that we need to
initialize though bind() and connect() is the chan->mode value. This way
all other initializations can be done in the l2cap_le_flowctl_init
function (which now becomes private to l2cap_core.c) and the
l2cap_le_flowctl_start function can be completely removed.

Also, since the l2cap_sock_init function initializes the imtu and omtu
to adequate values these do not need to be part of l2cap_le_flowctl_init.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:36 -08:00
Johan Hedberg e77af75592 Bluetooth: Fix CID ranges for LE CoC CID allocations
LE CoC used differend CIC ranges than BR/EDR L2CAP. The start of the
range is the same (0x0040) but the range ends at 0x007f (unlike BR/EDR
where it goes all the way to 0xffff).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:36 -08:00
Johan Hedberg 595177f311 Bluetooth: Fix LE L2CAP Connect Request handling together with SMP
Unlike BR/EDR, for LE when we're in the BT_CONNECT state we may or may
not have already have sent the Connect Request. This means that we need
some extra tracking of the request. This patch adds an extra channel
flag to prevent the request from being sent a second time.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:35 -08:00
Johan Hedberg 837776f790 Bluetooth: Introduce L2CAP channel callback for suspending
Setting the BT_SK_SUSPEND socket flag from the L2CAP core is causing a
dependency on the socket. So instead of doing that, use a channel
callback into the socket handling to suspend.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:35 -08:00
Johan Hedberg 1f435424ce Bluetooth: Add new BT_SNDMTU and BT_RCVMTU socket options
This patch adds new socket options for LE sockets since the existing
L2CAP_OPTIONS socket option is not usable for LE. For now, the new
socket options also require LE CoC support to be explicitly enabled to
leave some playroom in case something needs to be changed in a backwards
incompatible way.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:34 -08:00
Johan Hedberg 0cd75f7ed7 Bluetooth: Track LE L2CAP credits in l2cap_chan
This patch adds tracking of L2CAP connection oriented channel local and
remote credits to struct l2cap_chan and ensures that connect requests
and responses contain the right values.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:34 -08:00
Johan Hedberg 3831971355 Bluetooth: Add LE L2CAP flow control mode
The LE connection oriented channels have their own mode with its own
data transfer rules. In order to implement this properly we need to
distinguish L2CAP channels operating in this mode from other modes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:34 -08:00
Johan Hedberg 27e2d4c8d2 Bluetooth: Add basic LE L2CAP connect request receiving support
This patch adds the necessary boiler plate code to handle receiving
L2CAP connect requests over LE and respond to them with a proper connect
response.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:33 -08:00
Johan Hedberg ee5ec5cf00 Bluetooth: Add definitions for LE connection oriented channels
This patch adds the necessary defines and structs for LE connection
oriented channels.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:32 -08:00
Johan Hedberg 9149761ad7 Bluetooth: Add module parameter to enable LE CoC support
Along with the L2CAP Connection Oriented Channels features it is now
allowed to use both custom fixed CIDs as well as PSM based (connection
oriented connections). Since the support for this (with the subsequent
patches) is still on an experimental stage, add a module parameter to
enable it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-12-05 07:05:32 -08:00
Johannes Berg ad7e718c9b nl80211: vendor command support
Add support for vendor-specific commands to nl80211. This is
intended to be used for really vendor-specific functionality
that can't be implemented in a generic fashion for any reason.
It's *NOT* intended to be used for any normal/generic feature
or any optimisations that could be implemented across drivers.

Currently, only vendor commands (with replies) are supported,
no dump operations or vendor-specific notifications.

Also add a function wdev_to_ieee80211_vif() to mac80211 which
is needed for mac80211-based drivers wanting to implement any
vendor commands.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-03 16:27:17 +01:00
Marek Kwaczynski 60f4a7b167 nl80211/cfg80211: Set Operating Mode Notification
This attribute is needed for setting Operating Mode Notification
in AP mode from User Space. This functionality is required when
User Space received Assoc Request contains Operation Mode
Notification element.

Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com>
[fix typos, nl80211 documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-03 13:46:42 +01:00
Johannes Berg 18cfd3bfc9 Revert "mac80211: add driver callback for per-interface multicast filter"
This reverts commit 488b366a45.

The code isn't used by anyone, and the Intel driver isn't planning
to use it either right now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-03 13:21:55 +01:00
John W. Linville 4b074b0762 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-12-02 14:25:38 -05:00
Simon Wunderlich e487eaeb07 cfg80211/mac80211/ath6kl: acquire wdev lock outside ch_switch_notify
The channel switch notification should be sent under the
wdev/sdata-lock, preferably in the same moment as the channel change
happens, to avoid races by other callers (e.g. start/stop_ap).
This also adds the previously missing sdata_lock protection in
csa_finalize_work.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-02 11:51:54 +01:00
Andrei Otcheretianski b176e62940 cfg80211: aggregate mgmt_tx parameters into a struct
Change cfg80211 and mac80211 to use cfg80211_mgmt_tx_params
struct to aggregate parameters for mgmt_tx functions.
This makes the functions' signatures less clumsy and allows
less painful parameters extension.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
[fix all other drivers]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-02 11:51:52 +01:00
Xufeng Zhang 6eabca54d6 sctp: Restore 'resent' bit to avoid retransmitted chunks for RTT measurements
Currently retransmitted DATA chunks could also be used for
RTT measurements since there are no flag to identify whether
the transmitted DATA chunk is a new one or a retransmitted one.
This problem is introduced by commit ae19c5486 ("sctp: remove
'resent' bit from the chunk") which inappropriately removed the
'resent' bit completely, instead of doing this, we should set
the resent bit only for the retransmitted DATA chunks.

Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-28 18:29:58 -05:00
Luis R. Rodriguez 4c7d3982a6 cfg80211: use enum nl80211_dfs_regions for dfs_region everywhere
u8 was used in some other places, just stick to the enum,
this forces us to express the values that are expected.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:52:12 +01:00
Eliad Peller 21f659bf1f mac80211: add min required channel definition field
Add a new field to ieee80211_chanctx_conf to indicate
the min required channel configuration.

Tuning to a narrower channel might help reducing
the noise level and saving some power.

The min required channel definition is the max of
all min required channel definitions of the interfaces
bound to this channel context.

In AP mode, use 20MHz when there are no connected station.
When a new station is added/removed, calculate the new max
bandwidth supported by any of the stations (e.g. 80MHz when
80MHz and 40MHz stations are connected).

In other cases, simply use bss_conf.chandef as the
min required chandef.

Notify drivers about changes to this field by calling
drv_change_chanctx with a new CHANGE_MIN_WIDTH notification.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:52:05 +01:00
Luis R. Rodriguez 2a901468c2 cfg80211: add an option to disable processing country IEs
Certain vendors may want to disable the processing of
country IEs so that they can continue using the regulatory
domain the driver or user has set.  Currently there is no
way to stop the core from processing country IEs, so add
support to the core to ignore country IE hints.

Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:51:51 +01:00
Luis R. Rodriguez a09a85a013 cfg80211: add flags to define country IE processing rules
802.11 cards may have different country IE parsing behavioural
preferences and vendors may want to support these. These preferences
were managed by the REGULATORY_CUSTOM_REG and the REGULATORY_STRICT_REG
flags and their combination. Instead of using this existing notation,
split out the country IE behavioural preferences as a new flag. This
will allow us to add more customizations easily and make the code more
maintainable.

Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[fix up conflicts]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:51:49 +01:00
Luis R. Rodriguez a2f73b6c5d cfg80211: move regulatory flags to their own variable
We'll expand this later, this will make it easier to
classify and review what things are related to regulatory
or not.

Coccinelle only missed 4 hits, which I had to do manually,
supplying the SmPL in case of merge conflicts.

@@
struct wiphy *wiphy;
@@
-wiphy->flags |= WIPHY_FLAG_CUSTOM_REGULATORY
+wiphy->regulatory_flags |= REGULATORY_CUSTOM_REG
@@
expression e;
@@
-e->flags |= WIPHY_FLAG_CUSTOM_REGULATORY
+e->regulatory_flags |= REGULATORY_CUSTOM_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags &= ~WIPHY_FLAG_CUSTOM_REGULATORY
+wiphy->regulatory_flags &= ~REGULATORY_CUSTOM_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags & WIPHY_FLAG_CUSTOM_REGULATORY
+wiphy->regulatory_flags & REGULATORY_CUSTOM_REG

@@
struct wiphy *wiphy;
@@
-wiphy->flags |= WIPHY_FLAG_STRICT_REGULATORY
+wiphy->regulatory_flags |= REGULATORY_STRICT_REG
@@
expression e;
@@
-e->flags |= WIPHY_FLAG_STRICT_REGULATORY
+e->regulatory_flags |= REGULATORY_STRICT_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags &= ~WIPHY_FLAG_STRICT_REGULATORY
+wiphy->regulatory_flags &= ~REGULATORY_STRICT_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags & WIPHY_FLAG_STRICT_REGULATORY
+wiphy->regulatory_flags & REGULATORY_STRICT_REG

@@
struct wiphy *wiphy;
@@
-wiphy->flags |= WIPHY_FLAG_DISABLE_BEACON_HINTS
+wiphy->regulatory_flags |= REGULATORY_DISABLE_BEACON_HINTS
@@
expression e;
@@
-e->flags |= WIPHY_FLAG_DISABLE_BEACON_HINTS
+e->regulatory_flags |= REGULATORY_DISABLE_BEACON_HINTS
@@
struct wiphy *wiphy;
@@
-wiphy->flags &= ~WIPHY_FLAG_DISABLE_BEACON_HINTS
+wiphy->regulatory_flags &= ~REGULATORY_DISABLE_BEACON_HINTS
@@
struct wiphy *wiphy;
@@
-wiphy->flags & WIPHY_FLAG_DISABLE_BEACON_HINTS
+wiphy->regulatory_flags & REGULATORY_DISABLE_BEACON_HINTS

Generated-by: Coccinelle SmPL
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Peter Senna Tschudin <peter.senna@gmail.com>
Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[fix up whitespace damage, overly long lines]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:51:46 +01:00
Max Stepanov 2475b1cc0d mac80211: add generic cipher scheme support
This adds generic cipher scheme support to mac80211, such schemes
are fully under control by the driver. On hw registration drivers
may specify additional HW ciphers with a scheme how these ciphers
have to be handled by mac80211 TX/RR. A cipher scheme specifies a
cipher suite value, a size of the security header to be added to
or stripped from frames and how the PN is to be verified on RX.

Signed-off-by: Max Stepanov <Max.Stepanov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:50:52 +01:00
Janusz Dziedzic d2859df5e7 cfg80211/mac80211: DFS setup chandef for cac event
To report channel width correctly we have
to send correct channel parameters from
mac80211 when calling cfg80211_cac_event().

This is required in case of using channel width
higher than 20MHz and we have to set correct
dfs channel state after CAC (NL80211_DFS_AVAILABLE).

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:50:46 +01:00
Luis R. Rodriguez 222ea58199 cfg80211: force WIPHY_FLAG_CUSTOM_REGULATORY on wiphy_apply_custom_regulatory()
wiphy_apply_custom_regulatory() implies WIPHY_FLAG_CUSTOM_REGULATORY
but we never enforced it, do that now and warn if the driver
didn't set it. All drivers should be following this today already.

Having WIPHY_FLAG_CUSTOM_REGULATORY does not however mean you will
use wiphy_apply_custom_regulatory() though, you may have your own
_orig value set up tools / helpers. The intel drivers are examples
of this type of driver.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:49:45 +01:00
Luis R. Rodriguez 8fe02e167e cfg80211: consolidate passive-scan and no-ibss flags
These two flags are used for the same purpose, just
combine them into a no-ir flag to annotate no initiating
radiation is allowed.

Old userspace sending either flag will have it treated as
the no-ir flag. To be considerate to older userspace we
also send both the no-ir flag and the old no-ibss flags.
Newer userspace will have to be aware of older kernels.

Update all places in the tree using these flags with the
following semantic patch:

@@
@@
-NL80211_RRF_PASSIVE_SCAN
+NL80211_RRF_NO_IR
@@
@@
-NL80211_RRF_NO_IBSS
+NL80211_RRF_NO_IR
@@
@@
-IEEE80211_CHAN_PASSIVE_SCAN
+IEEE80211_CHAN_NO_IR
@@
@@
-IEEE80211_CHAN_NO_IBSS
+IEEE80211_CHAN_NO_IR
@@
@@
-NL80211_RRF_NO_IR | NL80211_RRF_NO_IR
+NL80211_RRF_NO_IR
@@
@@
-IEEE80211_CHAN_NO_IR | IEEE80211_CHAN_NO_IR
+IEEE80211_CHAN_NO_IR
@@
@@
-(NL80211_RRF_NO_IR)
+NL80211_RRF_NO_IR
@@
@@
-(IEEE80211_CHAN_NO_IR)
+IEEE80211_CHAN_NO_IR

Along with some hand-optimisations in documentation, to
remove duplicates and to fix some indentation.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[do all the driver updates in one go]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:49:35 +01:00
Hannes Frederic Sowa 85fbaa7503 inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions
Commit bceaa90240 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa90240 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-23 14:46:23 -08:00
Johannes Berg 91398a0992 genetlink: fix genl_set_err() group ID
Fix another really stupid bug - I introduced genl_set_err()
precisely to be able to adjust the group and reject invalid
ones, but then forgot to do so.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-21 13:09:43 -05:00
Johannes Berg 220815a966 genetlink: fix genlmsg_multicast() bug
Unfortunately, I introduced a tremendously stupid bug into
genlmsg_multicast() when doing all those multicast group
changes: it adjusts the group number, but then passes it
to genlmsg_multicast_netns() which does that again.

Somehow, my tests failed to catch this, so add a warning
into genlmsg_multicast_netns() and remove the offending
group ID adjustment.

Also add a warning to the similar code in other functions
so people who misuse them are more loudly warned.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-21 13:09:43 -05:00
Linus Torvalds 1ee2dcc224 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...
2013-11-19 15:50:47 -08:00
Johannes Berg 2a94fe48f3 genetlink: make multicast groups const, prevent abuse
Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.

This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.

At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg 68eb55031d genetlink: pass family to functions using groups
This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg 62b68e99fa genetlink: add and use genl_set_err()
Add a static inline to generic netlink to wrap netlink_set_err()
to make it easier to use here - use it in openvswitch (the only
generic netlink user of netlink_set_err()).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg c2ebb90846 genetlink: remove family pointer from genl_multicast_group
There's no reason to have the family pointer there since it
can just be passed internally where needed, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg 06fb555a27 genetlink: remove genl_unregister_mc_group()
There are no users of this API remaining, and we'll soon
change group registration to be static (like ops are now)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg c53ed74236 genetlink: only pass array to genl_register_family_with_ops()
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:05 -05:00
Johannes Berg 568508aa07 genetlink: unify registration functions
Now that the ops assignment is just two variables rather than a
long list iteration etc., there's no reason to separately export
__genl_register_family() and __genl_register_family_with_ops().

Unify the two functions into __genl_register_family() and make
genl_register_family_with_ops() call it after assigning the ops.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15 20:50:23 -05:00
Linus Torvalds 9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Johannes Berg 3f5ccd06ae genetlink: make genl_ops flags a u8 and move to end
To save some space in the struct on 32-bit systems,
make the flags a u8 (only 4 bits are used) and also
move them to the end of the struct.

This has no impact on 64-bit systems as alignment of
the struct in an array uses up the space anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg f84f771d94 genetlink: allow making ops const
Allow making the ops array const by not modifying the ops
flags on registration but rather only when ops are sent
out in the family information.

No users are updated yet except for the pre_doit/post_doit
calls in wireless (the only ones that exist now.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg d91824c08f genetlink: register family ops as array
Instead of using a linked list, use an array. This reduces
the data size needed by the users of genetlink, for example
in wireless (net/wireless/nl80211.c) on 64-bit it frees up
over 1K of data space.

Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event
since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns
-EINVAL anyway, therefore no such event could ever be sent.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg 3686ec5e84 genetlink: remove genl_register_ops/genl_unregister_ops
genl_register_ops() is still needed for internal registration,
but is no longer available to users of the API.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:40 -05:00
Jiri Pirko 6aafeef03b netfilter: push reasm skb through instead of original frag skbs
Pushing original fragments through causes several problems. For example
for matching, frags may not be matched correctly. Take following
example:

<example>
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

and on HOSTB you do:
ping6 HOSTA -s2000    (MTU is 1500)

Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen)
</example>

As was discussed previously, the only correct solution seems to be to use
reassembled skb instead of separete frags. Doing this has positive side
effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
dances in ipvs and conntrack can be removed.

Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
entirely and use code in net/ipv6/reassembly.c instead.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 00:19:35 -05:00
Florent Fourcot 3fdfa5ff50 ipv6: enable IPV6_FLOWLABEL_MGR for getsockopt
It is already possible to set/put/renew a label
with IPV6_FLOWLABEL_MGR and setsockopt. This patch
add the possibility to get information about this
label (current value, time before expiration, etc).

It helps application to take decision for a renew
or a release of the label.

v2:
 * Add spin_lock to prevent race condition
 * return -ENOENT if no result found
 * check if flr_action is GET

v3:
 * move the spin_lock to protect only the
   relevant code

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 13:42:57 -05:00
John W. Linville c1f3bb6bd3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-11-08 09:03:10 -05:00
Mathias Krause 0c7ddf36c2 net: move pskb_put() to core code
This function has usage beside IPsec so move it to the core skbuff code.
While doing so, give it some documentation and change its return type to
'unsigned char *' to be in line with skb_put().

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:28:58 -05:00
Joe Perches f9bddcdf1f udp: Remove unnecessary semicolon from do{}while (0) macro
Just an unnecessary semicolon that should be removed...

Whitespace neatening of macro too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 02:14:33 -05:00
Hannes Frederic Sowa 482fc6094a ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.

(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)

Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>

This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.

Cc: David S. Miller <davem@davemloft.net>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 21:52:27 -05:00
John W. Linville 33b443422e Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-11-05 15:50:22 -05:00
John W. Linville 353c78152c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	net/wireless/reg.c
2013-11-05 15:49:02 -05:00
David S. Miller cfce0a2b61 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please accept the following pull request intended for the 3.13 tree...

I had intended to pass most of these to you as much as two weeks ago.
Unfortunately, I failed to account for the effects of bad Internet
connections and my own fatique/laziness while traveling.  On the bright
side, at least these have been baking in linux-next for some time!

For the mac80211 bits, Johannes says:

"This time I have two fixes for P2P (which requires not using CCK rates)
and a workaround for APs with broken WMM information."

For the iwlwifi bits, Johannes says:

"I have a few fixes for warnings/issues: one from Alex, fixing scan
timings, one from Emmanuel fixing a WARN_ON in the DVM driver, one from
Stanislaw removing a trigger-happy WARN_ON in the MVM driver and a
change from myself to try to recover when the device isn't processing
commands quickly."

And:

"For this round, I have a lot of changes:
 * power management improvements
 * BT coexistence improvements/updates
 * new device support
 * VHT support
 * IBSS support (though due to a small bug it requires new firmware)
 * various other fixes/improvements."

For the Bluetooth bits, Gustavo says:

"More patches for 3.12, busy times for Bluetooth. More than a 100 commits since
the last pull. The bulk of work comes from Johan and Marcel, they are doing
fixes and improvements all over the Bluetooth subsystem, as the diffstat can
show."

For the ath10k and ath6kl bits, Kalle says:

"Bartosz added support to ath10k for our 10.x AP firmware branch, which
gives us AP specific features and fixes. We still support the main
firmware branch as well just like before, ath10k detects runtime what
firmware is used. Unfortunately the firmware interface in 10.x branch is
somewhat different so there was quite a lot of changes in ath10k for
this.

Michal and Sujith did some performance improvements in ath10k. Vladimir
fixed a compiler warning and Fengguang removed an extra semicolon."

For the NFC bits, Samuel says:

"It's a fairly big one, with the following highlights:

- NFC digital layer implementation: Most NFC chipsets implement the NFC
  digital layer in firmware, but others have more basic functionalities
  and expect the host to implement the digital layer. This layer sits
  below the NFC core.

- Sony's port100 support: This is "soft" NFC USB dongle that expects the
  digital layer to be implemented on the host. This is the first user of
  our NFC digital stack implementation.

- Secure element API: We now provide a netlink API for enabling,
  disabling and discovering NFC attached (embedded or UICC ones) secure
  elements. With some userspace help, this allows us to support NFC
  payments.
  Only the pn544 driver currently supports that API.

- NCI SPI fixes and improvements: In order to support NCI devices over
  SPI, we fixed and improved our NCI/SPI implementation. The currently
  most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode
  and we're planning to use our NCI/SPI framework to implement a
  driver for it.

- pn533 fragmentation support in target mode: This was the only missing
  feature from our pn533 impementation. We now support fragmentation in
  both Tx and Rx modes, in target mode."

On top of all that, brcmfmac and rt2x00 both get the usual flurry
of updates.  A few other drivers get hit here or there as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 02:34:57 -05:00
Jesper Dangaard Brouer 1ba3aab303 net: codel: Avoid undefined behavior from signed overflow
As described in commit 5a581b367 (jiffies: Avoid undefined
behavior from signed overflow), according to the C standard
3.4.3p3, overflow of a signed integer results in undefined
behavior.

To fix this, do as the above commit, and do an unsigned
subtraction, and interpreting the result as a signed
two's-complement number.  This is based on the theory from
RFC 1982 and is nicely described in wikipedia here:
 https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution

A side-note, I have seen practical issues with the previous logic
when dealing with 16-bit, on a 64-bit machine (gcc version
4.4.5). This were 32-bit, which I have not observed issues with.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 20:01:29 -05:00
Yuchung Cheng 9f9843a751 tcp: properly handle stretch acks in slow start
Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
regardless the number of packets. Consequently slow start performance
is highly dependent on the degree of the stretch ACKs caused by
receiver or network ACK compression mechanisms (e.g., delayed-ACK,
GRO, etc).  But slow start algorithm is to send twice the amount of
packets of packets left so it should process a stretch ACK of degree
N as if N ACKs of degree 1, then exits when cwnd exceeds ssthresh. A
follow up patch will use the remainder of the N (if greater than 1)
to adjust cwnd in the congestion avoidance phase.

In addition this patch retires the experimental limited slow start
(LSS) feature. LSS has multiple drawbacks but questionable benefit. The
fractional cwnd increase in LSS requires a loop in slow start even
though it's rarely used. Configuring such an increase step via a global
sysctl on different BDPS seems hard. Finally and most importantly the
slow start overshoot concern is now better covered by the Hybrid slow
start (hystart) enabled by default.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 19:57:59 -05:00