Commit graph

29875 commits

Author SHA1 Message Date
Daniel Borkmann e6d8b64b34 net: sctp: fix and consolidate SCTP checksumming code
This fixes an outstanding bug found through IPVS, where SCTP packets
with skb->data_len > 0 (non-linearized) and empty frag_list, but data
accumulated in frags[] member, are forwarded with incorrect checksum
letting SCTP initial handshake fail on some systems. Linearizing each
SCTP skb in IPVS to prevent that would not be a good solution as
this leads to an additional and unnecessary performance penalty on
the load-balancer itself for no good reason (as we actually only want
to update the checksum, and can do that in a different/better way
presented here).

The actual problem is elsewhere, namely, that SCTP's checksumming
in sctp_compute_cksum() does not take frags[] into account like
skb_checksum() does. So while we are fixing this up, we better reuse
the existing code that we have anyway in __skb_checksum() and use it
for walking through the data doing checksumming. This will not only
fix this issue, but also consolidates some SCTP code with core
sk_buff code, bringing it closer together and removing respectively
avoiding reimplementation of skb_checksum() for no good reason.

As crc32c() can use hardware implementation within the crypto layer,
we leave that intact (it wraps around / falls back to e.g. slice-by-8
algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
combinator for crc32c blocks.

Also, we remove all other SCTP checksumming code, so that we only
have to use sctp_compute_cksum() from now on; for doing that, we need
to transform SCTP checkumming in output path slightly, and can leave
the rest intact.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:57 -05:00
Daniel Borkmann 2817a336d4 net: skb_checksum: allow custom update/combine for walking skb
Currently, skb_checksum walks over 1) linearized, 2) frags[], and
3) frag_list data and calculats the one's complement, a 32 bit
result suitable for feeding into itself or csum_tcpudp_magic(),
but unsuitable for SCTP as we're calculating CRC32c there.

Hence, in order to not re-implement the very same function in
SCTP (and maybe other protocols) over and over again, use an
update() + combine() callback internally to allow for walking
over the skb with different algorithms.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:57 -05:00
David S. Miller 296c10639a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	net/xfrm/xfrm_policy.c

Minor merge conflict in xfrm_policy.c, consisting of overlapping
changes which were trivial to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-02 02:13:48 -04:00
Alexander Aring 3582b900ad 6lowpan: cleanup skb copy data
This patch drops the direct memcpy on skb and uses the right skb
memcpy functions. Also remove an unnecessary check if plen is non zero.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:18:46 -04:00
Alexander Aring 578d524127 6lowpan: set 6lowpan network and transport header
This is necessary to access network header with the skb_network_header
function instead of calculate the position with mac_len, etc.
Do the same for the transport header, when we replace the IPv6 header
with the 6LoWPAN header.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:18:46 -04:00
Alexander Aring 3e69162ea4 6lowpan: set and use mac_len for mac header length
Set the mac header length while creating the 802.15.4 mac header.

Drop the function for recalculate mac header length in upper layers
which was static and works for intra pan communication only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:18:46 -04:00
Alexander Aring 3961532fd4 6lowpan: remove unnecessary set of headers
On receiving side we don't need to set any headers in skb because the
6LoWPAN layer do not access it. Currently these values will set twice
after calling netif_rx.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:18:45 -04:00
Duan Jiong ba4865027c ipv6: remove the unnecessary statement in find_match()
After reading the function rt6_check_neigh(), we can
know that the RT6_NUD_FAIL_SOFT can be returned only
when the IS_ENABLE(CONFIG_IPV6_ROUTER_PREF) is false.
so in function find_match(), there is no need to execute
the statement !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF).

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:06:51 -04:00
Chen Weilong 83a1a7ce60 mac802154: Use pr_err(...) rather than printk(KERN_ERR ...)
This change is inspired by checkpatch.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:05:44 -04:00
Ying Xue 3af390e2c5 tipc: remove two indentation levels in tipc_recv_msg routine
The message dispatching part of tipc_recv_msg() is wrapped layers of
while/if/if/switch, causing out-of-control indentation and does not
look very good. We reduce two indentation levels by separating the
message dispatching from the blocks that checks link state and
sequence numbers, allowing longer function and arg names to be
consistently indented without wrapping. Additionally we also rename
"cont" label to "discard" and add one new label called "unlock_discard"
to make code clearer. In all, these are cosmetic changes that do not
alter the operation of TIPC in any way.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Andreas Bofjäll <andreas.bofjall@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 16:54:54 -04:00
Yuchung Cheng c968601d17 tcp: temporarily disable Fast Open on SYN timeout
Fast Open currently has a fall back feature to address SYN-data being
dropped but it requires the middle-box to pass on regular SYN retry
after SYN-data. This is implemented in commit aab487435 ("net-tcp:
Fast Open client - detecting SYN-data drops")

However some NAT boxes will drop all subsequent packets after first
SYN-data and blackholes the entire connections.  An example is in
commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast
Open".

The sender should note such incidents and fall back to use the regular
TCP handshake on subsequent attempts temporarily as well: after the
second SYN timeouts the original Fast Open SYN is most likely lost.
When such an event recurs Fast Open is disabled based on the number of
recurrences exponentially.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:50:41 -04:00
Daniel Borkmann 7d1d65cb84 net: sched: cls_bpf: add BPF-based classifier
This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:

     0: No match
    -1: Select classid given in "tc filter ..." command
  else: flowid, overwrite the default one

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:33:17 -04:00
Mathias Krause 1c5ad13f7c net: esp{4,6}: get rid of struct esp_data
struct esp_data consists of a single pointer, vanishing the need for it
to be a structure. Fold the pointer into 'data' direcly, removing one
level of pointer indirection.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-29 06:39:42 +01:00
Mathias Krause 123b0d1ba0 net: esp{4,6}: remove padlen from struct esp_data
The padlen member of struct esp_data is always zero. Get rid of it.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-29 06:39:42 +01:00
Zhi Yong Wu cdfb97bc01 net, mc: fix the incorrect comments in two mc-related functions
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:19:05 -04:00
Zhi Yong Wu ab1a2d7773 net, iovec: fix the incorrect comment in memcpy_fromiovecend()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:19:04 -04:00
Zhi Yong Wu c4e819d16c net, datagram: fix the incorrect comment in zerocopy_sg_from_iovec()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:19:04 -04:00
Hannes Frederic Sowa daba287b29 ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK
UFO as well as UDP_CORK do not respect IP_PMTUDISC_DO and
IP_PMTUDISC_PROBE well enough.

UFO enabled packet delivery just appends all frags to the cork and hands
it over to the network card. So we just deliver non-DF udp fragments
(DF-flag may get overwritten by hardware or virtual UFO enabled
interface).

UDP_CORK does enqueue the data until the cork is disengaged. At this
point it sets the correct IP_DF and local_df flags and hands it over to
ip_fragment which in this case will generate an icmp error which gets
appended to the error socket queue. This is not reflected in the syscall
error (of course, if UFO is enabled this also won't happen).

Improve this by checking the pmtudisc flags before appending data to the
socket and if we still can fit all data in one packet when IP_PMTUDISC_DO
or IP_PMTUDISC_PROBE is set, only then proceed.

We use (mtu-fragheaderlen) to check for the maximum length because we
ensure not to generate a fragment and non-fragmented data does not need
to have its length aligned on 64 bit boundaries. Also the passed in
ip_options are already aligned correctly.

Maybe, we can relax some other checks around ip_fragment. This needs
more research.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:15:22 -04:00
David S. Miller 5d9efa7ee9 ipv6: Remove privacy config option.
The code for privacy extentions is very mature, and making it
configurable only gives marginal memory/code savings in exchange
for obfuscation and hard to read code via CPP ifdef'ery.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 20:07:50 -04:00
Alexander Aring 8ef007fd1d 6lowpan: remove unnecessary break
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:52 -04:00
Alexander Aring b236b954de 6lowpan: remove skb->dev assignment
This patch removes the assignment of skb->dev. We don't need it here because
we use the netdev_alloc_skb_ip_align function which already sets the
skb->dev.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:52 -04:00
Alexander Aring b614442f34 6lowpan: use netdev_alloc_skb instead dev_alloc_skb
This patch uses the netdev_alloc_skb instead dev_alloc_skb function and
drops the seperate assignment to skb->dev.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:51 -04:00
Alexander Aring 53cb5717b4 6lowpan: remove unnecessary check on err >= 0
The err variable can only be zero in this case.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:51 -04:00
Alexander Aring 545f3613a8 6lowpan: remove unnecessary ret variable
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:51 -04:00
wangweidong 747edc0f9e sctp: merge two if statements to one
Two if statements do the same work, we can merge them to
one. And fix some typos. There is just code simplification,
no functional changes.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 01:02:34 -04:00
wangweidong 3dc0a548a0 sctp: remove the repeat initialize with 0
kmem_cache_zalloc had set the allocated memory to zero. I think no need
to initialize with 0. And move the comments to the function begin.

Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 01:02:34 -04:00
wangweidong 2bccbadf20 sctp: fix some comments in chunk.c and associola.c
fix some typos

Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 01:02:34 -04:00
Eric Dumazet 8c3a897bfa inet: restore gso for vxlan
Alexei reported a performance regression on vxlan, caused
by commit 3347c96029 "ipv4: gso: make inet_gso_segment() stackable"

GSO vxlan packets were not properly segmented, adding IP fragments
while they were not expected.

Rename 'bool tunnel' to 'bool encap', and add a new boolean
to express the fact that UDP should be fragmented.
This fragmentation is triggered by skb->encapsulation being set.

Remove a "skb->encapsulation = 1" added in above commit,
as its not needed, as frags inherit skb->frag from original
GSO skb.

Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 00:23:06 -04:00
Alexei Starovoitov 7f29405403 net: fix rtnl notification in atomic context
commit 991fb3f74c "dev: always advertise rx_flags changes via netlink"
introduced rtnl notification from __dev_set_promiscuity(),
which can be called in atomic context.

Steps to reproduce:
ip tuntap add dev tap1 mode tap
ifconfig tap1 up
tcpdump -nei tap1 &
ip tuntap del dev tap1 mode tap

[  271.627994] device tap1 left promiscuous mode
[  271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940
[  271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip
[  271.677525] INFO: lockdep is turned off.
[  271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G        W    3.12.0-rc3+ #73
[  271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[  271.731254]  ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428
[  271.760261]  ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8
[  271.790683]  0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8
[  271.822332] Call Trace:
[  271.838234]  [<ffffffff817544e5>] dump_stack+0x55/0x76
[  271.854446]  [<ffffffff8108bad1>] __might_sleep+0x181/0x240
[  271.870836]  [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0
[  271.887076]  [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0
[  271.903368]  [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0
[  271.919716]  [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0
[  271.936088]  [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0
[  271.952504]  [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0
[  271.968902]  [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100
[  271.985302]  [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0
[  272.001642]  [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0
[  272.017917]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.033961]  [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50
[  272.049855]  [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0
[  272.065494]  [<ffffffff81732052>] packet_notifier+0x1b2/0x380
[  272.080915]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.096009]  [<ffffffff81761c66>] notifier_call_chain+0x66/0x150
[  272.110803]  [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10
[  272.125468]  [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20
[  272.139984]  [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70
[  272.154523]  [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20
[  272.168552]  [<ffffffff816224c5>] rollback_registered_many+0x145/0x240
[  272.182263]  [<ffffffff81622641>] rollback_registered+0x31/0x40
[  272.195369]  [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90
[  272.208230]  [<ffffffff81547ca0>] __tun_detach+0x140/0x340
[  272.220686]  [<ffffffff81547ed6>] tun_chr_close+0x36/0x60

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:45 -04:00
Hannes Frederic Sowa 66415cf8a1 net: initialize hashrnd in flow_dissector with net_get_random_once
We also can defer the initialization of hashrnd in flow_dissector
to its first use. Since net_get_random_once is irq safe now we don't
have to audit the call paths if one of this functions get called by an
interrupt handler.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:39 -04:00
Hannes Frederic Sowa f84be2bd96 net: make net_get_random_once irq safe
I initial build non irq safe version of net_get_random_once because I
would liked to have the freedom to defer even the extraction process of
get_random_bytes until the nonblocking pool is fully seeded.

I don't think this is a good idea anymore and thus this patch makes
net_get_random_once irq safe. Now someone using net_get_random_once does
not need to care from where it is called.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:39 -04:00
Nikolay Aleksandrov 974daef7f8 net: add missing dev_put() in __netdev_adjacent_dev_insert
I think that a dev_put() is needed in the error path to preserve the
proper dev refcount.

CC: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:39 -04:00
Hagen Paul Pfeifer 4a3ad7b3ea netem: markov loss model transition fix
The transition from markov state "3 => lost packets within a burst
period" to "1 => successfully transmitted packets within a gap period"
has no *additional* loss event. The loss already happen for transition
from 1 -> 3, this additional loss will make things go wild.

E.g. transition probabilities:

p13:   10%
p31:  100%

Expected:

Ploss = p13 / (p13 + p31)
Ploss = ~9.09%

... but it isn't. Even worse: we get a double loss - each time.
So simple don't return true to indicate loss, rather break and return
false.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Stefano Salsano <stefano.salsano@uniroma2.it>
Cc: Fabio Ludovici <fabio.ludovici@yahoo.it>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:39 -04:00
David S. Miller b45bd46dec Included changes:
- data structure reshaping to accommodate multiple routing protocol
   implementations
 - routing protocol API enhancement
 - send to userspace the event "batman-adv Gateway loss" in case of soft-iface
   destruction and a "batman-adv Gateway" was configured
 - improve the TT component to support and advertise runtime flag changes
 - minor code refactoring
 - make the ICMP kernel-to-userspace communication more generic
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJSZ+VyAAoJEADl0hg6qKeOccIP/1RE68wFqd20BTh6WtnKo4s6
 H/F5WMUrk/HCNe3p4JnIOYv5WESR3tqTylCqBIl3yzcsks2KsLm4zEG5FIx/0J3Y
 1Mgy8V/xlGVN7M+wFSLCgzpTnJ3aCvy7+ied2qoz9KsC4vgiKUimDkPTsbUL7NUp
 OBmJGYfe/nLDoPI/CXu3nJCtNXcDgv5a5Z8ZMBuipeK++JsBMZRLfJEIM+7Q4Ouc
 KNLZFXavwtXAxsHLpWDS48MkVAz0tbyy4P6e2k7iQQq+W1WZjCoFMz0xLIKRFf+Y
 yOOZXItpTTX7rzLxFHkLAopZo9UMPsFjm/OceFBlnAbp24ftfR0b4gjFAUQiKFsq
 lGlGIXVkhsR6arfQ4SIlrrGOW7h+Kea2I1aPWC7yoi/97+22Nrr/a903p+kkhP4t
 sAoMk7DbbdajcV01RULF+xjaBFvEdaSfSBVB5j76Gf9AxNZfSGd7wH1qPG7O7HFT
 jO6Z4fbG6bbHBHMt9j4o9oGg4h5X8epbhZB7e8rwoBe/dSzw+B4CK2Y+j1/QW3C4
 PDqL3t5yi0O0+dhkI2G8DmhTOm+ZKVZt60WIMM+G5T6DiLyECveexGWIbjjZR67w
 qgVXsvE0PwHabY8Ne/z+IlnHY8zegUFYVusQ0lQfLpkKhdjoYXLF709RT42r2QIN
 8/6WlNGD2YG/0sWEDF7H
 =sYKR
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
this is another set of changes intended for net-next/linux-3.13.
(probably our last pull request for this cycle)

Patches 1 and 2 reshape two of our main data structures in a way that they can
easily be extended in the future to accommodate new routing protocols.

Patches from 3 to 9 improve our routing protocol API and its users so that all
the protocol-related code is not mixed up with the other components anymore.

Patch 10 limits the local Translation Table maximum size to a value such that it
can be fully transfered over the air if needed. This value depends on
fragmentation being enabled or not and on the mtu values.

Patch 11 makes batman-adv send a uevent in case of soft-interface destruction
while a "bat-Gateway" was configured (this informs userspace about the GW not
being available anymore).

Patches 13 and 14 enable the TT component to detect non-mesh client flag
changes at runtime (till now those flags where set upon client detection and
were not changed anymore).

Patch 16 is a generalisation of our user-to-kernel space communication (and
viceversa) used to exchange ICMP packets to send/received to/from the mesh
network. Now it can easily accommodate new ICMP packet types without breaking
the existing userspace API anymore.

Remaining patches are minor changes and cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:12:33 -04:00
Hannes Frederic Sowa 7088ad74e6 inet: remove old fragmentation hash initializing
All fragmentation hash secrets now get initialized by their
corresponding hash function with net_get_random_once. Thus we can
eliminate the initial seeding.

Also provide a comment that hash secret seeding happens at the first
call to the corresponding hashing function.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:01:41 -04:00
Hannes Frederic Sowa b1190570b4 ipv6: split inet6_hash_frag for netfilter and initialize secrets with net_get_random_once
Defer the fragmentation hash secret initialization for IPv6 like the
previous patch did for IPv4.

Because the netfilter logic reuses the hash secret we have to split it
first. Thus introduce a new nf_hash_frag function which takes care to
seed the hash secret.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:01:40 -04:00
Hannes Frederic Sowa e7b519ba55 ipv4: initialize ip4_frags hash secret as late as possible
Defer the generation of the first hash secret for the ipv4 fragmentation
cache as late as possible.

ip4_frags.rnd gets initial seeded by inet_frags_init and regulary
reseeded by inet_frag_secret_rebuild. Either we call ipqhashfn directly
from ip_fragment.c in which case we initialize the secret directly.

If we first get called by inet_frag_secret_rebuild we install a new secret
by a manual call to get_random_bytes. This secret will be overwritten
as soon as the first call to ipqhashfn happens. This is safe because we
won't race while publishing the new secrets with anyone else.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:01:40 -04:00
David S. Miller c3fa32b976 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/qmi_wwan.c
	include/net/dst.h

Trivial merge conflicts, both were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:49:34 -04:00
Hannes Frederic Sowa 34d92d5315 net: always inline net_secret_init
Currently net_secret_init does not get inlined, so we always have a call
to net_secret_init even in the fast path.

Let's specify net_secret_init as __always_inline so we have the nop in
the fast-path without the call to net_secret_init and the unlikely path
at the epilogue of the function.

jump_labels handle the inlining correctly.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:26:46 -04:00
Simon Wunderlich da6b8c20a5 batman-adv: generalize batman-adv icmp packet handling
Instead of handling icmp packets only up to length of icmp_packet_rr,
the code should handle any icmp length size. Therefore the length
truncating is moved to when the packet is actually sent to userspace
(this does not support lengths longer than icmp_packet_rr yet). Longer
packets are forwarded without truncating.

This patch also cleans up some parts where the icmp header struct could
be used instead of other icmp_packet(_rr) structs to make the code more
readable.

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-23 17:03:47 +02:00
Simon Wunderlich 15c33da6e8 batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-23 17:03:46 +02:00
Antonio Quartulli 0eb01568f0 batman-adv: include the sync-flags when compute the global/local table CRC
Flags covered by TT_SYNC_MASK are kept in sync among the
nodes in the network and therefore they have to be
considered while computing the global/local table CRC.

In this way a generic originator is able to understand if
its table contains the correct flags or not.

Bits from 4 to 7 in the TT flags fields are now reserved for
"synchronized" flags only.

This allows future developers to add more flags of this type
without breaking compatibility.

It's important to note that not all the remote TT flags are
synchronised. This comes from the fact that some flags are
used to inject an information once only.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-10-23 17:03:46 +02:00
Antonio Quartulli 3c4f7ab60c batman-adv: improve the TT component to support runtime flag changes
Some flags (i.e. the WIFI flag) may change after that the
related client has already been announced. However it is
useful to informa the rest of the network about this change.

Add a runtime-flag-switch detection mechanism and
re-announce the related TT entry to advertise the new flag
value.

This mechanism can be easily exploited by future flags that
may need the same treatment.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-10-23 17:03:45 +02:00
Antonio Quartulli 0c69aecc5b batman-adv: invoke dev_get_by_index() outside of is_wifi_iface()
Upcoming changes need to perform other checks on the
incoming net_device struct.

To avoid performing dev_get_by_index() for each and every
check, it is better to move it outside of is_wifi_iface()
and search the netdev object once only.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-10-23 17:03:44 +02:00
Antonio Quartulli 8257f55ae2 batman-adv: send GW_DEL event in case of soft-iface destruction
In case of soft_iface destruction send a GW DEL event to
userspace so that applications which are listening for GW
events are informed about the lost of connectivity and can
react accordingly.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-10-23 17:03:44 +02:00
Marek Lindner a19d3d85e1 batman-adv: limit local translation table max size
The local translation table size is limited by what can be
transferred from one node to another via a full table request.

The number of entries fitting into a full table request depend
on whether the fragmentation is enabled or not. Therefore this
patch introduces a max table size check and refuses to add
more local clients when that size is reached. Moreover, if the
max full table packet size changes (MTU change or fragmentation
is disabled) the local table is downsized instantaneously.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
2013-10-23 17:03:43 +02:00
Antonio Quartulli 4627456a77 batman-adv: adapt the TT component to use the new API functions
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 17:03:42 +02:00
Antonio Quartulli d0015fdd3d batman-adv: provide orig_node routing API
Some operations executed on an orig_node depends on the
current routing algorithm being used. To easily make this
mechanism routing algorithm agnostic add a orig_node
specific API that each algorithm can populate with its own
routines.

Such routines are then invoked by the code when needed,
without knowing which routing algorithm is currently in use

With this patch 3 API functions are added:
- orig_free (to free routing depending internal structs)
- orig_add_if (to change the inner state of an orig_node
  when a new hard interface is added)
- orig_del_if (to change the inner state of an orig_node
  when an hard interface is removed)

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 17:03:21 +02:00
Antonio Quartulli 81e26b1a1c batman-adv: adapt the neighbor purging routine to use the new API functions
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 15:33:12 +02:00
Antonio Quartulli 6680a1249f batman-adv: adapt bonding to use the new API functions
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 15:33:12 +02:00