Commit graph

2669 commits

Author SHA1 Message Date
Eric Dumazet e688a60480 net: introduce DST_NOPEER dst flag
Chris Boot reported crashes occurring in ipv6_select_ident().

[  461.457562] RIP: 0010:[<ffffffff812dde61>]  [<ffffffff812dde61>]
ipv6_select_ident+0x31/0xa7

[  461.578229] Call Trace:
[  461.580742] <IRQ>
[  461.582870]  [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2
[  461.589054]  [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155
[  461.595140]  [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b
[  461.601198]  [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e
[nf_conntrack_ipv6]
[  461.608786]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
[  461.614227]  [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543
[  461.620659]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
[  461.626440]  [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a
[bridge]
[  461.633581]  [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459
[  461.639577]  [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76
[bridge]
[  461.646887]  [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f
[bridge]
[  461.653997]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
[  461.659473]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
[  461.665485]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
[  461.671234]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
[  461.677299]  [<ffffffffa0379215>] ?
nf_bridge_update_protocol+0x20/0x20 [bridge]
[  461.684891]  [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
[  461.691520]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
[  461.697572]  [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56
[bridge]
[  461.704616]  [<ffffffffa0379031>] ?
nf_bridge_push_encap_header+0x1c/0x26 [bridge]
[  461.712329]  [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95
[bridge]
[  461.719490]  [<ffffffffa037900a>] ?
nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
[  461.727223]  [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
[  461.734292]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
[  461.739758]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
[  461.746203]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
[  461.751950]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
[  461.758378]  [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56
[bridge]

This is caused by bridge netfilter special dst_entry (fake_rtable), a
special shared entry, where attaching an inetpeer makes no sense.

Problem is present since commit 87c48fa3b4 (ipv6: make fragment
identifications less predictable)

Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and
__ip_select_ident() fallback to the 'no peer attached' handling.

Reported-by: Chris Boot <bootc@bootc.net>
Tested-by: Chris Boot <bootc@bootc.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22 22:34:56 -05:00
David S. Miller bb3c36863e ipv6: Check dest prefix length on original route not copied one in rt6_alloc_cow().
After commit 8e2ec63917 ("ipv6: don't
use inetpeer to store metrics for routes.") the test in rt6_alloc_cow()
for setting the ANYCAST flag is now wrong.

'rt' will always now have a plen of 128, because it is set explicitly
to 128 by ip6_rt_copy.

So to restore the semantics of the test, check the destination prefix
length of 'ort'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 17:35:06 -05:00
Ted Feng 72b36015ba ipip, sit: copy parms.name after register_netdevice
Same fix as 731abb9cb2 for ipip and sit tunnel.
Commit 1c5cae815d removed an explicit call to dev_alloc_name in
ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice
will now create a valid name, however the tunnel keeps a copy of the
name in the private parms structure. Fix this by copying the name back
after register_netdevice has successfully returned.

This shows up if you do a simple tunnel add, followed by a tunnel show:

$ sudo ip tunnel add mode ipip remote 10.2.20.211
$ ip tunnel
tunl0: ip/ip  remote any  local any  ttl inherit  nopmtudisc
tunl%d: ip/ip  remote 10.2.20.211  local any  ttl inherit
$ sudo ip tunnel add mode sit remote 10.2.20.212
$ ip tunnel
sit0: ipv6/ip  remote any  local any  ttl 64  nopmtudisc 6rd-prefix 2002::/16
sit%d: ioctl 89f8 failed: No such device
sit%d: ipv6/ip  remote 10.2.20.212  local any  ttl inherit

Cc: stable@vger.kernel.org
Signed-off-by: Ted Feng <artisdom@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12 18:50:51 -05:00
Li Wei 4af04aba93 ipv6: Fix for adding multicast route for loopback device automatically.
There is no obvious reason to add a default multicast route for loopback
devices, otherwise there would be a route entry whose dst.error set to
-ENETUNREACH that would blocking all multicast packets.

====================

[ more detailed explanation ]

The problem is that the resulting routing table depends on the sequence
of interface's initialization and in some situation, that would block all
muticast packets. Suppose there are two interfaces on my computer
(lo and eth0), if we initailize 'lo' before 'eth0', the resuting routing
table(for multicast) would be

# ip -6 route show | grep ff00::
unreachable ff00::/8 dev lo metric 256 error -101
ff00::/8 dev eth0 metric 256

When sending multicasting packets, routing subsystem will return the first
route entry which with a error set to -101(ENETUNREACH).

I know the kernel will set the default ipv6 address for 'lo' when it is up
and won't set the default multicast route for it, but there is no reason to
stop 'init' program from setting address for 'lo', and that is exactly what
systemd did.

I am sure there is something wrong with kernel or systemd, currently I preferred
kernel caused this problem.

====================

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12 18:48:18 -05:00
David S. Miller 59c2cdae27 Revert "udp: remove redundant variable"
This reverts commit 81d54ec847.

If we take the "try_again" goto, due to a checksum error,
the 'len' has already been truncated.  So we won't compute
the same values as the original code did.

Reported-by: paul bilke <fsmail@conspiracy.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01 14:12:55 -05:00
Li Wei 2a38e6d5ae ipv6: Set mcast_hops to IPV6_DEFAULT_MCASTHOPS when -1 was given.
We need to set np->mcast_hops to it's default value at this moment
otherwise when we use it and found it's value is -1, the logic to
get default hop limit doesn't take multicast into account and will
return wrong hop limit(IPV6_DEFAULT_HOPLIMIT) which is for unicast.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-28 18:09:13 -05:00
Steffen Klassert 618f9bc74a net: Move mtu handling down to the protocol depended handlers
We move all mtu handling from dst_mtu() down to the protocol
layer. So each protocol can implement the mtu handling in
a different manner.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26 14:29:51 -05:00
Steffen Klassert ebb762f27f net: Rename the dst_opt default_mtu method to mtu
We plan to invoke the dst_opt->default_mtu() method unconditioally
from dst_mtu(). So rename the method to dst_opt->mtu() to match
the name with the new meaning.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26 14:29:50 -05:00
Steffen Klassert 6b600b26c0 route: Use the device mtu as the default for blackhole routes
As it is, we return null as the default mtu of blackhole routes.
This may lead to a propagation of a bogus pmtu if the default_mtu
method of a blackhole route is invoked. So return dst->dev->mtu
as the default mtu instead.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26 14:29:50 -05:00
Eric Dumazet 4d0fe50c75 ipv6: tcp: fix tcp_v6_conn_request()
Since linux 2.6.26 (commit c6aefafb7e : Add IPv6 support to TCP SYN
cookies), we can drop a SYN packet reusing a TIME_WAIT socket.

(As a matter of fact we fail to send the SYNACK answer)

As the client resends its SYN packet after a one second timeout, we
accept it, because first packet removed the TIME_WAIT socket before
being dropped.

This probably explains why nobody ever noticed or complained.

Reported-by: Jesse Young <jlyo@jlyo.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-23 17:29:23 -05:00
David S. Miller 46a246c4df netfilter: Remove NOTRACK/RAW dependency on NETFILTER_ADVANCED.
Distributions are using this in their default scripts, so don't hide
them behind the advanced setting.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-23 16:07:00 -05:00
Eric Dumazet c16a98ed91 ipv6: tcp: fix panic in SYN processing
commit 72a3effaf6 ([NET]: Size listen hash tables using backlog
hint) added a bug allowing inet6_synq_hash() to return an out of bound
array index, because of u16 overflow.

Bug can happen if system admins set net.core.somaxconn &
net.ipv4.tcp_max_syn_backlog sysctls to values greater than 65536

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-23 15:49:31 -05:00
Li Wei 4d65a2465f ipv6: fix a bug in ndisc_send_redirect
Release skb when transmit rate limit _not_ allow

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-23 03:51:54 -05:00
Josh Boyer 731abb9cb2 ip6_tunnel: copy parms.name after register_netdevice
Commit 1c5cae815d removed an explicit call to dev_alloc_name in ip6_tnl_create
because register_netdevice will now create a valid name.  This works for the
net_device itself.

However the tunnel keeps a copy of the name in the parms structure for the
ip6_tnl associated with the tunnel.  parms.name is set by copying the net_device
name in ip6_tnl_dev_init_gen.  That function is called from ip6_tnl_dev_init in
ip6_tnl_create, but it is done before register_netdevice is called so the name
is set to a bogus value in the parms.name structure.

This shows up if you do a simple tunnel add, followed by a tunnel show:

[root@localhost ~]# ip -6 tunnel add remote fec0::100 local fec0::200
[root@localhost ~]# ip -6 tunnel show
ip6tnl0: ipv6/ipv6 remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
ip6tnl%d: ipv6/ipv6 remote fec0::100 local fec0::200 encaplimit 4 hoplimit 64 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
[root@localhost ~]#

Fix this by moving the strcpy out of ip6_tnl_dev_init_gen, and calling it after
register_netdevice has successfully returned.

Cc: stable@vger.kernel.org
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-14 00:24:06 -05:00
Nick Bowler 4b90a603a1 ah: Don't return NET_XMIT_DROP on input.
When the ahash driver returns -EBUSY, AH4/6 input functions return
NET_XMIT_DROP, presumably copied from the output code path.  But
returning transmit codes on input doesn't make a lot of sense.
Since NET_XMIT_DROP is a positive int, this gets interpreted as
the next header type (i.e., success).  As that can only end badly,
remove the check.

Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-12 18:13:32 -05:00
Nick Bowler b7ea81a58a ah: Read nexthdr value before overwriting it in ahash input callback.
The AH4/6 ahash input callbacks read out the nexthdr field from the AH
header *after* they overwrite that header.  This is obviously not going
to end well.  Fix it up.

Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-09 15:55:53 -05:00
Nick Bowler 069294e813 ah: Correctly pass error codes in ahash output callback.
The AH4/6 ahash output callbacks pass nexthdr to xfrm_output_resume
instead of the error code.  This appears to be a copy+paste error from
the input case, where nexthdr is expected.  This causes the driver to
continuously add AH headers to the datagram until either an allocation
fails and the packet is dropped or the ahash driver hits a synchronous
fallback and the resulting monstrosity is transmitted.

Correct this issue by simply passing the error code unadulterated.

Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-09 15:55:53 -05:00
Brian Haley c457338d7a ipv6: drop packets when source address is multicast
RFC 4291 Section 2.7 says Multicast addresses must not be used as source
addresses in IPv6 packets - drop them on input so we don't process the
packet further.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Reported-and-Tested-by: Kumar Sanghvi <divinekumar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-08 12:37:06 -05:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Arjan van de Ven 73cb88ecb9 net: make the tcp and udp file_operations for the /proc stuff const
the tcp and udp code creates a set of struct file_operations at runtime
while it can also be done at compile time, with the added benefit of then
having these file operations be const.

the trickiest part was to get the "THIS_MODULE" reference right; the naive
method of declaring a struct in the place of registration would not work
for this reason.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01 17:56:14 -04:00
Florian Westphal 2dad81adf2 netfilter: ipv6: fix afinfo->route refcnt leak on error
Several callers (h323 conntrack, xt_addrtype) assume that the
returned **dst only needs to be released if the function returns 0.

This is true for the ipv4 implementation, but not for the ipv6 one.

Instead of changing the users, change the ipv6 implementation
to behave like the ipv4 version by only providing the dst_entry result
in the success case.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:20:07 +01:00
Joe Perches 0a9ee81349 netfilter: Remove unnecessary OOM logging messages
Site specific OOM messages are duplications of a generic MM
out of memory message and aren't really useful, so just
delete them.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:49 +01:00
Paul Gortmaker bc3b2d7fb9 net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules
These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:30 -04:00
Andreas Hofmeister 14ef37b6d0 ipv6: fix route lookup in addrconf_prefix_rcv()
The route lookup to find a previously auto-configured route for a prefixes used
to use rt6_lookup(), with the prefix from the RA used as an address. However,
that kind of lookup ignores routing tables, the prefix length and route flags,
so when there were other matching routes, even in different tables and/or with
a different prefix length, the wrong route would be manipulated.

Now, a new function "addrconf_get_prefix_route()" is used for the route lookup,
which searches in RT6_TABLE_PREFIX and takes the prefix-length and route flags
into account.

Signed-off-by: Andreas Hofmeister <andi@collax.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-30 04:12:36 -04:00
Gao feng 7011687f0f ipv6: fix route error binding peer in func icmp6_dst_alloc
in func icmp6_dst_alloc,dst_metric_set call ipv6_cow_metrics to set metric.
ipv6_cow_metrics may will call rt6_bind_peer to set rt6_info->rt6i_peer.
So,we should move ipv6_addr_copy before dst_metric_set to make sure rt6_bind_peer success.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-28 16:36:07 -04:00
Zheng Yan 504744e4ed ipv6: fix error propagation in ip6_ufo_append_data()
We should return errcode from sock_alloc_send_skb()

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-28 00:26:00 -04:00
Eric Dumazet b903d324be ipv6: tcp: fix TCLASS value in ACK messages sent from TIME_WAIT
commit 66b13d99d9 (ipv4: tcp: fix TOS value in ACK messages sent from
TIME_WAIT) fixed IPv4 only.

This part is for the IPv6 side, adding a tclass param to ip6_xmit()

We alias tw_tclass and tw_tos, if socket family is INET6.

[ if sockets is ipv4-mapped, only IP_TOS socket option is used to fill
TOS field, TCLASS is not taken into account ]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-27 00:44:35 -04:00
Andreas Hofmeister 9f56220fad ipv6: Do not use routes from locally generated RAs
When hybrid mode is enabled (accept_ra == 2), the kernel also sees RAs
generated locally. This is useful since it allows the kernel to auto-configure
its own interface addresses.

However, if 'accept_ra_defrtr' and/or 'accept_ra_rtr_pref' are set and the
locally generated RAs announce the default route and/or other route information,
the kernel happily inserts bogus routes with its own address as gateway.

With this patch, adding routes from an RA will be skiped when the RAs source
address matches any local address, just as if 'accept_ra_defrtr' and
'accept_ra_rtr_pref' were set to 0.

Signed-off-by: Andreas Hofmeister <andi@collax.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 19:13:15 -04:00
David S. Miller 1805b2f048 Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2011-10-24 18:18:09 -04:00
Eric Dumazet 318cf7aaa0 tcp: md5: add more const attributes
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more
const attributes to callers.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:46:04 -04:00
Eric Dumazet cf533ea53e tcp: add const qualifiers where possible
Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.

For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 05:22:42 -04:00
Maciej Żenczykowski 6cc7a765c2 net: allow CAP_NET_RAW to set socket options IP{,V6}_TRANSPARENT
Up till now the IP{,V6}_TRANSPARENT socket options (which actually set
the same bit in the socket struct) have required CAP_NET_ADMIN
privileges to set or clear the option.

- we make clearing the bit not require any privileges.
- we allow CAP_NET_ADMIN to set the bit (as before this change)
- we allow CAP_NET_RAW to set this bit, because raw
  sockets already pretty much effectively allow you
  to emulate socket transparency.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 18:21:36 -04:00
Kevin Wilson 25c8295b5b cleanup: remove unnecessary include.
This cleanup patch removes unnecessary include from net/ipv6/ip6_fib.c.

Signed-off-by: Kevin Wilson <wkevils@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:26:16 -04:00
Eric Dumazet 9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
Steffen Klassert dd767856a3 xfrm6: Don't call icmpv6_send on local error
Calling icmpv6_send() on a local message size error leads to
an incorrect update of the path mtu. So use xfrm6_local_rxpmtu()
to notify about the pmtu if the IPV6_DONTFRAG socket option is
set on an udp or raw socket, according RFC 3542 and use
ipv6_local_error() otherwise.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-18 23:53:10 -04:00
Steffen Klassert 299b076764 ipv6: Fix IPsec slowpath fragmentation problem
ip6_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path).
On IPsec the effective mtu is lower because we need to add the protocol
headers and trailers later when we do the IPsec transformations. So after
the IPsec transformations the packet might be too big, which leads to a
slowpath fragmentation then. This patch fixes this by building the packets
based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr
handling to this.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-18 23:53:10 -04:00
Steffen Klassert c113464d43 ipv6: Remove superfluous NULL pointer check in ipv6_local_rxpmtu
The pointer to mtu_info is taken from the common buffer
of the skb, thus it can't be a NULL pointer. This patch
removes this check on mtu_info.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-18 23:51:30 -04:00
Roy.Li 01b7806cdc ipv6: remove a rcu_read_lock in ndisc_constructor
in6_dev_get(dev) takes a reference on struct inet6_dev, we dont need
rcu locking in ndisc_constructor()

Signed-off-by: Roy.Li <rongqing.li@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-17 19:27:56 -04:00
Eric Dumazet 87fb4b7b53 net: more accurate skb truesize
skb truesize currently accounts for sk_buff struct and part of skb head.
kmalloc() roundings are also ignored.

Considering that skb_shared_info is larger than sk_buff, its time to
take it into account for better memory accounting.

This patch introduces SKB_TRUESIZE(X) macro to centralize various
assumptions into a single place.

At skb alloc phase, we put skb_shared_info struct at the exact end of
skb head, to allow a better use of memory (lowering number of
reallocations), since kmalloc() gives us power-of-two memory blocks.

Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
aligned to cache lines, as before.

Note: This patch might trigger performance regressions because of
misconfigured protocol stacks, hitting per socket or global memory
limits that were previously not reached. But its a necessary step for a
more accurate memory accounting.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <ak@linux.intel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-13 16:05:07 -04:00
Yan, Zheng cdaf557034 gro: refetch inet6_protos[] after pulling ext headers
ipv6_gro_receive() doesn't update the protocol ops after pulling
the ext headers. It looks like a typo.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-10 14:26:16 -04:00
David S. Miller 88c5100c28 Merge branch 'master' of github.com:davem330/net
Conflicts:
	net/batman-adv/soft-interface.c
2011-10-07 13:38:43 -04:00
Yan, Zheng 260fcbeb1a tcp: properly handle md5sig_pool references
tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers
only hold one reference to md5sig_pool. but tcp_v4_md5_do_add()
increases use count of md5sig_pool for each peer. This patch
makes tcp_v4_md5_do_add() only increases use count for the first
tcp md5sig peer.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-04 23:31:24 -04:00
Yan, Zheng 676a1184e8 ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
ipv6_ac_list and ipv6_fl_list from listening socket are inadvertently
shared with new socket created for connection.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-29 00:32:10 -04:00
Ben Greear 67928c4041 ipv6-multicast: Fix memory leak in IPv6 multicast.
If reg_vif_xmit cannot find a routing entry, be sure to
free the skb before returning the error.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27 15:34:00 -04:00
Madalin Bucur fbe5818690 ipv6: check return value for dst_alloc
return value of dst_alloc must be checked before use

Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27 15:32:06 -04:00
Ben Greear 2015de5fe2 ipv6-multicast: Fix memory leak in input path.
Have to free the skb before returning if we fail
the fib lookup.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27 15:16:08 -04:00
Eric Dumazet b82d1bb4fd tcp: unalias tcp_skb_cb flags and ip_dsfield
struct tcp_skb_cb contains a "flags" field containing either tcp flags
or IP dsfield depending on context (input or output path)

Introduce ip_dsfield to make the difference clear and ease maintenance.
If later we want to save space, we can union flags/ip_dsfield

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27 02:20:08 -04:00
David S. Miller 8decf86879 Merge branch 'master' of github.com:davem330/net
Conflicts:
	MAINTAINERS
	drivers/net/Kconfig
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
	drivers/net/ethernet/broadcom/tg3.c
	drivers/net/wireless/iwlwifi/iwl-pci.c
	drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
	drivers/net/wireless/rt2x00/rt2800usb.c
	drivers/net/wireless/wl12xx/main.c
2011-09-22 03:23:13 -04:00
Roy Li 8603e33d01 ipv6: fix a possible double free
When calling snmp6_alloc_dev fails, the snmp6 relevant memory
are freed by snmp6_alloc_dev. Calling in6_dev_finish_destroy
will free these memory twice.

Double free will lead that undefined behavior occurs.

Signed-off-by: Roy Li <rongqing.li@windriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-20 15:10:16 -04:00
Eric Dumazet d24f22f3df ip6_tunnel: add optional fwmark inherit
Add IP6_TNL_F_USE_ORIG_FWMARK to ip6_tunnel, so that ip6_tnl_xmit2()
makes a route lookup taking into account skb->fwmark and doesnt cache
lookup result.

This permits more flexibility in policies and firewall setups.

To setup such a tunnel, "fwmark inherit" option should be added to "ip
-f inet6 tunnel" command.

Reported-by: Anders Franzen <Anders.Franzen@ericsson.com>
CC: Hans Schillström <hans.schillstrom@ericsson.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-20 14:50:00 -04:00