1
0
Fork 0
Commit Graph

253 Commits (b0d175781ab275576429fe379ba8e98e1c60f362)

Author SHA1 Message Date
Eric Dumazet 9fe516ba3f inet: move ipv6only in sock_common
When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.

This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()

This is magnified when SO_REUSEPORT is used.

Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 23:46:21 -07:00
Tom Herbert b26ba202e0 net: Eliminate no_check from protosw
It doesn't seem like an protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:28:53 -04:00
WANG Cong 698365fa18 net: clean up snmp stats code
commit 8f0ea0fe3a (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:

	commit 933393f58f
	Date:   Thu Dec 22 11:58:51 2011 -0600

	    percpu: Remove irqsafe_cpu_xxx variants

	    We simply say that regular this_cpu use must be safe regardless of
	    preemption and interrupt state.  That has no material change for x86
	    and s390 implementations of this_cpu operations.  However, arches that
	    do not provide their own implementation for this_cpu operations will
	    now get code generated that disables interrupts instead of preemption.

probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.

So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.

Cc: Christoph Lameter <cl@linux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-07 16:06:05 -04:00
Florent Fourcot 6444f72b4b ipv6: add flowlabel_consistency sysctl
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.

Changelog of V3:
 * rename ip6_flowlabel_consistency to flowlabel_consistency
 * use net_info_ratelimited()
 * checkpatch cleanups

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
David S. Miller 1669cb9855 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2013-12-19

1) Use the user supplied policy index instead of a generated one
   if present. From Fan Du.

2) Make xfrm migration namespace aware. From Fan Du.

3) Make the xfrm state and policy locks namespace aware. From Fan Du.

4) Remove ancient sleeping when the SA is in acquire state,
   we now queue packets to the policy instead. This replaces the
   sleeping code.

5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
   posibility to sleep. The sleeping code is gone, so remove it.

6) Check user specified spi for IPComp. Thr spi for IPcomp is only
   16 bit wide, so check for a valid value. From Fan Du.

7) Export verify_userspi_info to check for valid user supplied spi ranges
   with pfkey and netlink. From Fan Du.

8) RFC3173 states that if the total size of a compressed payload and the IPComp
   header is not smaller than the size of the original payload, the IP datagram
   must be sent in the original non-compressed form. These packets are dropped
   by the inbound policy check because they are not transformed. Document the need
   to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 18:37:49 -05:00
Hannes Frederic Sowa 974eda11c5 inet: make no_pmtu_disc per namespace and kill ipv4_config
The other field in ipv4_config, log_martians, was converted to a
per-interface setting, so we can just remove the whole structure.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:58:20 -05:00
Florent Fourcot ac1eabcaec ipv6: use ip6_flowinfo helper
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:50 -05:00
Steffen Klassert 0e0d44ab42 net: Remove FLOWI_FLAG_CAN_SLEEP
FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility
to sleep until the needed states are resolved. This code is gone,
so FLOWI_FLAG_CAN_SLEEP is not needed anymore.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-06 07:24:39 +01:00
Linus Torvalds 1ee2dcc224 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...
2013-11-19 15:50:47 -08:00
Vlad Yasevich eca42aaf89 ipv6: Fix inet6_init() cleanup order
Commit 6d0bfe2261
	net: ipv6: Add IPv6 support to the ping socket

introduced a change in the cleanup logic of inet6_init and
has a bug in that ipv6_packet_cleanup() may not be called.
Fix the cleanup ordering.

CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Lorenzo Colitti <lorenzo@google.com>
CC: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 15:38:46 -05:00
Linus Torvalds 5e30025a31 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 "The biggest changes:

   - add lockdep support for seqcount/seqlocks structures, this
     unearthed both bugs and required extra annotation.

   - move the various kernel locking primitives to the new
     kernel/locking/ directory"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  block: Use u64_stats_init() to initialize seqcounts
  locking/lockdep: Mark __lockdep_count_forward_deps() as static
  lockdep/proc: Fix lock-time avg computation
  locking/doc: Update references to kernel/mutex.c
  ipv6: Fix possible ipv6 seqlock deadlock
  cpuset: Fix potential deadlock w/ set_mems_allowed
  seqcount: Add lockdep functionality to seqcount/seqlock structures
  net: Explicitly initialize u64_stats_sync structures for lockdep
  locking: Move the percpu-rwsem code to kernel/locking/
  locking: Move the lglocks code to kernel/locking/
  locking: Move the rwsem code to kernel/locking/
  locking: Move the rtmutex code to kernel/locking/
  locking: Move the semaphore core to kernel/locking/
  locking: Move the spinlock code to kernel/locking/
  locking: Move the lockdep code to kernel/locking/
  locking: Move the mutex code to kernel/locking/
  hung_task debugging: Add tracepoint to report the hang
  x86/locking/kconfig: Update paravirt spinlock Kconfig description
  lockstat: Report avg wait and hold times
  lockdep, x86/alternatives: Drop ancient lockdep fixup message
  ...
2013-11-14 16:30:30 +09:00
John Stultz 827da44c61 net: Explicitly initialize u64_stats_sync structures for lockdep
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:40:25 +01:00
Eric W. Biederman a4fe34bf90 tcp_memcontrol: Remove the per netns control.
The code that is implemented is per memory cgroup not per netns, and
having per netns bits is just confusing.  Remove the per netns bits to
make it easier to see what is really going on.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Hannes Frederic Sowa 1bbdceef1e inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once
Initialize the ehash and ipv6_hash_secrets with net_get_random_once.

Each compilation unit gets its own secret now:
  ipv4/inet_hashtables.o
  ipv4/udp.o
  ipv6/inet6_hashtables.o
  ipv6/udp.o
  rds/connection.o

The functions still get inlined into the hashing functions. In the fast
path we have at most two (needed in ipv6) if (unlikely(...)).

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Eric Dumazet efe4208f47 ipv6: make lookups simpler and faster
TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common

Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09 00:01:25 -04:00
Cong Wang 8ce4406103 ipv6: do not allow ipv6 module to be removed
There was some bug report on ipv6 module removal path before.
Also, as Stephen pointed out, after vxlan module gets ipv6 support,
the ipv6 stub it used is not safe against this module removal either.
So, let's just remove inet6_exit() so that ipv6 module will not be
able to be unloaded.

Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-24 11:31:58 -04:00
Michal Kubeček 2c861cc65e ipv6: don't call fib6_run_gc() until routing is ready
When loading the ipv6 module, ndisc_init() is called before
ip6_route_init(). As the former registers a handler calling
fib6_run_gc(), this opens a window to run the garbage collector
before necessary data structures are initialized. If a network
device is initialized in this window, adding MAC address to it
triggers a NETDEV_CHANGEADDR event, leading to a crash in
fib6_clean_all().

Take the event handler registration out of ndisc_init() into a
separate function ndisc_late_init() and move it after
ip6_route_init().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 17:04:09 -04:00
Cong Wang f564f45c45 vxlan: add ipv6 proxy support
This patch adds the IPv6 version of "arp_reduce", ndisc_send_na()
will be needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:01 -04:00
Cong Wang e15a00aafa vxlan: add ipv6 route short circuit support
route short circuit only has IPv4 part, this patch adds
the IPv6 part. nd_tbl will be needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang 5f81bd2e5d ipv6: export a stub for IPv6 symbols used by vxlan
In case IPv6 is compiled as a module, introduce a stub
for ipv6_sock_mc_join and ipv6_sock_mc_drop etc.. It will be used
by vxlan module. Suggested by Ben.

This is an ugly but easy solution for now.

Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
fan.du ca4c3fc24e net: split rt_genid for ipv4 and ipv6
Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:

- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
  entries even when the policy is only applied for one address family.

Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 14:56:36 -07:00
Lorenzo Colitti 6d0bfe2261 net: ipv6: Add IPv6 support to the ping socket.
This adds the ability to send ICMPv6 echo requests without a
raw socket. The equivalent ability for ICMPv4 was added in
2011.

Instead of having separate code paths for IPv4 and IPv6, make
most of the code in net/ipv4/ping.c dual-stack and only add a
few IPv6-specific bits (like the protocol definition) to a new
net/ipv6/ping.c. Hopefully this will reduce divergence and/or
duplication of bugs in the future.

Caveats:

- Setting options via ancillary data (e.g., using IPV6_PKTINFO
  to specify the outgoing interface) is not yet supported.
- There are no separate security settings for IPv4 and IPv6;
  everything is controlled by /proc/net/ipv4/ping_group_range.
- The proc interface does not yet display IPv6 ping sockets
  properly.

Tested with a patched copy of ping6 and using raw socket calls.
Compiles and works with all of CONFIG_IPV6={n,m,y}.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-25 21:07:49 -07:00
Pravin B Shelar c544193214 GRE: Refactor GRE tunneling code.
Following patch refactors GRE code into ip tunneling code and GRE
specific code. Common tunneling code is moved to ip_tunnel module.
ip_tunnel module is written as generic library which can be used
by different tunneling implementations.

ip_tunnel module contains following components:
 - packet xmit and rcv generic code. xmit flow looks like
   (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
 - hash table of all devices.
 - lookup for tunnel devices.
 - control plane operations like device create, destroy, ioctl, netlink
   operations code.
 - registration for tunneling modules, like gre, ipip etc.
 - define single pcpu_tstats dev->tstats.
 - struct tnl_ptk_info added to pass parsed tunnel packet parameters.

ipip.h header is renamed to ip_tunnel.h

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:27:18 -04:00
Hannes Frederic Sowa 842df07397 ipv6: use newly introduced __ipv6_addr_needs_scope_id and ipv6_iface_scope_id
This patch requires multicast interface-scoped addresses to supply a
sin6_scope_id. Because the sin6_scope_id is now also correctly used
in case of interface-scoped multicast traffic this enables one to use
interface scoped addresses over interfaces which are not targeted by the
default multicast route (the route has to be put there manually, though).

getsockname() and getpeername() now return the correct sin6_scope_id in
case of interface-local mc addresses.

v2:
a) rebased ontop of patch 1/4 (now uses ipv6_addr_props)

v3:
a) reverted changes for ipv6_addr_props

v4:
a) unchanged

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>dave
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:29:22 -05:00
YOSHIFUJI Hideaki / 吉藤英明 ba96bcbcd2 ipv6: Use FIELD_SIZEOF() in inet6_init().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:38:23 -08:00
Eric W. Biederman 3594698a1f net: Make CAP_NET_BIND_SERVICE per user namespace
Allow privileged users in any user namespace to bind to
privileged sockets in network namespaces they control.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:33:37 -05:00
Eric W. Biederman af31f412c7 net: Allow userns root to control ipv6
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed while
resource control is left unchanged.

Allow the SIOCSIFADDR ioctl to add ipv6 addresses.
Allow the SIOCDIFADDR ioctl to delete ipv6 addresses.
Allow the SIOCADDRT ioctl to add ipv6 routes.
Allow the SIOCDELRT ioctl to delete ipv6 routes.

Allow creation of ipv6 raw sockets.

Allow setting the IPV6_JOIN_ANYCAST socket option.
Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR
socket option.

Allow setting the IPV6_TRANSPARENT socket option.
Allow setting the IPV6_HOPOPTS socket option.
Allow setting the IPV6_RTHDRDSTOPTS socket option.
Allow setting the IPV6_DSTOPTS socket option.
Allow setting the IPV6_IPSEC_POLICY socket option.
Allow setting the IPV6_XFRM_POLICY socket option.

Allow sending packets with the IPV6_2292HOPOPTS control message.
Allow sending packets with the IPV6_2292DSTOPTS control message.
Allow sending packets with the IPV6_RTHDRDSTOPTS control message.

Allow setting the multicast routing socket options on non multicast
routing sockets.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for
setting up, changing and deleting tunnels over ipv6.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for
setting up, changing and deleting ipv6 over ipv4 tunnels.

Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding,
deleting, and changing the potential router list for ISATAP tunnels.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Vlad Yasevich c6b641a4c6 ipv6: Pull IPv6 GSO registration out of the module
Sing GSO support is now separate, pull it out of the module
and make it its own init call.
Remove the cleanup functions as they are no longer called.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:39:24 -05:00
Vlad Yasevich d1da932ed4 ipv6: Separate ipv6 offload support
Separate IPv6 offload functionality into its own file
in preparation for the move out of the module

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
Vlad Yasevich 3336288a9f ipv6: Switch to using new offload infrastructure.
Switch IPv6 protocol to using the new GRO/GSO calls and data.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
Vlad Yasevich 22061d8014 net: Switch to using the new packet offload infrustructure
Convert to using the new GSO/GRO registration mechanism and new
packet offload structure.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
Eric Dumazet 863472454c ipv6: gro: fix PV6_GRO_CB(skb)->proto problem
It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive()
if a new skb is allocated (to serve as an anchor for frag_list)

We copy NAPI_GRO_CB() only (not the IPV6 specific part) in :

*NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);

So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead
of IPPROTO_TCP (6)

ipv6_gro_complete() isnt able to call ops->gro_complete()
[ tcp6_gro_complete() ]

Fix this by moving proto in NAPI_GRO_CB() and getting rid of
IPV6_GRO_CB

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 15:40:43 -04:00
Eric Dumazet 51ec04038c ipv6: GRO should be ECN friendly
IPv4 side of the problem was addressed in commit a9e050f4e7
(net: tcp: GRO should be ECN friendly)

This patch does the same, but for IPv6 : A Traffic Class mismatch
doesnt mean flows are different, but instead should force a flush
of previous packets.

This patch removes artificial packet reordering problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-07 14:44:36 -04:00
Eric Dumazet 92113bfde2 ipv6: bool conversions phase1
ipv6_opt_accepted() returns a bool, and can use const pointers

ipv6_addr_equal(), ipv6_addr_any(), ipv6_addr_loopback(),
ipv6_addr_orchid() return a bool.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 02:24:13 -04:00
Joe Perches f32138319c net: ipv6: Standardize prefixes for message logging
Add #define pr_fmt(fmt) as appropriate.

Add "IPv6: " to appropriate files.

Convert printk(KERN_<LEVEL> to pr_<level> (but not KERN_DEBUG).
Standardize on "%s: " not "%s(): " when emitting __func__.
Use "%s: ", __func__ instead of embedding function name.
Coalesce formats, align arguments.

ADDRCONF output is now prefixed with "IPv6: "

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 01:01:03 -04:00
Eldad Zack 647c0c70e8 net/ipv6/af_inet6.c: checkpatch cleanup
af_inet6.c:80: ERROR: do not initialise statics to 0 or NULL
af_inet6.c:259: ERROR: spaces required around that '=' (ctx:VxV)
af_inet6.c:394: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:412: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:422: ERROR: do not use assignment in if condition
af_inet6.c:425: ERROR: do not use assignment in if condition
af_inet6.c:433: ERROR: do not use assignment in if condition
af_inet6.c:437: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:446: ERROR: spaces required around that '=' (ctx:VxV)
af_inet6.c:478: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:485: ERROR: that open brace { should be on the previous line
af_inet6.c:485: ERROR: space required before the open parenthesis '('
af_inet6.c:513: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:629: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:647: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:687: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:709: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:1073: ERROR: space required before the open parenthesis '('

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 18:04:53 -04:00
Pavel Emelyanov 4a17fd5229 sock: Introduce named constants for sk_reuse
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Eric W. Biederman a5287acc6c net ipv6: Remove unneded registration of an empty net/ipv6/neigh
sysctl no longer requires explicit creation of directories.  The neigh
directory is always populated with at least a default entry so this
should cause no user visible changes.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:18 -04:00
David Howells 9ffc93f203 Remove all #inclusions of asm/system.h
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it.  Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
Jiri Benc 4c507d2897 net: implement IP_RECVTOS for IP_PKTOPTIONS
Currently, it is not easily possible to get TOS/DSCP value of packets from
an incoming TCP stream. The mechanism is there, IP_PKTOPTIONS getsockopt
with IP_RECVTOS set, the same way as incoming TTL can be queried. This is
not actually implemented for TOS, though.

This patch adds this functionality, both for IPv4 (IP_PKTOPTIONS) and IPv6
(IPV6_2292PKTOPTIONS). For IPv4, like in the IP_RECVTTL case, the value of
the TOS field is stored from the other party's ACK.

This is needed for proxies which require DSCP transparency. One such example
is at http://zph.bratcheda.org/.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-13 00:46:41 -05:00
Glauber Costa 3dc43e3e4d per-netns ipv4 sysctl_tcp_mem
This patch allows each namespace to independently set up
its levels for tcp memory pressure thresholds. This patch
alone does not buy much: we need to make this values
per group of process somehow. This is achieved in the
patches that follows in this patchset.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12 19:04:11 -05:00
Alexey Dobriyan 4e3fd7a06d net: remove ipv6_addr_copy()
C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22 16:43:32 -05:00
Michał Mirosław c8f44affb7 net: introduce and use netdev_features_t for device features sets
v2:	add couple missing conversions in drivers
	split unexporting netdev_fix_features()
	implemented %pNF
	convert sock::sk_route_(no?)caps

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16 17:43:10 -05:00
Eric Dumazet 2a24444f8f ipv6: reduce percpu needs for icmpv6msg mibs
Reading /proc/net/snmp6 on a machine with a lot of cpus is very
expensive (can be ~88000 us).

This is because ICMPV6MSG MIB uses 4096 bytes per cpu, and folding
values for all possible cpus can read 16 Mbytes of memory (32MBytes on
non x86 arches)

ICMP messages are not considered as fast path on a typical server, and
eventually few cpus handle them anyway. We can afford an atomic
operation instead of using percpu data.

This saves 4096 bytes per cpu and per network namespace.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-14 00:12:26 -05:00
Maciej Żenczykowski f74024d9f0 net: make ipv6 bind honour freebind
This makes native ipv6 bind follow the precedent set by:
  - native ipv4 bind behaviour
  - dual stack ipv4-mapped ipv6 bind behaviour.

This does allow an unpriviledged process to spoof its source IPv6
address, just like it currently can spoof its source IPv4 address
(for example when using UDP).

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-08 15:13:03 -05:00
Yan, Zheng cdaf557034 gro: refetch inet6_protos[] after pulling ext headers
ipv6_gro_receive() doesn't update the protocol ops after pulling
the ext headers. It looks like a typo.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-10 14:26:16 -04:00
Marcus Meissner c349a528cd net: bind() fix error return on wrong address family
Hi,

Reinhard Max also pointed out that the error should EAFNOSUPPORT according
to POSIX.

The Linux manpages have it as EINVAL, some other OSes (Minix, HPUX, perhaps BSD) use
EAFNOSUPPORT. Windows uses WSAEFAULT according to MSDN.

Other protocols error values in their af bind() methods in current mainline git as far
as a brief look shows:
	EAFNOSUPPORT: atm, appletalk, l2tp, llc, phonet, rxrpc
	EINVAL: ax25, bluetooth, decnet, econet, ieee802154, iucv, netlink, netrom, packet, rds, rose, unix, x25,
	No check?: can/raw, ipv6/raw, irda, l2tp/l2tp_ip

Ciao, Marcus

Signed-off-by: Marcus Meissner <meissner@suse.de>
Cc: Reinhard Max <max@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-04 21:37:41 -07:00
Marcus Meissner 5a079c305a net/ipv6: check for mistakenly passed in non-AF_INET6 sockaddrs
Same check as for IPv4, also do for IPv6.

(If you passed in a IPv4 sockaddr_in here, the sizeof check
 in the line before would have triggered already though.)

Signed-off-by: Marcus Meissner <meissner@suse.de>
Cc: Reinhard Max <max@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-06 14:48:16 -07:00
Eric Dumazet b71d1d426d inet: constify ip headers and in6_addr
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-22 11:04:14 -07:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
David S. Miller 1958b856c1 net: Put fl6_* macros to struct flowi6 and use them again.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:55 -08:00
David S. Miller 4c9483b2fb ipv6: Convert to use flowi6 where applicable.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:54 -08:00
David S. Miller 6281dcc94a net: Make flowi ports AF dependent.
Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*

This will let us to create AF optimal flow instances.

It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:46 -08:00
David S. Miller 1d28f42c1b net: Put flowi_* prefix on AF independent members of struct flowi
I intend to turn struct flowi into a union of AF specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:44 -08:00
David S. Miller 68d0c6d34d ipv6: Consolidate route lookup sequences.
Route lookups follow a general pattern in the ipv6 code wherein
we first find the non-IPSEC route, potentially override the
flow destination address due to ipv6 options settings, and then
finally make an IPSEC search using either xfrm_lookup() or
__xfrm_lookup().

__xfrm_lookup() is used when we want to generate a blackhole route
if the key manager needs to resolve the IPSEC rules (in this case
-EREMOTE is returned and the original 'dst' is left unchanged).

Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
resolution is necessary, we simply fail the lookup completely.

All of these cases are encapsulated into two routines,
ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow.  The latter of which
handles unconnected UDP datagram sockets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 13:19:07 -08:00
Michał Mirosław 04ed3e741d net: change netdev->features to u32
Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
  struct netdev, as per Eric Dumazet's suggestion -DaveM ]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:32:47 -08:00
Linus Torvalds 008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Tobias Klauser 376d940ee9 inet6: Remove redundant unlikely()
IS_ERR() already implies unlikely(), so it can be omitted here.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 14:57:34 -08:00
Uwe Kleine-König a34f0b3139 fix comment typos concerning "consistent"
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-12-10 16:04:28 +01:00
David S. Miller 9941fb6276 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-10-21 08:21:34 -07:00
Balazs Scheidler 0a513f6af9 tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 16:10:03 +02:00
Eric Dumazet a02cec2155 net: return operator cleanup
Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-23 14:33:39 -07:00
Changli Gao 7ba4291007 inet, inet6: make tcp_sendmsg() and tcp_sendpage() through inet_sendmsg() and inet_sendpage()
a new boolean flag no_autobind is added to structure proto to avoid the autobind
calls when the protocol is TCP. Then sock_rps_record_flow() is called int the
TCP's sendmsg() and sendpage() pathes.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/inet_common.h |    4 ++++
 include/net/sock.h        |    1 +
 include/net/tcp.h         |    8 ++++----
 net/ipv4/af_inet.c        |   15 +++++++++------
 net/ipv4/tcp.c            |   11 +++++------
 net/ipv4/tcp_ipv4.c       |    3 +++
 net/ipv6/af_inet6.c       |    8 ++++----
 net/ipv6/tcp_ipv6.c       |    3 +++
 8 files changed, 33 insertions(+), 20 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-12 20:21:46 -07:00
Eric Dumazet 1823e4c80e snmp: add align parameter to snmp_mib_init()
In preparation for 64bit snmp counters for some mibs,
add an 'align' parameter to snmp_mib_init(), instead
of assuming mibs only contain 'unsigned long' fields.

Callers can use __alignof__(type) to provide correct
alignment.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-25 21:33:17 -07:00
Arnaud Ebalard 20c59de2e6 ipv6: Refactor update of IPv6 flowi destination address for srcrt (RH) option
There are more than a dozen occurrences of following code in the
IPv6 stack:

    if (opt && opt->srcrt) {
            struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
            ipv6_addr_copy(&final, &fl.fl6_dst);
            ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
            final_p = &final;
    }

Replace those with a helper. Note that the helper overrides final_p
in all cases. This is ok as final_p was previously initialized to
NULL when declared.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 07:08:31 -07:00
David S. Miller 278554bd65 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ath/ar9170/usb.c
	drivers/scsi/iscsi_tcp.c
	net/ipv4/ipmr.c
2010-05-12 00:05:35 -07:00
David S. Miller f935aa9e99 ipv6: Fix default multicast hops setting.
As per RFC 3493 the default multicast hops setting
for a socket should be "1" just like ipv4.

Ironically we have a IPV6_DEFAULT_MCASTHOPS macro
it just wasn't being used.

Reported-by: Elliot Hughes <enh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-03 23:42:27 -07:00
Brian Haley 4b340ae20d IPv6: Complete IPV6_DONTFRAG support
Finally add support to detect a local IPV6_DONTFRAG event
and return the relevant data to the user if they've enabled
IPV6_RECVPATHMTU on the socket.  The next recvmsg() will
return no data, but have an IPV6_PATHMTU as ancillary data.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23 23:35:29 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Tejun Heo 7d720c3e4f percpu: add __percpu sparse annotations to net
Add __percpu sparse annotations to net.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

The macro and type tricks around snmp stats make things a bit
interesting.  DEFINE/DECLARE_SNMP_STAT() macros mark the target field
as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly.  All
snmp_mib_*() users which used to cast the argument to (void **) are
updated to cast it to (void __percpu **).

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16 23:05:38 -08:00
Alexey Dobriyan 2c8c1e7297 net: spread __net_init, __net_exit
__net_init/__net_exit are apparently not going away, so use them
to full extent.

In some cases __net_init was removed, because it was called from
__net_exit code.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-17 19:16:02 -08:00
Eric Paris c84b3268da net: check kern before calling security subsystem
Before calling capable(CAP_NET_RAW) check if this operations is on behalf
of the kernel or on behalf of userspace.  Do not do the security check if
it is on behalf of the kernel.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-05 22:18:18 -08:00
Eric Paris 3f378b6844 net: pass kern to net_proto_family create function
The generic __sock_create function has a kern argument which allows the
security system to make decisions based on if a socket is being created by
the kernel or by userspace.  This patch passes that flag to the
net_proto_family specific create function, so it can do the same thing.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-05 22:18:14 -08:00
Eric Paris 13f18aa05f net: drop capability from protocol definitions
struct can_proto had a capability field which wasn't ever used.  It is
dropped entirely.

struct inet_protosw had a capability field which can be more clearly
expressed in the code by just checking if sock->type = SOCK_RAW.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-05 21:40:17 -08:00
Eric Dumazet 16ba5e8e7c ipv6: no more dev_put() in inet6_bind()
Avoids touching device refcount in inet6_bind(), thanks to RCU

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02 03:42:41 -08:00
Eric Dumazet c720c7e838 inet: rename some inet_sock fields
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfert fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 18:52:53 -07:00
Stephen Hemminger ec1b4cf74c net: mark net_proto_ops as const
All usages of structure net_proto_ops should be declared const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:46 -07:00
Brian Haley 51953d5bc4 Use sk_mark for IPv6 routing lookups
Atis Elsts wrote:
> Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment.

Ok, I'll just drop that part, I'm not sure what should be done in this case.

> Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that?

I just sent that patch out too quickly, here's a better one with the updates.

Add support for IPv6 route lookups using sk_mark.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:45 -07:00
Alexey Dobriyan 41135cc836 net: constify struct inet6_protocol
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-14 17:03:05 -07:00
David S. Miller 6cdee2f96a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/yellowfin.c
2009-09-02 00:32:56 -07:00
Bruno Prémont ca6982b858 ipv6: Fix commit 63d9950b08 (ipv6: Make v4-mapped bindings consistent with IPv4)
Commit 63d9950b08
  (ipv6: Make v4-mapped bindings consistent with IPv4)
changes behavior of inet6_bind() for v4-mapped addresses so it should
behave the same way as inet_bind().

During this change setting of err to -EADDRNOTAVAIL got lost:

af_inet.c:469 inet_bind()
	err = -EADDRNOTAVAIL;
	if (!sysctl_ip_nonlocal_bind &&
	    !(inet->freebind || inet->transparent) &&
	    addr->sin_addr.s_addr != htonl(INADDR_ANY) &&
	    chk_addr_ret != RTN_LOCAL &&
	    chk_addr_ret != RTN_MULTICAST &&
	    chk_addr_ret != RTN_BROADCAST)
		goto out;


af_inet6.c:463 inet6_bind()
	if (addr_type == IPV6_ADDR_MAPPED) {
		int chk_addr_ret;

		/* Binding to v4-mapped address on a v6-only socket                         
		 * makes no sense                                                           
		 */
		if (np->ipv6only) {
			err = -EINVAL;
			goto out; 
		}

		/* Reproduce AF_INET checks to make the bindings consitant */               
		v4addr = addr->sin6_addr.s6_addr32[3];                                      
		chk_addr_ret = inet_addr_type(net, v4addr);                                 
		if (!sysctl_ip_nonlocal_bind &&                                             
		    !(inet->freebind || inet->transparent) &&                               
		    v4addr != htonl(INADDR_ANY) &&
		    chk_addr_ret != RTN_LOCAL &&                                            
		    chk_addr_ret != RTN_MULTICAST &&                                        
		    chk_addr_ret != RTN_BROADCAST)
			goto out;
	} else {


Signed-off-by Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:06:28 -07:00
Sridhar Samudrala ba73542585 udpv6: Handle large incoming UDP/IPv6 packets and support software UFO
- validate and forward GSO UDP/IPv6 packets from untrusted sources.
- do software UFO if the outgoing device doesn't support UFO.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:29:29 -07:00
Jesper Dangaard Brouer 1f2ccd00f2 ipv6: Use rcu_barrier() on module unload.
The ipv6 module uses rcu_call() thus it should use rcu_barrier() on
module unload.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-26 13:51:31 -07:00
Brian Haley 56d417b12e IPv6: Add 'autoconf' and 'disable_ipv6' module parameters
Add 'autoconf' and 'disable_ipv6' parameters to the IPv6 module.

The first controls if IPv6 addresses are autoconfigured from
prefixes received in Router Advertisements.  The IPv6 loopback
(::1) and link-local addresses are still configured.

The second controls if IPv6 addresses are desired at all.  No
IPv6 addresses will be added to any interfaces.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-01 03:07:33 -07:00
Herbert Xu a5b1cf288d gro: Avoid unnecessary comparison after skb_gro_header
For the overwhelming majority of cases, skb_gro_header's return
value cannot be NULL.  Yet we must check it because of its current
form.  This patch splits it up into multiple functions in order
to avoid this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:01 -07:00
Vlad Yasevich 63d9950b08 ipv6: Make v4-mapped bindings consistent with IPv4
Binding to a v4-mapped address on an AF_INET6 socket should
produce the same result as binding to an IPv4 address on
AF_INET socket.  The two are interchangable as v4-mapped
address is really a portability aid.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 19:49:10 -07:00
Vlad Yasevich 0f8d3c7ac3 ipv6: Allow ipv4 wildcard binds after ipv6 address binds
The IPv4 wildcard (0.0.0.0) address does not intersect
in any way with explicit IPv6 addresses.  These two should
be permitted, but the IPv4 conflict code checks the ipv6only
bit as part of the test.  Since binding to an explicit IPv6
address restricts the socket to only that IPv6 address, the
side-effect is that the socket behaves as v6-only.  By
explicitely setting ipv6only in this case, allows the 2 binds
to succeed.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 19:49:10 -07:00
Vlad Yasevich 783ed5a783 ipv6: Disallow binding to v4-mapped address on v6-only socket.
A socket marked v6-only, can not receive or send traffic to v4-mapped
addresses.  Thus allowing binding to v4-mapped address on such a
socket makes no sense.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 19:49:09 -07:00
David S. Miller 2d6a5e9500 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/igb/igb_main.c
	drivers/net/qlge/qlge_main.c
	drivers/net/wireless/ath9k/ath9k.h
	drivers/net/wireless/ath9k/core.h
	drivers/net/wireless/ath9k/hw.c
2009-03-17 15:01:30 -07:00
John Dykstra ff8cf9a938 ipv6: Fix BUG when disabled ipv6 module is unloaded
Do not try to "uninitialize" ipv6 if its initialization had been skipped
because module parameter disable=1 had been specified.

Reported-by:  Thomas Backlund <tmb@mandriva.org>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-11 09:22:51 -07:00
Stephen Hemminger 7546dd97d2 net: convert usage of packet_type to read_mostly
Protocols that use packet_type can be __read_mostly section for better
locality. Elminate any unnecessary initializations of NULL.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-10 05:22:43 -07:00
David S. Miller 508827ff0a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/tokenring/tmspci.c
	drivers/net/ucc_geth_mii.c
2009-03-05 02:06:47 -08:00
Brian Haley fe7ca2e1e8 IPv6: add "disable" module parameter support to ipv6.ko
Add "disable" module parameter support to ipv6.ko by specifying
"disable=1" on module load.  We just do the minimum of initializing
inetsw6[] so calls from other modules to inet6_register_protosw()
won't OOPs, then bail out.  No IPv6 addresses or sockets can be
created as a result, and a reboot is required to enable IPv6.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 03:19:08 -08:00
Harvey Harrison 09640e6365 net: replace uses of __constant_{endian}
Base versions handle constant folding now.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-01 00:45:17 -08:00
Herbert Xu 86911732d3 gro: Avoid copying headers of unmerged packets
Unfortunately simplicity isn't always the best.  The fraginfo
interface turned out to be suboptimal.  The problem was quite
obvious.  For every packet, we have to copy the headers from
the frags structure into skb->head, even though for 99% of the
packets this part is immediately thrown away after the merge.

LRO didn't have this problem because it directly read the headers
from the frags structure.

This patch attempts to address this by creating an interface
that allows GRO to access the headers in the first frag without
having to copy it.  Because all drivers that use frags place the
headers in the first frag this optimisation should be enough.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-29 16:33:03 -08:00
Herbert Xu ebad18e93f gro: Fix handling of complete checksums in IPv6
We need to perform skb_postpull_rcsum after pulling the IPv6
header in order to maintain the correctness of the complete
checksum.

This patch also adds a missing iph reload after pulling.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-20 14:44:01 -08:00
Herbert Xu 787e920836 ipv6: Add GRO support
This patch adds GRO support for IPv6.  IPv6 GRO supports extension
headers in the same way as GSO (by using the same infrastructure).
It's also simpler compared to IPv4 since we no longer have to worry
about fragmentation attributes or header checksums.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-08 10:40:57 -08:00
Alexey Dobriyan 52479b623d netns xfrm: lookup in netns
Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.

Take it from socket or netdevice. Stub DECnet to init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:35:18 -08:00
Alexey Dobriyan e7dc849494 netns: mib6 section fixlet
LD      net/ipv6/ipv6.o
WARNING: net/ipv6/ipv6.o(.text+0xd8): Section mismatch in reference from the function inet6_net_init() to the function .init.text:ipv6_init_mibs()

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-13 18:54:07 -07:00
Denis V. Lunev 2ca89cea5c ipv6: remove unused not init_ipv6_mibs/cleanup_ipv6_mibs
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08 11:17:07 -07:00