Commit graph

12037 commits

Author SHA1 Message Date
Eric Dumazet 1d45209d89 netfilter: nf_conntrack: Reduce conntrack count in nf_conntrack_free()
We use RCU to defer freeing of conntrack structures. In DOS situation, RCU might
accumulate about 10.000 elements per CPU in its internal queues. To get accurate
conntrack counts (at the expense of slightly more RAM used), we might consider
conntrack counter not taking into account "about to be freed elements, waiting
in RCU queues". We thus decrement it in nf_conntrack_free(), not in the RCU
callback.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Tested-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-24 14:26:50 +01:00
Vitaly Mayatskikh 30842f2989 udp: Wrong locking code in udp seq_file infrastructure
Reading zero bytes from /proc/net/udp or other similar files which use
the same seq_file udp infrastructure panics kernel in that way:

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
read/1985 is trying to release lock (&table->hash[i].lock) at:
[<ffffffff81321d83>] udp_seq_stop+0x27/0x29
but there are no more locks to release!

other info that might help us debug this:
1 lock held by read/1985:
 #0:  (&p->lock){--..}, at: [<ffffffff810eefb6>] seq_read+0x38/0x348

stack backtrace:
Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
Call Trace:
 [<ffffffff81321d83>] ? udp_seq_stop+0x27/0x29
 [<ffffffff8106dab9>] print_unlock_inbalance_bug+0xd6/0xe1
 [<ffffffff8106db62>] lock_release_non_nested+0x9e/0x1c6
 [<ffffffff810ef030>] ? seq_read+0xb2/0x348
 [<ffffffff8106bdba>] ? mark_held_locks+0x68/0x86
 [<ffffffff81321d83>] ? udp_seq_stop+0x27/0x29
 [<ffffffff8106dde7>] lock_release+0x15d/0x189
 [<ffffffff8137163c>] _spin_unlock_bh+0x1e/0x34
 [<ffffffff81321d83>] udp_seq_stop+0x27/0x29
 [<ffffffff810ef239>] seq_read+0x2bb/0x348
 [<ffffffff810eef7e>] ? seq_read+0x0/0x348
 [<ffffffff8111aedd>] proc_reg_read+0x90/0xaf
 [<ffffffff810d878f>] vfs_read+0xa6/0x103
 [<ffffffff8106bfac>] ? trace_hardirqs_on_caller+0x12f/0x153
 [<ffffffff810d88a2>] sys_read+0x45/0x69
 [<ffffffff8101123a>] system_call_fastpath+0x16/0x1b
BUG: scheduling while atomic: read/1985/0xffffff00
INFO: lockdep is turned off.
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm ppdev snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event arc4 snd_s
eq ecb thinkpad_acpi snd_seq_device iwl3945 hwmon sdhci_pci snd_pcm_oss sdhci rfkill mmc_core snd_mixer_oss i2c_i801 mac80211 yenta_socket ricoh_mmc i2c_core iTCO_wdt snd_pcm iTCO_vendor_support rs
rc_nonstatic snd_timer snd lib80211 cfg80211 soundcore snd_page_alloc video parport_pc output parport e1000e [last unloaded: scsi_wait_scan]
Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
Call Trace:
 [<ffffffff8106b456>] ? __debug_show_held_locks+0x1b/0x24
 [<ffffffff81043660>] __schedule_bug+0x7e/0x83
 [<ffffffff8136ede9>] schedule+0xce/0x838
 [<ffffffff810d7972>] ? fsnotify_access+0x5f/0x67
 [<ffffffff810112d0>] ? sysret_careful+0xb/0x37
 [<ffffffff8106be9c>] ? trace_hardirqs_on_caller+0x1f/0x153
 [<ffffffff8137127b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff810112f6>] sysret_careful+0x31/0x37
read[1985]: segfault at 7fffc479bfe8 ip 0000003e7420a180 sp 00007fffc479bfa0 error 6
Kernel panic - not syncing: Aiee, killing interrupt handler!

udp_seq_stop() tries to unlock not yet locked spinlock. The lock was lost
during splitting global udp_hash_lock to subsequent spinlocks.

Signed-off by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-23 15:22:33 -07:00
David S. Miller 8be7cdccac Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/ucc_geth.c
2009-03-23 13:35:04 -07:00
Linus Torvalds d56ffd38a9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
  ucc_geth: Fix oops when using fixed-link support
  dm9000: locking bugfix
  net: update dnet.c for bus_id removal
  dnet: DNET should depend on HAS_IOMEM
  dca: add missing copyright/license headers
  nl80211: Check that function pointer != NULL before using it
  sungem: missing net_device_ops
  be2net: fix to restore vlan ids into BE2 during a IF DOWN->UP cycle
  be2net: replenish when posting to rx-queue is starved in out of mem conditions
  bas_gigaset: correctly allocate USB interrupt transfer buffer
  smsc911x: reset last known duplex and carrier on open
  sh_eth: Fix mistake of the address of SH7763
  sh_eth: Change handling of IRQ
  netns: oops in ip[6]_frag_reasm incrementing stats
  net: kfree(napi->skb) => kfree_skb
  net: fix sctp breakage
  ipv6: fix display of local and remote sit endpoints
  net: Document /proc/sys/net/core/netdev_budget
  tulip: fix crash on iface up with shirq debug
  virtio_net: Make virtio_net support carrier detection
  ...
2009-03-23 09:25:58 -07:00
Mark H. Weaver 534f81a506 netfilter: nf_conntrack_tcp: fix unaligned memory access in tcp_sack
This patch fixes an unaligned memory access in tcp_sack while reading
sequence numbers from TCP selective acknowledgement options.  Prior to
applying this patch, upstream linux-2.6.27.20 was occasionally
generating messages like this on my sparc64 system:

  [54678.532071] Kernel unaligned access at TPC[6b17d4] tcp_packet+0xcd4/0xd00

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-23 13:46:12 +01:00
Pablo Neira Ayuso dd5b6ce6fd nefilter: nfnetlink: add nfnetlink_set_err and use it in ctnetlink
This patch adds nfnetlink_set_err() to propagate the error to netlink
broadcast listener in case of memory allocation errors in the
message building.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-23 13:21:06 +01:00
Eric Leblond 176252746e netfilter: sysctl support of logger choice
This patchs adds support of modification of the used logger via sysctl.
It can be used to change the logger to module that can not use the bind
operation (ipt_LOG and ipt_ULOG). For this purpose, it creates a
directory /proc/sys/net/netfilter/nf_log which contains a file
per-protocol. The content of the file is the name current logger (NONE if
not set) and a logger can be setup by simply echoing its name to the file.
By echoing "NONE" to a /proc/sys/net/netfilter/nf_log/PROTO file, the
logger corresponding to this PROTO is set to NULL.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-23 13:16:53 +01:00
John Dykstra 96e0bf4b51 tcp: Discard segments that ack data not yet sent
Discard incoming packets whose ack field iincludes data not yet sent.
This is consistent with RFC 793 Section 3.9.

Change tcp_ack() to distinguish between too-small and too-large ack
field values.  Keep segments with too-large ack fields out of the fast
path, and change slow path to discard them.

Reported-by:  Oliver Zheng <mailinglists+netdev@oliverzheng.com>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-22 21:49:57 -07:00
Stephen Hemminger d44c3a2e0e netdev: expose net_device_ops compat as config option
Now that most network device drivers in (all but one in x86_64 allmodconfig)
support net_device_ops. Expose it as a configuration parameter. Still
need to address even older 32 bit drivers, and other arch before
compatiablity can be scheduled for removal in some future release.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 22:55:36 -07:00
Stephen Hemminger 9cc8ba783d irlan: convert to net_device_ops
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:19:16 -07:00
Stephen Hemminger 92bcd4fe9a irda: net_device_ops ioctl fix
Need to reference net_device_ops not old pointer.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:19:14 -07:00
Stephen Hemminger dde0975855 atm: convert clip driver to net_device_ops
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:19:12 -07:00
Stephen Hemminger 788dee0a95 atm: convert mpc device to using netdev_ops
This converts the mpc device to using new netdevice_ops.
Compile tested only, needs more than usual review since
device was swaping pointers around etc.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:19:12 -07:00
Lennert Buytenhek e84665c9cb dsa: add switch chip cascading support
The initial version of the DSA driver only supported a single switch
chip per network interface, while DSA-capable switch chips can be
interconnected to form a tree of switch chips.  This patch adds support
for multiple switch chips on a network interface.

An example topology for a 16-port device with an embedded CPU is as
follows:

	+-----+          +--------+       +--------+
	|     |eth0    10| switch |9    10| switch |
	| CPU +----------+        +-------+        |
	|     |          | chip 0 |       | chip 1 |
	+-----+          +---++---+       +---++---+
	                     ||               ||
	                     ||               ||
	                     ||1000baseT      ||1000baseT
	                     ||ports 1-8      ||ports 9-16

This requires a couple of interdependent changes in the DSA layer:

- The dsa platform driver data needs to be extended: there is still
  only one netdevice per DSA driver instance (eth0 in the example
  above), but each of the switch chips in the tree needs its own
  mii_bus device pointer, MII management bus address, and port name
  array. (include/net/dsa.h)  The existing in-tree dsa users need
  some small changes to deal with this. (arch/arm)

- The DSA and Ethertype DSA tagging modules need to be extended to
  use the DSA device ID field on receive and demultiplex the packet
  accordingly, and fill in the DSA device ID field on transmit
  according to which switch chip the packet is heading to.
  (net/dsa/tag_{dsa,edsa}.c)

- The concept of "CPU port", which is the switch chip port that the
  CPU is connected to (port 10 on switch chip 0 in the example), needs
  to be extended with the concept of "upstream port", which is the
  port on the switch chip that will bring us one hop closer to the CPU
  (port 10 for both switch chips in the example above).

- The dsa platform data needs to specify which ports on which switch
  chips are links to other switch chips, so that we can enable DSA
  tagging mode on them.  (For inter-switch links, we always use
  non-EtherType DSA tagging, since it has lower overhead.  The CPU
  link uses dsa or edsa tagging depending on what the 'root' switch
  chip supports.)  This is done by specifying "dsa" for the given
  port in the port array.

- The dsa platform data needs to be extended with information on via
  which port to reach any given switch chip from any given switch chip.
  This info is specified via the per-switch chip data struct ->rtable[]
  array, which gives the nexthop ports for each of the other switches
  in the tree.

For the example topology above, the dsa platform data would look
something like this:

	static struct dsa_chip_data sw[2] = {
		{
			.mii_bus	= &foo,
			.sw_addr	= 1,
			.port_names[0]	= "p1",
			.port_names[1]	= "p2",
			.port_names[2]	= "p3",
			.port_names[3]	= "p4",
			.port_names[4]	= "p5",
			.port_names[5]	= "p6",
			.port_names[6]	= "p7",
			.port_names[7]	= "p8",
			.port_names[9]	= "dsa",
			.port_names[10]	= "cpu",
			.rtable		= (s8 []){ -1, 9, },
		}, {
			.mii_bus	= &foo,
			.sw_addr	= 2,
			.port_names[0]	= "p9",
			.port_names[1]	= "p10",
			.port_names[2]	= "p11",
			.port_names[3]	= "p12",
			.port_names[4]	= "p13",
			.port_names[5]	= "p14",
			.port_names[6]	= "p15",
			.port_names[7]	= "p16",
			.port_names[10]	= "dsa",
			.rtable		= (s8 []){ 10, -1, },
		},
	},

	static struct dsa_platform_data pd = {
		.netdev		= &foo,
		.nr_switches	= 2,
		.sw		= sw,
	};

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Gary Thomas <gary@mlbassoc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:06:54 -07:00
Lennert Buytenhek 076d3e10a5 dsa: add support for the Marvell 88E6095/6095F switch chips
Add support for the Marvell 88E6095/6095F switch chips.  These
chips are similar to the 88e6131, so we can add the support to
mv88e6131.c easily.

Thanks to Gary Thomas <gary@mlbassoc.com> and Jesper Dangaard
Brouer <hawk@diku.dk> for testing various patches.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Gary Thomas <gary@mlbassoc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:06:54 -07:00
Lennert Buytenhek c084080151 dsa: set ->iflink on slave interfaces to the ifindex of the parent
..so that we can parse the DSA topology from 'ip link' output:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
4: lan1@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
5: lan2@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
6: lan3@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
7: lan4@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:06:53 -07:00
Stephen Hemminger fa665ccf01 ipx: use constant for strings and desciptor
Fix compiler warning about non-const format string.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:06:51 -07:00
Stephen Hemminger 7ca98fa234 snap: use const for descriptor
Protocols should be able to use constant value for the descriptor.
Minor whitespace cleanup as well

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:06:50 -07:00
Eric Dumazet ed734a97c6 net: remove useless prefetch() call
There is no gain using prefetch() in dev_hard_start_xmit(), since
we already had to read ops->ndo_select_queue pointer in dev_pick_tx(),
and both pointers are probably located in the same cache line.

This prefetch call slows down fast path because of a stall in address
computation.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:42:55 -07:00
Vlad Yasevich 8d2f9e8116 sctp: Clean up TEST_FRAME hacks.
Remove 2 TEST_FRAME hacks that are no longer needed.  These allowed
sctp regression tests to compile before, but are no longer needed.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:41:09 -07:00
Stephen Hemminger 9247744e5e skb: expose and constify hash primitives
Some minor changes to queue hashing:
 1. Use const on accessor functions
 2. Export skb_tx_hash for use in drivers (see ixgbe)

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:39:26 -07:00
Stephen Hemminger 1f1900f935 atm: lec use dev_change_mtu
Rather than calling device pointer directly (which is incorrect with
net_device_ops), use the standard dev_change_mtu. Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:37:28 -07:00
Ilpo Järvinen a0bffffc14 net/*: use linux/kernel.h swap()
tcp_sack_swap seems unnecessary so I pushed swap to the caller.
Also removed comment that seemed then pointless, and added include
when not already there. Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:36:17 -07:00
Bernard Pidoux a3ac80a130 netrom: zero length frame filtering in NetRom
A zero length frame filter was recently introduced in ROSE protocole.
Previous commit makes the same at AX25 protocole level.
This patch has the same purpose for NetRom  protocole.
The reason is that empty frames have no meaning in NetRom protocole.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:34:20 -07:00
Bernard Pidoux f99bcff7a2 ax25: zero length frame filtering in AX25
In previous commit 244f46ae6e
was introduced a zero length frame filter for ROSE protocole.
This patch has the same purpose at AX25 frame level for the same
reason. Empty frames have no meaning in AX25 protocole.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:33:55 -07:00
Bernard Pidoux 60784427ab ax25: SOCK_DEBUG message simplification
This patch condenses two debug messages in one.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:33:18 -07:00
Jouni Malinen f3f9258678 nl80211: Check that function pointer != NULL before using it
NL80211_CMD_GET_MESH_PARAMS and NL80211_CMD_SET_MESH_PARAMS handlers
did not verify whether a function pointer is NULL (not supported by
the driver) before trying to call the function. The former nl80211
command is available for unprivileged users, too, so this can
potentially allow normal users to kill networking (or worse..) if
mac80211 is built without CONFIG_MAC80211_MESH=y.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-20 16:01:57 -04:00
David S. Miller 2b1c4354de Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/virtio_net.c
2009-03-20 02:27:41 -07:00
Jorge Boncompte [DTI2] 2bad35b7c9 netns: oops in ip[6]_frag_reasm incrementing stats
dev can be NULL in ip[6]_frag_reasm for skb's coming from RAW sockets.

Quagga's OSPFD sends fragmented packets on a RAW socket, when netfilter
conntrack reassembles them on the OUTPUT path you hit this code path.

You can test it with something like "hping2 -0 -d 2000 -f AA.BB.CC.DD"

With help from Jarek Poplawski.

Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 23:26:11 -07:00
Roel Kluin e4a389a9b5 net: kfree(napi->skb) => kfree_skb
struct sk_buff pointers should be freed with kfree_skb.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 23:12:13 -07:00
Al Viro cb0dc77de0 net: fix sctp breakage
broken by commit 5e739d1752aca4e8f3e794d431503bfca3162df4; AFAICS should
be -stable fodder as well...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Aced-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 19:12:42 -07:00
Stephen Hemminger 4b704d59d6 tipc: fix non-const printf format arguments
Fix warnings from current gcc about using non-const strings as printf
args in TIPC. Compile tested only (not a TIPC user).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 19:11:29 -07:00
Bjørn Mork 1b1d8f73a4 ipv6: fix display of local and remote sit endpoints
This fixes the regressions cause by
commit 1326c3d5a4
(v2.6.28-rc6-461-g23a12b1) broke the display of local and remote
addresses of an SIT tunnel in iproute2.

nt->parms is used by ipip6_tunnel_init() and therefore need to be
initialized first.

Tracked as http://bugzilla.kernel.org/show_bug.cgi?id=12868

Reported-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 18:56:54 -07:00
Rami Rosen beedad923a tcp: remove parameter from tcp_recv_urg().
This patch removes an unused parameter (addr_len) from tcp_recv_urg()
method in net/ipv4/tcp.c.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 18:50:09 -07:00
Brian Haley 9bdd8d40c8 ipv6: Fix incorrect disable_ipv6 behavior
Fix the behavior of allowing both sysctl and addrconf_dad_failure()
to set the disable_ipv6 parameter without any bad side-effects.
If DAD fails and accept_dad > 1, we will still set disable_ipv6=1,
but then instead of allowing an RA to add an address then
immediately fail DAD, we simply don't allow the address to be
added in the first place.  This also lets the user set this flag
and disable all IPv6 addresses on the interface, or on the entire
system.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 18:22:48 -07:00
Patrick McHardy 0f5b3e85a3 netfilter: ctnetlink: fix rcu context imbalance
Introduced by 7ec47496 (netfilter: ctnetlink: cleanup master conntrack assignation):

net/netfilter/nf_conntrack_netlink.c:1275:2: warning: context imbalance in 'ctnetlink_create_conntrack' - different lock contexts for basic block

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-18 17:36:40 +01:00
Florian Westphal 711d60a9e7 netfilter: remove nf_ct_l4proto_find_get/nf_ct_l4proto_put
users have been moved to __nf_ct_l4proto_find.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-18 17:30:50 +01:00
Florian Westphal cd91566e4b netfilter: ctnetlink: remove remaining module refcounting
Convert the remaining refcount users.

As pointed out by Patrick McHardy, the protocols can be accessed safely using RCU.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-18 17:28:37 +01:00
David S. Miller af4330631c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-03-17 15:04:31 -07:00
David S. Miller 2d6a5e9500 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/igb/igb_main.c
	drivers/net/qlge/qlge_main.c
	drivers/net/wireless/ath9k/ath9k.h
	drivers/net/wireless/ath9k/core.h
	drivers/net/wireless/ath9k/hw.c
2009-03-17 15:01:30 -07:00
David S. Miller f10023a4ef Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-03-17 14:29:22 -07:00
David S. Miller 4ada8107f4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-03-17 13:12:47 -07:00
Herbert Xu 303c6a0251 gro: Fix legacy path napi_complete crash
On the legacy netif_rx path, I incorrectly tried to optimise
the napi_complete call by using __napi_complete before we reenable
IRQs.  This simply doesn't work since we need to flush the held
GRO packets first.

This patch fixes it by doing the obvious thing of reenabling
IRQs first and then calling napi_complete.

Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-17 13:11:29 -07:00
Herbert Xu 2ffb455819 gro: Fix vlan/netpoll check again
Jarek Poplawski pointed out that my previous fix is broken for
VLAN+netpoll as if netpoll is enabled we'd end up in the normal
receive path instead of the VLAN receive path.

This patch fixes it by calling the VLAN receive hook.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-17 13:10:52 -07:00
Luis R. Rodriguez 73d54c9e74 cfg80211: add regulatory netlink multicast group
This allows us to send to userspace "regulatory" events.
For now we just send an event when we change regulatory domains.
We also notify userspace when devices are using their own custom
world roaming regulatory domains.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:40 -04:00
Luis R. Rodriguez 7db90f4a25 cfg80211: move enum reg_set_by to nl80211.h
We do this so we can later inform userspace who set the
regulatory domain and provide details of the request.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:40 -04:00
Luis R. Rodriguez 0fee54cab7 cfg80211: remove REGDOM_SET_BY_INIT
This is not used as we can always just assume the first
regulatory domain set will _always_ be a static regulatory
domain. REGDOM_SET_BY_CORE will be the first request from
cfg80211 for a regdomain and that then populates the first
regulatory request.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:39 -04:00
Herton Ronaldo Krzesinski 1a28c78b46 mac80211: deauth before flushing STA information
Even after commit "mac80211: deauth when interface is marked down"
(e327b847 on Linus tree), userspace still isn't notified when interface
goes down. There isn't a problem with this commit, but because of other
code changes it doesn't work on kernels >= 2.6.28 (works if same/similar
change applied on 2.6.27 for example).

The issue is as follows: after commit "mac80211: restructure disassoc/deauth
flows" in 2.6.28, the call to ieee80211_sta_deauthenticate added by
commit e327b847 will not work: because we do sta_info_flush(local, sdata)
inside ieee80211_stop (iface.c), all stations in interface are cleared, so
when calling ieee80211_sta_deauthenticate->ieee80211_set_disassoc (mlme.c),
inside ieee80211_set_disassoc we have this in the beginning:

         sta = sta_info_get(local, ifsta->bssid);
         if (!sta) {

The !sta check triggers, thus the function returns early and
ieee80211_sta_send_apinfo(sdata, ifsta) later isn't called, so
wpa_supplicant/userspace isn't notified with SIOCGIWAP.

This commit moves deauthentication to before flushing STA info
(sta_info_flush), thus the above can't happen and userspace is really
notified when interface goes down.

Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:39 -04:00
Helmut Schaa af88b9078d mac80211: handle failed scan requests in STA mode
If cfg80211 requests a scan it awaits either a return code != 0 from
the scan function or the cfg80211_scan_done to be called. In case of
a STA mac80211's scan function ever returns 0 and queues the scan request.
If ieee80211_sta_work is executed and ieee80211_start_scan fails for
some reason cfg80211_scan_done will never be called but cfg80211 still
thinks the scan was triggered successfully and will refuse any future
scan requests due to drv->scan_req not being cleaned up.

If a scan is triggered from within the MLME a similar problem appears. If
ieee80211_start_scan returns an error, local->scan_req will not be reset
and mac80211 will refuse any future scan requests.

Hence, in both cases call ieee80211_scan_failed (which notifies cfg80211
and resets local->scan_req) if ieee80211_start_scan returns an error.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:38 -04:00
Luis R. Rodriguez ec329acef9 cfg80211: fix max tx power for world regdom on 5 GHz to 20dBm
This is the lowest value amongst countries which do enable 5 GHz operation.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:29 -04:00
Luis R. Rodriguez 611b6a82aa cfg80211: Enable passive scan on channels 12-14 for world roaming
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:29 -04:00
Jouni Malinen 0eeb59fe2c mac80211: Fix WMM ACM parsing and AC downgrade operation
Incorrect local->wmm_acm bits were set for AC_BK and AC_BE. Fix this
and add some comments to make it easier to understand the AC-to-UP(pair)
mapping. Set the wmm_acm bits (and show WMM debug) even if the driver
does not implement conf_tx() handler.

In addition, fix the ACM-based AC downgrade code to not use the
highest priority in error cases. We need to break the loop to get the
correct AC_BK value (3) instead of returning 0 (which would indicate
AC_VO). The comment here was not really very useful either, so let's
provide somewhat more helpful description of the situation.

Since it is very unlikely that the ACM flag would be set for AC_BK and
AC_BE, these bugs are not likely to be seen in real life networks.
Anyway, better do these things correctly should someone really use
silly AP configuration (and to pass some functionality tests, too).

Remove the TODO comment about handling ACM. Downgrading AC is
perfectly valid mechanism for ACM. Eventually, we may add support for
WMM-AC and send a request for a TS, but anyway, that functionality
won't be here at the location of this TODO comment.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:27 -04:00
Jouni Malinen 055249d20d mac80211: Fix panic on fragmentation with power saving
It was possible to hit a kernel panic on NULL pointer dereference in
dev_queue_xmit() when sending power save buffered frames to a STA that
woke up from sleep. This happened when the buffered frame was requeued
for transmission in ap_sta_ps_end(). In order to avoid the panic, copy
the skb->dev and skb->iif values from the first fragment to all other
fragments.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:01:59 -04:00
John W. Linville 6f16bf3bdb lib80211: silence excessive crypto debugging messages
When they were part of the now defunct ieee80211 component, these
messages were only visible when special debugging settings were enabled.
Let's mirror that with a new lib80211 debugging Kconfig option.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:01:58 -04:00
Herbert Xu d1c76af9e2 GRO: Move netpoll checks to correct location
As my netpoll fix for net doesn't really work for net-next, we
need this update to move the checks into the right place.  As it
stands we may pass freed skbs to netpoll_receive_skb.

This patch also introduces a netpoll_rx_on function to avoid GRO
completely if we're invoked through netpoll.  This might seem
paranoid but as netpoll may have an external receive hook it's
better to be safe than sorry.  I don't think we need this for
2.6.29 though since there's nothing immediately broken by it.

This patch also moves the GRO_* return values to netdevice.h since
VLAN needs them too (I tried to avoid this originally but alas
this seems to be the easiest way out).  This fixes a bug in VLAN
where it continued to use the old return value 2 instead of the
correct GRO_DROP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-16 10:50:02 -07:00
Pablo Neira Ayuso 0269ea4937 netfilter: xtables: add cluster match
This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters. The cluster
can be composed of 32 nodes maximum (although I have only tested
this with two nodes, so I cannot tell what is the real scalability
limit of this solution in terms of cluster nodes).

Assuming that all the nodes see all packets (see below for an
example on how to do that if your switch does not allow this), the
cluster match decides if this node has to handle a packet given:

	(jhash(source IP) % total_nodes) & node_mask

For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is the node 1):

iptables -I PREROUTING -t mangle -i eth1 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
	-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
	-m mark ! --mark 0xffff -j DROP

And the following commands to make all nodes see the same packets:

ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
	--destination-mac 01:00:5e:00:01:01 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
	--destination-mac 01:00:5e:00:01:02 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

In the case of TCP connections, pickup facility has to be disabled
to avoid marking TCP ACK packets coming in the reply direction as
valid.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

BTW, some final notes:

 * This match mangles the skbuff pkt_type in case that it detects
PACKET_MULTICAST for a non-multicast address. This may be done in
a PKTTYPE target for this sole purpose.
 * This match supersedes the CLUSTERIP target.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 17:10:36 +01:00
Cyrill Gorcunov 1546000fe8 net: netfilter conntrack - add per-net functionality for DCCP protocol
Module specific data moved into per-net site and being allocated/freed
during net namespace creation/deletion.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 16:30:49 +01:00
Cyrill Gorcunov 81a1d3c31e net: sysctl_net - use net_eq to compare nets
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 16:23:30 +01:00
Linus Torvalds 8e91f178a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  r8169: revert "r8169: read MAC address from EEPROM on init (2nd attempt)"
  r8169: use hardware auto-padding.
  igb: remove ASPM L0s workaround
  netxen: remove old flash check.
  mv643xx_eth: fix unicast address filter corruption on mtu change
  xfrm: Fix xfrm_state_find() wrt. wildcard source address.
  emac: Fix clock control for 405EX and 405EXr chips
  ixgbe: fix multiple unicast address support
  via-velocity: Fix DMA mapping length errors on transmit.
  qlge: bugfix: Pad outbound frames smaller than 60 bytes.
  qlge: bugfix: Move netif_napi_del() to common call point.
  qlge: bugfix: Tell hw to strip vlan header.
  qlge: bugfix: Increase filter on inbound csum.
  dnet: replace obsolete *netif_rx_* functions with *napi_*
  net: Add be2net driver.
  dnet: Fix warnings on 64-bit.
  dnet: Dave DNET ethernet controller driver (updated)
  ipv6:  Fix BUG when disabled ipv6 module is unloaded
  bnx2x: Using DMAE to initialize the chip
  bnx2x: Casting page alignment
  ...
2009-03-16 07:56:58 -07:00
Christoph Paasch d1238d5337 netfilter: conntrack: check for NEXTHDR_NONE before header sanity checking
NEXTHDR_NONE doesn't has an IPv6 option header, so the first check
for the length will always fail and results in a confusing message
"too short" if debugging enabled. With this patch, we check for
NEXTHDR_NONE before length sanity checkings are done.

Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:52:11 +01:00
Christoph Paasch ec8d540969 netfilter: conntrack: fix dropping packet after l4proto->packet()
We currently use the negative value in the conntrack code to encode
the packet verdict in the error. As NF_DROP is equal to 0, inverting
NF_DROP makes no sense and, as a result, no packets are ever dropped.

Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:51:29 +01:00
Pablo Neira Ayuso 626ba8fbac netfilter: ctnetlink: fix crash during expectation creation
This patch fixes a possible crash due to the missing initialization
of the expectation class when nf_ct_expect_related() is called.

Reported-by: BORBELY Zoltan <bozo@andrews.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:50:51 +01:00
Jan Engelhardt acc738fec0 netfilter: xtables: avoid pointer to self
Commit 784544739a (netfilter: iptables:
lock free counters) broke a number of modules whose rule data referenced
itself. A reallocation would not reestablish the correct references, so
it is best to use a separate struct that does not fall under RCU.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:35:29 +01:00
Jonathan Corbet 76398425bb Move FASYNC bit handling to f_op->fasync()
Removing the BKL from FASYNC handling ran into the challenge of keeping the
setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
the underlying fasync() function.  Andi Kleen suggested moving the handling
of that bit into fasync(); this patch does exactly that.  As a result, we
have a couple of internal API changes: fasync() must now manage the FASYNC
bit, and it will be called without the BKL held.

As it happens, every fasync() implementation in the kernel with one
exception calls fasync_helper().  So, if we make fasync_helper() set the
FASYNC bit, we can avoid making any changes to the other fasync()
functions - as long as those functions, themselves, have proper locking.
Most fasync() implementations do nothing but call fasync_helper() - which
has its own lock - so they are easily verified as correct.  The BKL had
already been pushed down into the rest.

The networking code has its own version of fasync_helper(), so that code
has been augmented with explicit FASYNC bit handling.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2009-03-16 08:32:27 -06:00
Scott James Remnant 95ba434f89 netfilter: auto-load ip_queue module when socket opened
The ip_queue module is missing the net-pf-16-proto-3 alias that would
causae it to be auto-loaded when a socket of that type is opened.  This
patch adds the alias.

Signed-off-by: Scott James Remnant <scott@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:31:10 +01:00
Scott James Remnant 26c3b67806 netfilter: auto-load ip6_queue module when socket opened
The ip6_queue module is missing the net-pf-16-proto-13 alias that would
cause it to be auto-loaded when a socket of that type is opened.  This
patch adds the alias.

Signed-off-by: Scott James Remnant <scott@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:30:14 +01:00
Pablo Neira Ayuso f0a3c0869f netfilter: ctnetlink: move event reporting for new entries outside the lock
This patch moves the event reporting outside the lock section. With
this patch, the creation and update of entries is homogeneous from
the event reporting perspective. Moreover, as the event reporting is
done outside the lock section, the netlink broadcast delivery can
benefit of the yield() call under congestion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:28:09 +01:00
Pablo Neira Ayuso e098360f15 netfilter: ctnetlink: cleanup conntrack update preliminary checkings
This patch moves the preliminary checkings that must be fulfilled
to update a conntrack, which are the following:

 * NAT manglings cannot be updated
 * Changing the master conntrack is not allowed.

This patch is a cleanup.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:27:22 +01:00
Pablo Neira Ayuso 7ec4749675 netfilter: ctnetlink: cleanup master conntrack assignation
This patch moves the assignation of the master conntrack to
ctnetlink_create_conntrack(), which is where it really belongs.
This patch is a cleanup.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:25:46 +01:00
Pablo Neira Ayuso 1db7a748df netfilter: conntrack: increase drop stats if sequence adjustment fails
This patch increases the statistics of packets drop if the sequence
adjustment fails in ipv4_confirm().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:18:50 +01:00
Stephen Hemminger 67c0d57930 netfilter: Kconfig spelling fixes (trivial)
Signed-off-by: Stephen Hemminger <sheminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:17:23 +01:00
Christoph Paasch 9d2493f88f netfilter: remove IPvX specific parts from nf_conntrack_l4proto.h
Moving the structure definitions to the corresponding IPvX specific header files.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:15:35 +01:00
Eric Leblond c7a913cd55 netfilter: print the list of register loggers
This patch modifies the proc output to add display of registered
loggers. The content of /proc/net/netfilter/nf_log is modified. Instead
of displaying a protocol per line with format:
	proto:logger
it now displays:
	proto:logger (comma_separated_list_of_loggers)
NONE is used as keyword if no logger is used.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 14:55:27 +01:00
Eric Leblond ca735b3aaa netfilter: use a linked list of loggers
This patch modifies nf_log to use a linked list of loggers for each
protocol. This list of loggers is read and write protected with a
mutex.

This patch separates registration and binding. To be used as
logging module, a module has to register calling nf_log_register()
and to bind to a protocol it has to call nf_log_bind_pf().
This patch also converts the logging modules to the new API. For nfnetlink_log,
it simply switchs call to register functions to call to bind function and
adds a call to nf_log_register() during init. For other modules, it just
remove a const flag from the logger structure and replace it with a
__read_mostly.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 14:54:21 +01:00
Ilpo Järvinen afece1c658 tcp: make sure xmit goal size never becomes zero
It's not too likely to happen, would basically require crafted
packets (must hit the max guard in tcp_bound_to_half_wnd()).
It seems that nothing that bad would happen as there's tcp_mems
and congestion window that prevent runaway at some point from
hurting all too much (I'm not that sure what all those zero
sized segments we would generate do though in write queue).
Preventing it regardless is certainly the best way to go.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:55 -07:00
Ilpo Järvinen 2a3a041c4e tcp: cache result of earlier divides when mss-aligning things
The results is very unlikely change every so often so we
hardly need to divide again after doing that once for a
connection. Yet, if divide still becomes necessary we
detect that and do the right thing and again settle for
non-divide state. Takes the u16 space which was previously
taken by the plain xmit_size_goal.

This should take care part of the tso vs non-tso difference
we found earlier.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:55 -07:00
Ilpo Järvinen 0c54b85f28 tcp: simplify tcp_current_mss
There's very little need for most of the callsites to get
tp->xmit_goal_size updated. That will cost us divide as is,
so slice the function in two. Also, the only users of the
tp->xmit_goal_size are directly behind tcp_current_mss(),
so there's no need to store that variable into tcp_sock
at all! The drop of xmit_goal_size currently leaves 16-bit
hole and some reorganization would again be necessary to
change that (but I'm aiming to fill that hole with u16
xmit_goal_size_segs to cache the results of the remaining
divide to get that tso on regression).

Bring xmit_goal_size parts into tcp.c

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:54 -07:00
Ilpo Järvinen 72211e9050 tcp: don't check mtu probe completion in the loop
It seems that no variables clash such that we couldn't do
the check just once later on. Therefore move it.

Also kill dead obvious comment, dead argument and add
unlikely since this mtu probe does not happen too often.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:53 -07:00
Ilpo Järvinen c887e6d2d9 tcp: consolidate paws check
Wow, it was quite tricky to merge that stream of negations
but I think I finally got it right:

check & replace_ts_recent:
(s32)(rcv_tsval - ts_recent) >= 0                  => 0
(s32)(ts_recent - rcv_tsval) <= 0                  => 0

discard:
(s32)(ts_recent - rcv_tsval)  > TCP_PAWS_WINDOW    => 1
(s32)(ts_recent - rcv_tsval) <= TCP_PAWS_WINDOW    => 0

I toggled the return values of tcp_paws_check around since
the old encoding added yet-another negation making tracking
of truth-values really complicated.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:52 -07:00
Ilpo Järvinen c43d558a51 tcp: kill dead end_seq variable in clean_rtx_queue
I've already forgotten what for this was necessary, anyway
it's no longer used (if it ever was).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:51 -07:00
Ilpo Järvinen 5861f8e58d tcp: remove pointless .dsack/.num_sacks code
In the pure assignment case, the earlier zeroing is
still in effect.

David S. Miller raised concerns if the ifs are there to avoid
dirtying cachelines. I came to these conclusions:

> We'll be dirty it anyway (now that I check), the first "real" statement
> in tcp_rcv_established is:
>
>       tp->rx_opt.saw_tstamp = 0;
>
> ...that'll land on the same dword. :-/
>
> I suppose the blocks are there just because they had more complexity
> inside when they had to calculate the eff_sacks too (maybe it would
> have been better to just remove them in that drop-patch so you would
> have had less head-ache :-)).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:51 -07:00
Jarek Poplawski 7cd0a63872 pkt_sched: Change misleading code in class delete.
While looking for a possible reason of bugzilla report on HTB oops:
http://bugzilla.kernel.org/show_bug.cgi?id=12858
I found the code in htb_delete calling htb_destroy_class on zero
refcount is very misleading: it can suggest this is a common path, and
destroy is called under sch_tree_lock. Actually, this can never happen
like this because before deletion cops->get() is done, and after
delete a class is still used by tclass_notify. The class destroy is
always called from cops->put(), so without sch_tree_lock.

This doesn't mean much now (since 2.6.27) because all vulnerable calls
were moved from htb_destroy_class to htb_delete, but there was a bug
in older kernels. The same change is done for other classful scheds,
which, it seems, didn't have similar locking problems here.

Reported-by: m0sia <m0sia@m0sia.ru>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:00:19 -07:00
Roel Kluin a2025b8b10 tcp: '< 0' test on unsigned
promote 'cnt' to size_t, to match 'len'.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 16:05:14 -07:00
Roel Kluin 8db09f26f9 x25: '< 0' and '>= 0' test on unsigned
skb->len is an unsigned int, so the test in x25_rx_call_request() always
evaluates to true.

len in x25_sendmsg() is unsigned as well. so -ERRORS returned by x25_output()
are not noticed.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 16:04:12 -07:00
Denys Fedoryshchenko 73ce7b01b4 ipv4: arp announce, arp_proxy and windows ip conflict verification
Windows (XP at least) hosts on boot, with configured static ip, performing 
address conflict detection, which is defined in RFC3927.
Here is quote of important information:

"
An ARP announcement is identical to the ARP Probe described above, 
except    that now the sender and target IP addresses are both set 
to the host's newly selected IPv4 address. 
"

But it same time this goes wrong with RFC5227.
"
The 'sender IP address' field MUST be set to all zeroes; this is to avoid
polluting ARP caches in other hosts on the same link in the case
where the address turns out to be already in use by another host.
"

When ARP proxy configured, it must not answer to both cases, because 
it is address conflict verification in any case. For Windows it is just 
causing to detect false "ip conflict". Already there is code for RFC5227, so 
just trivially we just check also if source ip == target ip.

Signed-off-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 16:02:07 -07:00
David S. Miller 08ec9af1c0 xfrm: Fix xfrm_state_find() wrt. wildcard source address.
The change to make xfrm_state objects hash on source address
broke the case where such source addresses are wildcarded.

Fix this by doing a two phase lookup, first with fully specified
source address, next using saddr wildcarded.

Reported-by: Nicolas Dichtel <nicolas.dichtel@dev.6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 14:22:40 -07:00
Yi Zou 1c8dbcf649 [SCSI] net: add NETIF_F_FCOE_CRC to can_checksum_protocol
Add FC CRC offload check for ETH_P_FCOE.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:11:38 -05:00
Neil Horman 273ae44b9c Network Drop Monitor: Adding Build changes to enable drop monitor
Network Drop Monitor: Adding Build changes to enable drop monitor

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/Kbuild |    1 +
 net/Kconfig          |   11 +++++++++++
 net/core/Makefile    |    1 +
 3 files changed, 13 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:29 -07:00
Neil Horman 9a8afc8d39 Network Drop Monitor: Adding drop monitor implementation & Netlink protocol
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/net_dropmon.h |   56 +++++++++
 net/core/drop_monitor.c     |  263 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 319 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:29 -07:00
Neil Horman ead2ceb0ec Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/skbuff.h |    4 +++-
 net/core/datagram.c    |    2 +-
 net/core/skbuff.c      |   22 ++++++++++++++++++++++
 net/ipv4/arp.c         |    2 +-
 net/ipv4/udp.c         |    2 +-
 net/packet/af_packet.c |    2 +-
 6 files changed, 29 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:28 -07:00
Neil Horman 4893d39e86 Network Drop Monitor: Add trace declaration for skb frees
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/trace/skb.h   |    8 ++++++++
 net/core/Makefile     |    2 ++
 net/core/net-traces.c |   29 +++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:27 -07:00
malc 6fc791ee63 sctp: add Adaptation Layer Indication parameter only when it's set
RFC5061 states:

        Each adaptation layer that is defined that wishes
        to use this parameter MUST specify an adaptation code point in an
        appropriate RFC defining its use and meaning.

If the user has not set one - assume they don't want to sent the param
with a zero Adaptation Code Point.

Rationale - Currently the IANA defines zero as reserved - and
1 as the only valid value - so we consider zero to be unset - to save
adding a boolean to the socket structure.

Including this parameter unconditionally causes endpoints that do not
understand it to report errors unnecessarily.

Signed-off-by: Malcolm Lashley <mlashley@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:58 -07:00
Wei Yongjun 76595024ff sctp: fix to send FORWARD-TSN chunk only if peer has such capable
RFC3758 Section 3.3.1.  Sending Forward-TSN-Supported param in INIT

   Note that if the endpoint chooses NOT to include the parameter, then
   at no time during the life of the association can it send or process
   a FORWARD TSN.

If peer does not support PR-SCTP capable, don't send FORWARD-TSN chunk
to peer.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:58 -07:00
Wei Yongjun 5ffad5aceb sctp: fix to indicate ASCONF support in INIT-ACK only if peer has such capable
This patch fix to indicate ASCONF support in INIT-ACK only if peer has
such capable.

This patch also fix to calc the chunk size if peer has no FWD-TSN
capable.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:56 -07:00
Vlad Yasevich 5e8f3f703a sctp: simplify sctp listening code
sctp_inet_listen() call is split between UDP and TCP style.  Looking
at the code, the two functions are almost the same and can be
merged into a single helper.  This also fixes a bug that was
fixed in the UDP function, but missed in the TCP function.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:56 -07:00
Trond Myklebust 01d37c428a SUNRPC: xprt_connect() don't abort the task if the transport isn't bound
If the transport isn't bound, then we should just return ENOTCONN, letting
call_connect_status() and/or call_status() deal with retrying. Currently,
we appear to abort all pending tasks with an EIO error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:09:39 -04:00
Trond Myklebust fba91afbec SUNRPC: Fix an Oops due to socket not set up yet...
We can Oops in both xs_udp_send_request() and xs_tcp_send_request() if the
call to xs_sendpages() returns an error due to the socket not yet being
set up.
Deal with that situation by returning a new error: ENOTSOCK, so that we
know to avoid dereferencing transport->sock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:06:41 -04:00
Eric Dumazet fc1ad92dfc tcp: allow timestamps even if SYN packet has tsval=0
Some systems send SYN packets with apparently wrong RFC1323 timestamp
option values [timestamp tsval=0 tsecr=0].
It might be for security reasons (http://www.secuobs.com/plugs/25220.shtml )

Linux TCP stack ignores this option and sends back a SYN+ACK packet
without timestamp option, thus many TCP flows cannot use timestamps
and lose some benefit of RFC1323.

Other operating systems seem to not care about initial tsval value, and let
tcp flows to negotiate timestamp option.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-11 09:23:57 -07:00
John Dykstra ff8cf9a938 ipv6: Fix BUG when disabled ipv6 module is unloaded
Do not try to "uninitialize" ipv6 if its initialization had been skipped
because module parameter disable=1 had been specified.

Reported-by:  Thomas Backlund <tmb@mandriva.org>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-11 09:22:51 -07:00
Trond Myklebust eb9b55ab4d SUNRPC: Tighten up the task locking rules in __rpc_execute()
We should probably not be testing any flags after we've cleared the
RPC_TASK_RUNNING flag, since rpc_make_runnable() is then free to assign the
rpc_task to another workqueue, which may then destroy it.

We can fix any races with rpc_make_runnable() by ensuring that we only
clear the RPC_TASK_RUNNING flag while holding the rpc_wait_queue->lock that
the task is supposed to be sleeping on (and then checking whether or not
the task really is sleeping).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10 20:33:16 -04:00
Stephen Hemminger a2205472c3 net: fix warning about non-const string
Since dev_set_name takes a printf style string, new gcc complains
if arg is not const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-10 05:22:43 -07:00
Stephen Hemminger 7546dd97d2 net: convert usage of packet_type to read_mostly
Protocols that use packet_type can be __read_mostly section for better
locality. Elminate any unnecessary initializations of NULL.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-10 05:22:43 -07:00
David S. Miller d5df2a1613 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x_main.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
	drivers/net/wireless/rt2x00/rt73usb.c
2009-03-10 05:04:16 -07:00
Roel Kluin bd05f28e1a cfg80211: test before subtraction on unsigned
freq_diff is unsigned, so test before subtraction

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-06 15:54:32 -05:00
Sujith 707c1b4e68 mac80211: Update IBSS beacon timestamp properly
In IBSS mode, the beacon timestamp has to be filled with the
BSS's timestamp when joining, and set to zero when creating
a new BSS.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:40 -05:00
Vivek Natarajan 25c9c87528 mac80211: Always send a null data frame if TIM bit is set.
If the AP thinks we are in power save state eventhough we are not truly
in that state, it sets the TIM bit and does not send a data frame unless
we send a null data frame to correct the state in the AP.
This might happen if the null data frame for wake up is lost in the air
after we disable power save.

Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:38 -05:00
Sujith e65c22633c mac80211: Fix TKIP/WEP HT capability handling
There is no need to parse the AP's HT capabilities if
the STA uses TKIP/WEP cipher. This allows the rate control
module to choose the correct(legacy) rate table.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:37 -05:00
Johannes Berg 24776cfd55 mac80211: Fix quality reporting for wireless stats
Since "mac80211/cfg80211: move iwrange handler to cfg80211", the
results for link quality from "iwlist scan" and "iwconfig" commands
have been very different. The results are now consistent.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported- and tested-by: Larry Finger <larry.finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:35 -05:00
Sujith e31ae05083 mac80211: Notify the driver only when the beacon interval changes
Currently, the driver is unconditionally notified of beacon
interval. This is a problem in AP mode, because the driver has
to know that the beacon interval has actualy changed to recalculate
TBTT and reset the HW TSF. Fix this to make mac80211 notify the driver
only when the beacon interval has been reconfigured to a new value.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:32 -05:00
David S. Miller 508827ff0a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/tokenring/tmspci.c
	drivers/net/ucc_geth_mii.c
2009-03-05 02:06:47 -08:00
David S. Miller 9d40bbda59 vlan: Fix vlan-in-vlan crashes.
As analyzed by Patrick McHardy, vlan needs to reset it's
netdev_ops pointer in it's ->init() function but this
leaves the compat method pointers stale.

Add a netdev_resync_ops() and call it from the vlan code.

Any other driver which changes ->netdev_ops after register_netdevice()
will need to call this new function after doing so too.

With help from Patrick McHardy.

Tested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 23:46:25 -08:00
David S. Miller 54acd0efab net: Fix missing dev->neigh_setup in register_netdevice().
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 23:01:02 -08:00
Jarek Poplawski a883bf564e pkt_sched: act_police: Fix a rate estimator test.
A commit c1b56878fb "tc: policing requires
a rate estimator" introduced a test which invalidates previously working
configs, based on examples from iproute2: doc/actions/actions-general.
This is too rigorous: a rate estimator is needed only when police's
"avrate" option is used.

Reported-by: Joao Correia <joaomiguelcorreia@gmail.com>
Diagnosed-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 17:38:10 -08:00
Brian Haley fb13d9f9e4 SCTP: change sctp_ctl_sock_init() to try IPv4 if IPv6 fails
Change sctp_ctl_sock_init() to try IPv4 if IPv6 socket registration
fails.  Required if the IPv6 module is loaded with "disable=1", else
SCTP will fail to load.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 03:20:26 -08:00
Brian Haley fe7ca2e1e8 IPv6: add "disable" module parameter support to ipv6.ko
Add "disable" module parameter support to ipv6.ko by specifying
"disable=1" on module load.  We just do the minimum of initializing
inetsw6[] so calls from other modules to inet6_register_protosw()
won't OOPs, then bail out.  No IPv6 addresses or sockets can be
created as a result, and a reboot is required to enable IPv6.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 03:19:08 -08:00
Eric Biederman 0c5c2d3089 neigh: Allow for user space users of the neighbour table
Currently it is possible to do just about everything with the arp table
from user space except treat an entry like you are using it.  To that end
implement and a flag NTF_USE that when set in a netwlink update request
treats the neighbour table entry like the kernel does on the output path.

This allows user space applications to share the kernel's arp cache.

Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 00:03:08 -08:00
Meelis Roos 4222474519 net: fix tokenring license
Currently, modular tokenring ("tr") lacks a license and fails to load:

tr: module license 'unspecified' taints kernel.
tr: Unknown symbol proc_net_fops_create

Beacuse of this, no tokenring driver can load if it depends on modular 
tr. Fix this by adding GPL module license as it is in the kernel.

With this fix, tr module loads fine and tms380 driver also loads. Well, 
it does'nt work but that's a different bug.

Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 23:48:50 -08:00
Pablo Neira Ayuso 4843b93c96 netlink: invert error code in netlink_set_err()
The callers of netlink_set_err() currently pass a negative value
as parameter for the error code. However, sk->sk_err wants a
positive error value. Without this patch, skb_recv_datagram() called
by netlink_recvmsg() may return a positive value to report an error.

Another choice to fix this is to change callers to pass a positive
error value, but this seems a bit inconsistent and error prone
to me. Indeed, the callers of netlink_set_err() assumed that the
(usual) negative value for error codes was fine before this patch :).

This patch also includes some documentation in docbook format
for netlink_set_err() to avoid this sort of confusion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 23:37:30 -08:00
Geert Uytterhoeven e9cc8bddae netlink: Move netlink attribute parsing support to lib
Netlink attribute parsing may be used even if CONFIG_NET is not set.
Move it from net/netlink to lib and control its inclusion based on the new
config symbol CONFIG_NLATTR, which is selected by CONFIG_NET.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-03-04 14:53:30 +08:00
Randy Dunlap abb79972b4 rds: fix iband RDMA dependencies
Fix RDS Infiniband dependencies for RDMA so that these
build errors won't happen:

ERROR: "rdma_accept" [net/rds/rds.ko] undefined!
ERROR: "rdma_destroy_id" [net/rds/rds.ko] undefined!
ERROR: "rdma_connect" [net/rds/rds.ko] undefined!
ERROR: "rdma_destroy_qp" [net/rds/rds.ko] undefined!
ERROR: "rdma_listen" [net/rds/rds.ko] undefined!
ERROR: "rdma_notify" [net/rds/rds.ko] undefined!
ERROR: "rdma_create_id" [net/rds/rds.ko] undefined!
ERROR: "rdma_create_qp" [net/rds/rds.ko] undefined!
ERROR: "rdma_bind_addr" [net/rds/rds.ko] undefined!
ERROR: "rdma_resolve_route" [net/rds/rds.ko] undefined!
ERROR: "rdma_disconnect" [net/rds/rds.ko] undefined!
ERROR: "rdma_reject" [net/rds/rds.ko] undefined!
ERROR: "rdma_resolve_addr" [net/rds/rds.ko] undefined!

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 21:39:40 -08:00
Ingo Molnar 91d75e209b Merge branch 'x86/core' into core/percpu 2009-03-04 02:29:19 +01:00
Eric W. Biederman 17edde5209 netns: Remove net_alive
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb.  So remove net_alive allowing
packet reception run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:27 -08:00
Eric W. Biederman 2f20d2e667 tcp: Like icmp use register_pernet_subsys
To remove the possibility of packets flying around when network
devices are being cleaned up use reisger_pernet_subsys instead of
register_pernet_device.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:21 -08:00
Eric W. Biederman 6eb0777228 netns: Fix icmp shutdown.
Recently I had a kernel panic in icmp_send during a network namespace
cleanup.  There were packets in the arp queue that failed to be sent
and we attempted to generate an ICMP host unreachable message, but
failed because icmp_sk_exit had already been called.

The network devices are removed from a network namespace and their
arp queues are flushed before we do attempt to shutdown subsystems
so this error should have been impossible.

It turns out icmp_init is using register_pernet_device instead
of register_pernet_subsys.  Which resulted in icmp being shut down
while we still had the possibility of packets in flight, making
a nasty NULL pointer deference in interrupt context possible.

Changing this to register_pernet_subsys fixes the problem in
my testing.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:15 -08:00
Daniel Lezcano 176c39af29 netns: fix addrconf_ifdown kernel panic
When a network namespace is destroyed the network interfaces are
all unregistered, making addrconf_ifdown called by the netdevice
notifier. 
In the other hand, the addrconf exit method does a loop on the network
devices and does addrconf_ifdown on each of them. But the ordering of 
the netns subsystem is not right because it uses the register_pernet_device
instead of register_pernet_subsys. If we handle the loopback as
any network device, we can safely use register_pernet_subsys.

But if we use register_pernet_subsys, the addrconf exit method will do
exactly what was already done with the unregistering of the network
devices. So in definitive, this code is pointless.

I removed the netns addrconf exit method and moved the code to the
addrconf cleanup function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:06:45 -08:00
Stephen Hemminger b325fddb7f ipv6: Fix sysctl unregistration deadlock
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 00:47:47 -08:00
Stephen Hemminger 5a5990d309 net: Avoid race between network down and sysfs
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 00:47:46 -08:00
Vlad Yasevich 7e99013a50 sctp: Fix broken RTO-doubling for data retransmits
Commit faee47cdbf
(sctp: Fix the RTO-doubling on idle-link heartbeats)
broke the RTO doubling for data retransmits.  If the
heartbeat was sent before the data T3-rtx time, the
the RTO will not double upon the T3-rtx expiration.
Distingish between the operations by passing an argument
to the function.

Additionally, Wei Youngjun pointed out that our treatment
of requested HEARTBEATS and timer HEARTBEATS is the same
wrt resetting congestion window.  That needs to be separated,
since user requested HEARTBEATS should not treat the link
as idle.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:18 -08:00
Wei Yongjun f61f6f82c9 sctp: use time_before or time_after for comparing jiffies
The functions time_before or time_after are more robust
for comparing jiffies against other values.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:18 -08:00
Wei Yongjun c6db93a58f sctp: fix the length check in sctp_getsockopt_maxburst()
The code in sctp_getsockopt_maxburst() doesn't allow len to be larger
then struct sctp_assoc_value, which is a common case where app writers
just pass down the sizeof(buf) or something similar.

This patch fix the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:17 -08:00
Wei Yongjun d212318c9d sctp: remove dup code in net/sctp/socket.c
Remove dup check of "if (optlen < sizeof(int))".

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:16 -08:00
Wei Yongjun 906f8257ee sctp: Add some missing types for debug message
This patch add the type name "AUTH" and primitive type name
"PRIMITIVE_ASCONF" for debug message.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:16 -08:00
Hantzis Fotis ee7537b63a tcp: tcp_init_wl / tcp_update_wl argument cleanup
The above functions from include/net/tcp.h have been defined with an
argument that they never use. The argument is 'u32 ack' which is never
used inside the function body, and thus it can be removed. The rest of
the patch involves the necessary changes to the function callers of the
above two functions.

Signed-off-by: Hantzis Fotis <xantzis@ceid.upatras.gr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:42:02 -08:00
Wei Yongjun 3df2678737 sctp: fix kernel panic with ERROR chunk containing too many error causes
If ERROR chunk is received with too many error causes in ESTABLISHED
state, the kernel get panic.

This is because sctp limit the max length of cmds to 14, but while
ERROR chunk is received, one error cause will add around 2 cmds by
sctp_add_cmd_sf(). So many error causes will fill the limit of cmds
and panic.

This patch fixed the problem.

This bug can be test by SCTP Conformance Test Suite
<http://networktest.sourceforge.net/>.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:27:39 -08:00
Vlad Yasevich d1dd524785 sctp: fix crash during module unload
An extra list_del() during the module load failure and unload
resulted in a crash with a list corruption.  Now sctp can
be unloaded again.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:27:38 -08:00
Gerrit Renker 86739fb96e dccp: Do not let initial option overhead shrink the MPS
This fixes a problem caused by the overlap of the connection-setup and
established-state phases of DCCP connections.

During connection setup, the client retransmits Confirm Feature-Negotiation
options until a response from the server signals that it can move from the
half-established PARTOPEN into the OPEN state, whereupon the connection is
fully established on both ends (RFC 4340, 8.1.5).

However, since the client may already send data while it is in the PARTOPEN
state, consequences arise for the Maximum Packet Size: the problem is that the
initial option overhead is much higher than for the subsequent established
phase, as it involves potentially many variable-length list-type options
(server-priority options, RFC 4340, 6.4).

Applying the standard MPS is insufficient here: especially with larger
payloads this can lead to annoying, counter-intuitive EMSGSIZE errors.

On the other hand, reducing the MPS available for the established phase by
the added initial overhead is highly wasteful and inefficient.

The solution chosen therefore is a two-phase strategy:

   If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
   to carry the options, and the feature-negotiation list is then flushed.

   This means that the server gets two Acks for one Response. If both Acks get
   lost, it is probably better to restart the connection anyway and devising yet
   another special-case does not seem worth the extra complexity.

The result is a higher utilisation of the available packet space for the data
transmission phase (established state) of a connection.

The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly
seen values were around 90 bytes for initial feature-negotiation options.

It uses sizeof(u32) to mean "aligned units of 4 bytes".
For consistency, another use of 4-byte alignment is adapted.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:07:23 -08:00
Gerrit Renker 361a5c1dd0 dccp: Minimise header option overhead in setting the MPS
This patch resolves a long-standing FIXME to dynamically update the Maximum
Packet Size depending on actual options usage.

It uses the flags set by the feature-negotiation infrastructure to compute
the required header option size.

Most options are fixed-size, a notable exception are Ack Vectors (required
currently only by CCID-2). These can have any length between 3 and 1020
bytes. As a result of testing, 16 bytes (2 bytes for type/length plus 14 Ack
Vector cells) have been found to be sufficient for loss-free situations.

There are currently no CCID-specific header options which may appear on data
packets, thus it is not necessary to define a corresponding CCID field as
suggested in the old comment.

Further changes:
----------------
 Adjusted the type of 'cur_mps' to match the unsigned return type of the
 function.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:07:23 -08:00
Ilpo Järvinen 9ce0146102 tcp: get rid of two unnecessary u16s in TCP skb flags copying
I guess these fields were one day 16-bit in the struct but
nowadays they're just using 8 bits anyway.

This is just a precaution, didn't result any change in my
case but who knows what all those varying gcc versions &
options do. I've been told that 16-bit is not so nice with
some cpus.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:17 -08:00
Ilpo Järvinen 0d6a775e27 tcp: in sendmsg/pages open code the real goto target
copied was assigned zero right before the goto, so if (copied)
cannot ever be true.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:16 -08:00
Ilpo Järvinen cabeccbd17 tcp: kill eff_sacks "cache", the sole user can calculate itself
Also fixes insignificant bug that would cause sending of stale
SACK block (would occur in some corner cases).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:16 -08:00
Ilpo Järvinen 758ce5c8d1 tcp: add helper for AI algorithm
It seems that implementation in yeah was inconsistent to what
other did as it would increase cwnd one ack earlier than the
others do.

Size benefits:

  bictcp_cong_avoid |  -36
  tcp_cong_avoid_ai |  +52
  bictcp_cong_avoid |  -34
  tcp_scalable_cong_avoid |  -36
  tcp_veno_cong_avoid |  -12
  tcp_yeah_cong_avoid |  -38

= -104 bytes total

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:15 -08:00
Ilpo Järvinen 571a5dd8d0 htcp: merge icsk_ca_state compare
Similar to what is done elsewhere in TCP code when double
state checks are being done.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:14 -08:00
Ilpo Järvinen e6c7d08579 tcp: drop unnecessary local var in collapse
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:13 -08:00
Ilpo Järvinen bc079e9ede tcp: cleanup ca_state mess in tcp_timer
Redundant checks made indentation impossible to follow.
However, it might be useful to make this ca_state+is_sack
indexed array.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:13 -08:00
Ilpo Järvinen 7363a5b233 tcp: separate timeout marking loop to it's own function
Some comment about its current state added. So far I have
seen very few cases where the thing is actually useful,
usually just marginally (though admittedly I don't usually
see top of window losses where it seems possible that there
could be some gain), instead, more often the cases suffer
from L-marking spike which is certainly not desirable
(I'll bury improving it to my todo list, but on a low
prio position).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:12 -08:00
Ilpo Järvinen d0af4160d1 tcp: remove redundant code from tcp_mark_lost_retrans
Arnd Hannemann <hannemann@nets.rwth-aachen.de> noticed and was
puzzled by the fact that !tcp_is_fack(tp) leads to early return
near the beginning and the later on tcp_is_fack(tp) was still
used in an if condition. The later check was a left-over from
RFC3517 SACK stuff (== !tcp_is_fack(tp) behavior nowadays) as
there wasn't clear way how to handle this particular check
cheaply in the spirit of RFC3517 (using only SACK blocks, not
holes + SACK blocks as with FACK). I sort of left it there as
a reminder but since it's confusing other people just remove
it and comment the missing-feature stuff instead.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:11 -08:00
Ilpo Järvinen 02276f3c96 tcp: fix corner case issue in segmentation during rexmitting
If cur_mss grew very recently so that the previously G/TSOed skb
now fits well into a single segment it would get send up in
parts unless we calculate # of segments again. This corner-case
could happen eg. after mtu probe completes or less than
previously sack blocks are required for the opposite direction.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:11 -08:00
Ilpo Järvinen d3d2ae4545 tcp: Don't clear hints when tcp_fragmenting
1) We didn't remove any skbs, so no need to handle stale refs.

2) scoreboard_skb_hint is trivial, no timestamps were changed
   so no need to clear that one

3) lost_skb_hint needs tweaking similar to that of
   tcp_sacktag_one().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:10 -08:00
Ilpo Järvinen 62ad27619c tcp: deferring in middle of queue makes very little sense
If skb can be sent right away, we certainly should do that
if it's in the middle of the queue because it won't get
more data into it.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:10 -08:00
Ilpo Järvinen 59a08cba6a tcp: fix lost_cnt_hint miscounts
It is possible that lost_cnt_hint gets underflow in
tcp_clean_rtx_queue because the cumulative ACK can cover
the segment where lost_skb_hint points to only partially,
which means that the hint is not cleared, opposite to what
my (earlier) comment claimed.

Also I don't agree what I ended up writing about non-trivial
case there to be what I intented to say. It was not supposed
to happen that the hint won't get cleared and we underflow
in any scenario.

In general, this is quite hard to trigger in practice.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:09 -08:00