Commit graph

11825 commits

Author SHA1 Message Date
Roel Kluin a2025b8b10 tcp: '< 0' test on unsigned
promote 'cnt' to size_t, to match 'len'.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 16:05:14 -07:00
Roel Kluin 8db09f26f9 x25: '< 0' and '>= 0' test on unsigned
skb->len is an unsigned int, so the test in x25_rx_call_request() always
evaluates to true.

len in x25_sendmsg() is unsigned as well, so negative error codes returned
by x25_output() are not noticed.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 16:04:12 -07:00
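The unsigned-comparison pitfall described in the x25 commit above can be illustrated with a minimal userspace sketch. The names `fake_output`, `error_seen_unsigned` and `error_seen_signed` are hypothetical and do not appear in the kernel; the sketch only shows why a negative error code stored in an unsigned variable escapes a `< 0` test.

```c
#include <stddef.h>

/* Hypothetical stand-in for a routine that fails with a negative errno. */
static int fake_output(void)
{
    return -5;
}

/* Buggy pattern: the result lands in an unsigned variable, so the
 * error check can never fire. */
static int error_seen_unsigned(void)
{
    size_t len = (size_t)fake_output();

    return len < 0; /* always 0: an unsigned value is never negative */
}

/* Fixed pattern: keep the result in a signed variable before testing. */
static int error_seen_signed(void)
{
    int rc = fake_output();

    return rc < 0; /* 1: the negative error code is detected */
}
```

Compilers typically flag the buggy comparison with a "comparison is always false" warning when type-limit warnings are enabled.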
Denys Fedoryshchenko 73ce7b01b4 ipv4: arp announce, arp_proxy and windows ip conflict verification
Windows hosts (XP at least) with a statically configured IP perform
address conflict detection on boot, as defined in RFC3927.
Here is a quote of the important information:

"
An ARP announcement is identical to the ARP Probe described above,
except that now the sender and target IP addresses are both set
to the host's newly selected IPv4 address.
"

But at the same time this conflicts with RFC5227:
"
The 'sender IP address' field MUST be set to all zeroes; this is to avoid
polluting ARP caches in other hosts on the same link in the case
where the address turns out to be already in use by another host.
"

When ARP proxy is configured, it must not answer in either case, because
both are address conflict verification. For Windows, an answer just
causes a false "ip conflict" to be detected. There is already code for
RFC5227, so we simply also check whether source ip == target ip.

Signed-off-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 16:02:07 -07:00
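The two conflict-detection cases named in the ARP commit above can be summarised as a single predicate. This is a sketch only: `arp_is_conflict_check` is a hypothetical helper, not the function the patch touches, and addresses are assumed to be plain 32-bit values.

```c
#include <stdint.h>

/* Returns 1 when an ARP request looks like address-conflict detection:
 * either the sender IP is all zeroes (the RFC 5227 probe quoted above)
 * or the sender IP equals the target IP (the announcement quoted above).
 * A proxy would skip answering such requests. */
static int arp_is_conflict_check(uint32_t sip, uint32_t tip)
{
    return sip == 0 || sip == tip;
}
```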
Neil Horman 273ae44b9c Network Drop Monitor: Adding Build changes to enable drop monitor
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/Kbuild |    1 +
 net/Kconfig          |   11 +++++++++++
 net/core/Makefile    |    1 +
 3 files changed, 13 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:29 -07:00
Neil Horman 9a8afc8d39 Network Drop Monitor: Adding drop monitor implementation & Netlink protocol
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/net_dropmon.h |   56 +++++++++
 net/core/drop_monitor.c     |  263 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 319 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:29 -07:00
Neil Horman ead2ceb0ec Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/skbuff.h |    4 +++-
 net/core/datagram.c    |    2 +-
 net/core/skbuff.c      |   22 ++++++++++++++++++++++
 net/ipv4/arp.c         |    2 +-
 net/ipv4/udp.c         |    2 +-
 net/packet/af_packet.c |    2 +-
 6 files changed, 29 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:28 -07:00
Neil Horman 4893d39e86 Network Drop Monitor: Add trace declaration for skb frees
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/trace/skb.h   |    8 ++++++++
 net/core/Makefile     |    2 ++
 net/core/net-traces.c |   29 +++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:27 -07:00
malc 6fc791ee63 sctp: add Adaptation Layer Indication parameter only when it's set
RFC5061 states:

        Each adaptation layer that is defined that wishes
        to use this parameter MUST specify an adaptation code point in an
        appropriate RFC defining its use and meaning.

If the user has not set one, assume they don't want to send the param
with a zero Adaptation Code Point.

Rationale - Currently the IANA defines zero as reserved - and
1 as the only valid value - so we consider zero to be unset - to save
adding a boolean to the socket structure.

Including this parameter unconditionally causes endpoints that do not
understand it to report errors unnecessarily.

Signed-off-by: Malcolm Lashley <mlashley@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:58 -07:00
Wei Yongjun 76595024ff sctp: fix to send FORWARD-TSN chunk only if peer has such capable
RFC3758 Section 3.3.1.  Sending Forward-TSN-Supported param in INIT

   Note that if the endpoint chooses NOT to include the parameter, then
   at no time during the life of the association can it send or process
   a FORWARD TSN.

If the peer is not PR-SCTP capable, don't send a FORWARD-TSN chunk
to it.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:58 -07:00
Wei Yongjun 5ffad5aceb sctp: fix to indicate ASCONF support in INIT-ACK only if peer has such capable
This patch indicates ASCONF support in INIT-ACK only if the peer is
ASCONF capable.

It also fixes the chunk size calculation when the peer is not
FWD-TSN capable.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:56 -07:00
Vlad Yasevich 5e8f3f703a sctp: simplify sctp listening code
sctp_inet_listen() call is split between UDP and TCP style.  Looking
at the code, the two functions are almost the same and can be
merged into a single helper.  This also fixes a bug that was
fixed in the UDP function, but missed in the TCP function.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:56 -07:00
Eric Dumazet fc1ad92dfc tcp: allow timestamps even if SYN packet has tsval=0
Some systems send SYN packets with apparently wrong RFC1323 timestamp
option values [timestamp tsval=0 tsecr=0].
It might be for security reasons (http://www.secuobs.com/plugs/25220.shtml )

Linux TCP stack ignores this option and sends back a SYN+ACK packet
without timestamp option, thus many TCP flows cannot use timestamps
and lose some benefit of RFC1323.

Other operating systems seem not to care about the initial tsval value, and
let TCP flows negotiate the timestamp option.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-11 09:23:57 -07:00
Stephen Hemminger a2205472c3 net: fix warning about non-const string
Since dev_set_name takes a printf style string, new gcc complains
if arg is not const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-10 05:22:43 -07:00
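A common way to satisfy format-string warnings of the kind mentioned in the dev_set_name commit above is to pass a literal "%s" format and supply the variable string as an argument. The sketch below shows that pattern in userspace with `snprintf`; `set_name` is a hypothetical helper, and whether the kernel patch uses exactly this form is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical printf-style name setter. The literal "%s" format means
 * any '%' characters inside name are copied verbatim instead of being
 * interpreted as conversion specifiers. Returns the length of name. */
static int set_name(char *dst, size_t size, const char *name)
{
    return snprintf(dst, size, "%s", name);
}
```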
Stephen Hemminger 7546dd97d2 net: convert usage of packet_type to read_mostly
Protocols that use packet_type can place it in the __read_mostly section
for better locality. Eliminate any unnecessary initializations to NULL.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-10 05:22:43 -07:00
David S. Miller d5df2a1613 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x_main.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
	drivers/net/wireless/rt2x00/rt73usb.c
2009-03-10 05:04:16 -07:00
Roel Kluin bd05f28e1a cfg80211: test before subtraction on unsigned
freq_diff is unsigned, so test before subtraction

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-06 15:54:32 -05:00
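The "test before subtraction" idea from the cfg80211 commit above can be sketched as follows. Both function names are hypothetical; the point is only that subtracting a larger unsigned value from a smaller one wraps around instead of going negative.

```c
/* Wrap-free absolute difference: compare first, then subtract the
 * smaller value from the larger one. */
static unsigned int freq_abs_diff(unsigned int a, unsigned int b)
{
    return a > b ? a - b : b - a;
}

/* Naive version: wraps to a very large value whenever a < b. */
static unsigned int freq_naive_diff(unsigned int a, unsigned int b)
{
    return a - b;
}
```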
Sujith 707c1b4e68 mac80211: Update IBSS beacon timestamp properly
In IBSS mode, the beacon timestamp has to be filled with the
BSS's timestamp when joining, and set to zero when creating
a new BSS.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:40 -05:00
Vivek Natarajan 25c9c87528 mac80211: Always send a null data frame if TIM bit is set.
If the AP thinks we are in power save state even though we are not truly
in that state, it sets the TIM bit and does not send a data frame unless
we send a null data frame to correct the state in the AP.
This might happen if the null data frame for wake up is lost in the air
after we disable power save.

Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:38 -05:00
Sujith e65c22633c mac80211: Fix TKIP/WEP HT capability handling
There is no need to parse the AP's HT capabilities if
the STA uses TKIP/WEP cipher. This allows the rate control
module to choose the correct(legacy) rate table.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:37 -05:00
Johannes Berg 24776cfd55 mac80211: Fix quality reporting for wireless stats
Since "mac80211/cfg80211: move iwrange handler to cfg80211", the
results for link quality from "iwlist scan" and "iwconfig" commands
have been very different. The results are now consistent.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported- and tested-by: Larry Finger <larry.finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:35 -05:00
Sujith e31ae05083 mac80211: Notify the driver only when the beacon interval changes
Currently, the driver is unconditionally notified of the beacon
interval. This is a problem in AP mode, because the driver has
to know that the beacon interval has actually changed to recalculate
TBTT and reset the HW TSF. Fix this to make mac80211 notify the driver
only when the beacon interval has been reconfigured to a new value.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:32 -05:00
David S. Miller 508827ff0a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/tokenring/tmspci.c
	drivers/net/ucc_geth_mii.c
2009-03-05 02:06:47 -08:00
David S. Miller 9d40bbda59 vlan: Fix vlan-in-vlan crashes.
As analyzed by Patrick McHardy, vlan needs to reset its
netdev_ops pointer in its ->init() function but this
leaves the compat method pointers stale.

Add a netdev_resync_ops() and call it from the vlan code.

Any other driver which changes ->netdev_ops after register_netdevice()
will need to call this new function after doing so too.

With help from Patrick McHardy.

Tested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 23:46:25 -08:00
David S. Miller 54acd0efab net: Fix missing dev->neigh_setup in register_netdevice().
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 23:01:02 -08:00
Jarek Poplawski a883bf564e pkt_sched: act_police: Fix a rate estimator test.
Commit c1b56878fb "tc: policing requires
a rate estimator" introduced a test which invalidates previously working
configs, based on examples from iproute2: doc/actions/actions-general.
This is too strict: a rate estimator is needed only when police's
"avrate" option is used.

Reported-by: Joao Correia <joaomiguelcorreia@gmail.com>
Diagnosed-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 17:38:10 -08:00
Brian Haley fb13d9f9e4 SCTP: change sctp_ctl_sock_init() to try IPv4 if IPv6 fails
Change sctp_ctl_sock_init() to try IPv4 if IPv6 socket registration
fails.  Required if the IPv6 module is loaded with "disable=1", else
SCTP will fail to load.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 03:20:26 -08:00
Brian Haley fe7ca2e1e8 IPv6: add "disable" module parameter support to ipv6.ko
Add "disable" module parameter support to ipv6.ko by specifying
"disable=1" on module load.  We just do the minimum of initializing
inetsw6[] so calls from other modules to inet6_register_protosw()
won't OOPs, then bail out.  No IPv6 addresses or sockets can be
created as a result, and a reboot is required to enable IPv6.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 03:19:08 -08:00
Eric Biederman 0c5c2d3089 neigh: Allow for user space users of the neighbour table
Currently it is possible to do just about everything with the arp table
from user space except treat an entry like you are using it.  To that end
implement a flag, NTF_USE, that when set in a netlink update request
treats the neighbour table entry like the kernel does on the output path.

This allows user space applications to share the kernel's arp cache.

Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 00:03:08 -08:00
Meelis Roos 4222474519 net: fix tokenring license
Currently, modular tokenring ("tr") lacks a license and fails to load:

tr: module license 'unspecified' taints kernel.
tr: Unknown symbol proc_net_fops_create

Because of this, no tokenring driver can load if it depends on modular
tr. Fix this by adding GPL module license as it is in the kernel.

With this fix, tr module loads fine and tms380 driver also loads. Well,
it doesn't work but that's a different bug.

Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 23:48:50 -08:00
Pablo Neira Ayuso 4843b93c96 netlink: invert error code in netlink_set_err()
The callers of netlink_set_err() currently pass a negative value
as parameter for the error code. However, sk->sk_err wants a
positive error value. Without this patch, skb_recv_datagram() called
by netlink_recvmsg() may return a positive value to report an error.

Another choice to fix this is to change callers to pass a positive
error value, but this seems a bit inconsistent and error prone
to me. Indeed, the callers of netlink_set_err() assumed that the
(usual) negative value for error codes was fine before this patch :).

This patch also includes some documentation in docbook format
for netlink_set_err() to avoid this sort of confusion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 23:37:30 -08:00
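The sign convention behind the netlink_set_err() commit above can be sketched in userspace. `struct fake_sock`, `fake_set_err` and `fake_recv` are hypothetical stand-ins, not kernel code: callers pass the usual negative errno, the stored value is positive, and the receive path negates it again when reporting the error.

```c
#include <errno.h>

/* Hypothetical socket with a pending-error field that holds a
 * positive errno value, or 0 when no error is pending. */
struct fake_sock {
    int sk_err;
};

/* Callers pass a negative error code; invert it before storing. */
static void fake_set_err(struct fake_sock *sk, int code)
{
    sk->sk_err = -code;
}

/* Receive path: report and clear a pending error as a negative value. */
static int fake_recv(struct fake_sock *sk)
{
    int err = sk->sk_err;

    if (err) {
        sk->sk_err = 0;
        return -err;
    }
    return 0;
}
```

Without the inversion in `fake_set_err`, `fake_recv` would hand back a positive number, which callers would read as a success or a byte count rather than an error.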
Randy Dunlap abb79972b4 rds: fix iband RDMA dependencies
Fix RDS Infiniband dependencies for RDMA so that these
build errors won't happen:

ERROR: "rdma_accept" [net/rds/rds.ko] undefined!
ERROR: "rdma_destroy_id" [net/rds/rds.ko] undefined!
ERROR: "rdma_connect" [net/rds/rds.ko] undefined!
ERROR: "rdma_destroy_qp" [net/rds/rds.ko] undefined!
ERROR: "rdma_listen" [net/rds/rds.ko] undefined!
ERROR: "rdma_notify" [net/rds/rds.ko] undefined!
ERROR: "rdma_create_id" [net/rds/rds.ko] undefined!
ERROR: "rdma_create_qp" [net/rds/rds.ko] undefined!
ERROR: "rdma_bind_addr" [net/rds/rds.ko] undefined!
ERROR: "rdma_resolve_route" [net/rds/rds.ko] undefined!
ERROR: "rdma_disconnect" [net/rds/rds.ko] undefined!
ERROR: "rdma_reject" [net/rds/rds.ko] undefined!
ERROR: "rdma_resolve_addr" [net/rds/rds.ko] undefined!

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 21:39:40 -08:00
Eric W. Biederman 17edde5209 netns: Remove net_alive
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace than that provided
by net_alive in netif_receive_skb.  So remove net_alive, allowing
packet reception to run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:27 -08:00
Eric W. Biederman 2f20d2e667 tcp: Like icmp use register_pernet_subsys
To remove the possibility of packets flying around when network
devices are being cleaned up use register_pernet_subsys instead of
register_pernet_device.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:21 -08:00
Eric W. Biederman 6eb0777228 netns: Fix icmp shutdown.
Recently I had a kernel panic in icmp_send during a network namespace
cleanup.  There were packets in the arp queue that failed to be sent
and we attempted to generate an ICMP host unreachable message, but
failed because icmp_sk_exit had already been called.

The network devices are removed from a network namespace and their
arp queues are flushed before we attempt to shut down subsystems,
so this error should have been impossible.

It turns out icmp_init is using register_pernet_device instead
of register_pernet_subsys, which resulted in icmp being shut down
while we still had the possibility of packets in flight, making
a nasty NULL pointer dereference in interrupt context possible.

Changing this to register_pernet_subsys fixes the problem in
my testing.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:15 -08:00
Daniel Lezcano 176c39af29 netns: fix addrconf_ifdown kernel panic
When a network namespace is destroyed the network interfaces are
all unregistered, which causes addrconf_ifdown to be called by the
netdevice notifier.
On the other hand, the addrconf exit method loops over the network
devices and calls addrconf_ifdown on each of them. But the ordering of
the netns subsystem is not right because it uses register_pernet_device
instead of register_pernet_subsys. If we handle the loopback as
any network device, we can safely use register_pernet_subsys.

But if we use register_pernet_subsys, the addrconf exit method will do
exactly what was already done with the unregistering of the network
devices. So in the end, this code is pointless.

I removed the netns addrconf exit method and moved the code to the
addrconf cleanup function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:06:45 -08:00
Stephen Hemminger b325fddb7f ipv6: Fix sysctl unregistration deadlock
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 00:47:47 -08:00
Stephen Hemminger 5a5990d309 net: Avoid race between network down and sysfs
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 00:47:46 -08:00
Vlad Yasevich 7e99013a50 sctp: Fix broken RTO-doubling for data retransmits
Commit faee47cdbf
(sctp: Fix the RTO-doubling on idle-link heartbeats)
broke the RTO doubling for data retransmits.  If the
heartbeat was sent before the data T3-rtx timer, the
RTO will not double upon the T3-rtx expiration.
Distinguish between the operations by passing an argument
to the function.

Additionally, Wei Yongjun pointed out that our treatment
of requested HEARTBEATS and timer HEARTBEATS is the same
wrt resetting the congestion window.  That needs to be separated,
since user requested HEARTBEATS should not treat the link
as idle.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:18 -08:00
Wei Yongjun f61f6f82c9 sctp: use time_before or time_after for comparing jiffies
The functions time_before or time_after are more robust
for comparing jiffies against other values.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:18 -08:00
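Why a wrap-aware comparison is more robust than a plain `>` on a free-running tick counter, as the jiffies commit above argues, can be sketched in userspace. `tick_after` follows the same signed-difference idea as the kernel's time_after; both function names here are hypothetical, and the conversion to `long` assumes the usual two's-complement behaviour.

```c
/* Wrap-safe test for "a is later than b": the unsigned difference is
 * reinterpreted as signed, so it stays correct across counter wrap as
 * long as the two values are less than half the counter range apart. */
static int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Naive test: gives the wrong answer once the counter has wrapped. */
static int tick_after_naive(unsigned long a, unsigned long b)
{
    return a > b;
}
```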
Wei Yongjun c6db93a58f sctp: fix the length check in sctp_getsockopt_maxburst()
The code in sctp_getsockopt_maxburst() doesn't allow len to be larger
than struct sctp_assoc_value, which is a common case where app writers
just pass down the sizeof(buf) or something similar.

This patch fixes the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:17 -08:00
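A length check that tolerates oversized buffers, as the getsockopt commit above calls for, might look like the following userspace sketch. `struct fake_assoc_value` and `fake_getsockopt` are hypothetical, and clamping the length is an assumption about the general approach rather than a copy of the actual patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical option payload. */
struct fake_assoc_value {
    int32_t  assoc_id;
    uint32_t assoc_value;
};

/* Copies the option into optval. A buffer that is too small is an
 * error; a buffer that is larger than needed is accepted and only
 * sizeof(*src) bytes are written. Returns bytes copied or -1. */
static long fake_getsockopt(void *optval, size_t len,
                            const struct fake_assoc_value *src)
{
    if (len < sizeof(*src))
        return -1;
    if (len > sizeof(*src))
        len = sizeof(*src); /* clamp instead of rejecting */
    memcpy(optval, src, len);
    return (long)len;
}
```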
Wei Yongjun d212318c9d sctp: remove dup code in net/sctp/socket.c
Remove dup check of "if (optlen < sizeof(int))".

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:16 -08:00
Wei Yongjun 906f8257ee sctp: Add some missing types for debug message
This patch adds the type name "AUTH" and primitive type name
"PRIMITIVE_ASCONF" for debug messages.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:16 -08:00
Hantzis Fotis ee7537b63a tcp: tcp_init_wl / tcp_update_wl argument cleanup
The above functions from include/net/tcp.h have been defined with an
argument that they never use. The argument is 'u32 ack' which is never
used inside the function body, and thus it can be removed. The rest of
the patch involves the necessary changes to the function callers of the
above two functions.

Signed-off-by: Hantzis Fotis <xantzis@ceid.upatras.gr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:42:02 -08:00
Wei Yongjun 3df2678737 sctp: fix kernel panic with ERROR chunk containing too many error causes
If an ERROR chunk with too many error causes is received in ESTABLISHED
state, the kernel panics.

This is because sctp limits the max length of cmds to 14, but when an
ERROR chunk is received, each error cause adds around 2 cmds via
sctp_add_cmd_sf(). So many error causes will exhaust the cmds limit
and cause a panic.

This patch fixes the problem.

This bug can be tested with the SCTP Conformance Test Suite
<http://networktest.sourceforge.net/>.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:27:39 -08:00
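The arithmetic behind the ERROR chunk panic above (a limit of 14 commands, about two commands per error cause) can be made concrete with a bounded queue. This sketch only illustrates the overflow condition; `struct cmd_seq`, `cmd_add` and `queue_causes` are hypothetical, and the bounds check shown is not a description of how the actual patch fixes the problem.

```c
#define MAX_CMDS 14 /* limit quoted in the commit message */

/* Hypothetical fixed-size command sequence. */
struct cmd_seq {
    int n;
    int cmds[MAX_CMDS];
};

/* Appends one command; returns 1 on success, 0 when the sequence is
 * full (an unchecked write at this point is what would corrupt memory). */
static int cmd_add(struct cmd_seq *seq, int cmd)
{
    if (seq->n >= MAX_CMDS)
        return 0;
    seq->cmds[seq->n++] = cmd;
    return 1;
}

/* Queues two commands per error cause and returns how many causes
 * were fully queued before the sequence ran out of room. */
static int queue_causes(struct cmd_seq *seq, int causes)
{
    int i;

    for (i = 0; i < causes; i++)
        if (!cmd_add(seq, 1) || !cmd_add(seq, 2))
            break;
    return i;
}
```

With two commands per cause, the eighth error cause is the first one that no longer fits.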
Vlad Yasevich d1dd524785 sctp: fix crash during module unload
An extra list_del() during the module load failure and unload
resulted in a crash with a list corruption.  Now sctp can
be unloaded again.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:27:38 -08:00
Gerrit Renker 86739fb96e dccp: Do not let initial option overhead shrink the MPS
This fixes a problem caused by the overlap of the connection-setup and
established-state phases of DCCP connections.

During connection setup, the client retransmits Confirm Feature-Negotiation
options until a response from the server signals that it can move from the
half-established PARTOPEN into the OPEN state, whereupon the connection is
fully established on both ends (RFC 4340, 8.1.5).

However, since the client may already send data while it is in the PARTOPEN
state, consequences arise for the Maximum Packet Size: the problem is that the
initial option overhead is much higher than for the subsequent established
phase, as it involves potentially many variable-length list-type options
(server-priority options, RFC 4340, 6.4).

Applying the standard MPS is insufficient here: especially with larger
payloads this can lead to annoying, counter-intuitive EMSGSIZE errors.

On the other hand, reducing the MPS available for the established phase by
the added initial overhead is highly wasteful and inefficient.

The solution chosen therefore is a two-phase strategy:

   If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
   to carry the options, and the feature-negotiation list is then flushed.

   This means that the server gets two Acks for one Response. If both Acks get
   lost, it is probably better to restart the connection anyway and devising yet
   another special-case does not seem worth the extra complexity.

The result is a higher utilisation of the available packet space for the data
transmission phase (established state) of a connection.

The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly
seen values were around 90 bytes for initial feature-negotiation options.

It uses sizeof(u32) to mean "aligned units of 4 bytes".
For consistency, another use of 4-byte alignment is adapted.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:07:23 -08:00
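The sizes quoted in the DCCP commit above (an overhead estimate of 32*4 bytes, with sizeof(u32) standing for "aligned units of 4 bytes") can be checked with a small sketch. The helper names are hypothetical and `uint32_t` stands in for the kernel's u32.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounds a length up to the next multiple of sizeof(uint32_t),
 * i.e. to a whole number of 4-byte units. */
static size_t round_up_u32(size_t len)
{
    return (len + sizeof(uint32_t) - 1) / sizeof(uint32_t) * sizeof(uint32_t);
}

/* The (over-)estimate of the initial option overhead: 32 units. */
static size_t initial_overhead_estimate(void)
{
    return 32 * sizeof(uint32_t);
}
```

The commonly seen value of around 90 bytes rounds up to 92 bytes, which leaves some headroom below the 128-byte estimate.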
Gerrit Renker 361a5c1dd0 dccp: Minimise header option overhead in setting the MPS
This patch resolves a long-standing FIXME to dynamically update the Maximum
Packet Size depending on actual options usage.

It uses the flags set by the feature-negotiation infrastructure to compute
the required header option size.

Most options are fixed-size; a notable exception is Ack Vectors (currently
required only by CCID-2). These can have any length between 3 and 1020
bytes. As a result of testing, 16 bytes (2 bytes for type/length plus 14 Ack
Vector cells) have been found to be sufficient for loss-free situations.

There are currently no CCID-specific header options which may appear on data
packets, thus it is not necessary to define a corresponding CCID field as
suggested in the old comment.

Further changes:
----------------
 Adjusted the type of 'cur_mps' to match the unsigned return type of the
 function.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:07:23 -08:00
Ilpo Järvinen 9ce0146102 tcp: get rid of two unnecessary u16s in TCP skb flags copying
I guess these fields were one day 16-bit in the struct but
nowadays they're just using 8 bits anyway.

This is just a precaution; it didn't result in any change in my
case but who knows what all those varying gcc versions &
options do. I've been told that 16-bit is not so nice with
some cpus.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:17 -08:00
Ilpo Järvinen 0d6a775e27 tcp: in sendmsg/pages open code the real goto target
copied was assigned zero right before the goto, so if (copied)
cannot ever be true.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:16 -08:00
Ilpo Järvinen cabeccbd17 tcp: kill eff_sacks "cache", the sole user can calculate itself
Also fixes an insignificant bug that would cause sending of a stale
SACK block (would occur in some corner cases).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:16 -08:00