1
0
Fork 0
Commit Graph

122 Commits (d5950b4355049092739bea97d1bdc14433126cc5)

Author SHA1 Message Date
Olaf Kirch 0b7f22aab4 [IPV4]: Prevent oops when printing martian source
In some cases, we may be generating packets with a source address that
qualifies as martian. This can happen when we're in the middle of setting
up the network, and netfilter decides to reject a packet with an RST.
The IPv4 routing code would try to print a warning and oops, because
locally generated packets do not have a valid skb->mac.raw pointer
at this point.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 21:01:42 -07:00
Julian Anastasov af9debd461 [IPVS]: Add and reorder bh locks after moving to keventd.
An addition to the last ipvs changes that move
update_defense_level/si_meminfo to keventd:

- ip_vs_random_dropentry now runs in process context and should use _bh
  locks to protect from softirqs

- update_defense_level still needs _bh locks after si_meminfo is called,
  for the same purpose

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 20:59:57 -07:00
David L Stevens 84b42baef7 [IPV4]: fix IPv4 leave-group group matching
This patch fixes the multicast group matching for 
IP_DROP_MEMBERSHIP, similar to the IP_ADD_MEMBERSHIP fix in a prior
patch. Groups are identifiedby <group address,interface> and including
the interface address in the match will fail if a leave-group is done
by address when the join was done by index, or if different addresses
on the same interface are used in the join and leave.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:48:38 -07:00
David L Stevens 9951f036fe [IPV4]: (INCLUDE,empty)/leave-group equivalence for full-state MSF APIs & errno fix
1) Adds (INCLUDE, empty)/leave-group equivalence to the full-state 
   multicast source filter APIs (IPv4 and IPv6)

2) Fixes an incorrect errno in the IPv6 leave-group (ENOENT should be
   EADDRNOTAVAIL)

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:47:28 -07:00
David L Stevens 917f2f105e [IPV4]: multicast API "join" issues
1) In the full-state API when imsf_numsrc == 0
   errno should be "0", but returns EADDRNOTAVAIL

2) An illegal filter mode change
   errno should be EINVAL, but returns EADDRNOTAVAIL

3) Trying to do an any-source option without IP_ADD_MEMBERSHIP
   errno should be EINVAL, but returns EADDRNOTAVAIL

4) Adds comments for the less obvious error return values

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:45:16 -07:00
David L Stevens 8cdaaa15da [IPV4]: multicast API "join" issues
1) Changes IP_ADD_SOURCE_MEMBERSHIP and MCAST_JOIN_SOURCE_GROUP to ignore
   EADDRINUSE errors on a "courtesy join" -- prior membership or not
   is ok for these.

2) Adds "leave group" equivalence of (INCLUDE, empty) filters in the 
   delta-based API. Without this, mixing delta-based API calls that
   end in an (INCLUDE, empty) filter would not allow a subsequent
   regular IP_ADD_MEMBERSHIP. It also frees socket buffer memory that
   isn't needed for both the multicast group record and source filter.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:39:23 -07:00
David L Stevens ca9b907d14 [IPV4]: multicast API "join" issues
This patch corrects a few problems with the IP_ADD_MEMBERSHIP
socket option:

1) The existing code makes an attempt at reference counting joins when
   using the ip_mreqn/imr_ifindex interface. Joining the same group
   on the same socket is an error, whatever the API. This leads to
   unexpected results when mixing ip_mreqn by index with ip_mreqn by
   address, ip_mreq, or other API's. For example, ip_mreq followed by
   ip_mreqn of the same group will "work" while the same two reversed
   will not.
           Fixed to always return EADDRINUSE on a duplicate join and
   removed the (now unused) reference count in ip_mc_socklist.

2) The group-search list in ip_mc_join_group() is comparing a full 
   ip_mreqn structure and all of it must match for it to find the
   group. This doesn't correctly match a group that was joined with
   ip_mreq or ip_mreqn with an address (with or without an index). It
   also doesn't match groups that are joined by different addresses on
   the same interface. All of these are the same multicast group,
   which is identified by group address and interface index.
           Fixed the check to correctly match groups so we don't get
   duplicate group entries on the ip_mc_socklist.

3) The old code allocates a multicast address before searching for
   duplicates requiring it to free in various error cases. This
   patch moves the allocate until after the search and
   igmp_max_memberships check, so never a need to allocate, then free
   an entry.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:38:07 -07:00
Alexey Kuznetsov 4c866aa798 [IPV4]: Apply sysctl_icmp_echo_ignore_broadcasts to ICMP_TIMESTAMP as well.
This was the full intention of the original code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:34:46 -07:00
Victor Fusco 86a76caf87 [NET]: Fix sparse warnings
From: Victor Fusco <victor@cetuc.puc-rio.br>

Fix the sparse warning "implicit cast to nocast type"

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 14:57:47 -07:00
David S. Miller b03efcfb21 [NET]: Transform skb_queue_len() binary tests into skb_queue_empty()
This is part of the grand scheme to eliminate the qlen
member of skb_queue_head, and subsequently remove the
'list' member of sk_buff.

Most users of skb_queue_len() want to know if the queue is
empty or not, and that's trivially done with skb_queue_empty()
which doesn't use the skb_queue_head->qlen member and instead
uses the queue list emptyness as the test.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 14:57:23 -07:00
David S. Miller 908a75c17a [TCP]: Never TSO defer under periods of congestion.
Congestion window recover after loss depends upon the fact
that if we have a full MSS sized frame at the head of the
send queue, we will send it.  TSO deferral can defeat the
ACK clocking necessary to exit cleanly from recovery.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:43:58 -07:00
David S. Miller c1b4a7e695 [TCP]: Move to new TSO segmenting scheme.
Make TSO segment transmit size decisions at send time not earlier.

The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.

This is guided by tp->xmit_size_goal.  It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.

Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant.  These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.

A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side.  Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:24:38 -07:00
David S. Miller 0d9901df62 [TCP]: Break out send buffer expansion test.
This makes it easier to understand, and allows easier
tweaking of the heuristic later on.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:21:10 -07:00
David S. Miller cb83199a29 [TCP]: Do not call tcp_tso_acked() if no work to do.
In tcp_clean_rtx_queue(), if the TSO packet is not even partially
acked, do not waste time calling tcp_tso_acked().

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:20:55 -07:00
David S. Miller a56476962e [TCP]: Kill bogus comment above tcp_tso_acked().
Everything stated there is out of data.  tcp_trim_skb()
does adjust the available socket send buffer space and
skb->truesize now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:20:41 -07:00
David S. Miller b4e26f5ea0 [TCP]: Fix send-side cpu utiliziation regression.
Only put user data purely to pages when doing TSO.

The extra page allocations cause two problems:

1) Add the overhead of the page allocations themselves.
2) Make us do small user copies when we get to the end
   of the TCP socket cache page.

It is still beneficial to purely use pages for TSO,
so we will do it for that case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:20:27 -07:00
David S. Miller aa93466bdf [TCP]: Eliminate redundant computations in tcp_write_xmit().
tcp_snd_test() is run for every packet output by a single
call to tcp_write_xmit(), but this is not necessary.

For one, the congestion window space needs to only be
calculated one time, then used throughout the duration
of the loop.

This cleanup also makes experimenting with different TSO
packetization schemes much easier.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:20:09 -07:00
David S. Miller 7f4dd0a943 [TCP]: Break out tcp_snd_test() into it's constituent parts.
tcp_snd_test() does several different things, use inline
functions to express this more clearly.

1) It initializes the TSO count of SKB, if necessary.
2) It performs the Nagle test.
3) It makes sure the congestion window is adhered to.
4) It makes sure SKB fits into the send window.

This cleanup also sets things up so that things like the
available packets in the congestion window does not need
to be calculated multiple times by packet sending loops
such as tcp_write_xmit().
    
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:54 -07:00
David S. Miller 55c97f3e99 [TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling.
'nonagle' should be passed to the tcp_snd_test() function
as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the
tail of the write_queue.  This is because Nagle does not
apply to such frames since we cannot possibly tack more
data onto them.

However, while doing this __tcp_push_pending_frames() makes
all of the packets in the write_queue use this modified
'nonagle' value.

Fix the bug and simplify this function by just calling
tcp_write_xmit() directly if sk_send_head is non-NULL.

As a result, we can now make tcp_data_snd_check() just call
tcp_push_pending_frames() instead of the specialized
__tcp_data_snd_check().

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:38 -07:00
David S. Miller a2e2a59c93 [TCP]: Fix redundant calculations of tcp_current_mss()
tcp_write_xmit() uses tcp_current_mss(), but some of it's callers,
namely __tcp_push_pending_frames(), already has this value available
already.

While we're here, fix the "cur_mss" argument to be "unsigned int"
instead of plain "unsigned".

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:23 -07:00
David S. Miller 92df7b518d [TCP]: tcp_write_xmit() tabbing cleanup
Put the main basic block of work at the top-level of
tabbing, and mark the TCP_CLOSE test with unlikely().

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:06 -07:00
David S. Miller a762a98007 [TCP]: Kill extra cwnd validate in __tcp_push_pending_frames().
The tcp_cwnd_validate() function should only be invoked
if we actually send some frames, yet __tcp_push_pending_frames()
will always invoke it.  tcp_write_xmit() does the call for us,
so the call here can simply be removed.

Also, tcp_write_xmit() can be marked static.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:51 -07:00
David S. Miller f44b527177 [TCP]: Add missing skb_header_release() call to tcp_fragment().
When we add any new packet to the TCP socket write queue,
we must call skb_header_release() on it in order for the
TSO sharing checks in the drivers to work.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:34 -07:00
David S. Miller 84d3e7b957 [TCP]: Move __tcp_data_snd_check into tcp_output.c
It reimplements portions of tcp_snd_check(), so it
we move it to tcp_output.c we can consolidate it's
logic much easier in a later change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:18 -07:00
David S. Miller f6302d1d78 [TCP]: Move send test logic out of net/tcp.h
This just moves the code into tcp_output.c, no code logic changes are
made by this patch.

Using this as a baseline, we can begin to untangle the mess of
comparisons for the Nagle test et al.  We will also be able to reduce
all of the redundant computation that occurs when outputting data
packets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:03 -07:00
David S. Miller fc6415bcb0 [TCP]: Fix quick-ack decrementing with TSO.
On each packet output, we call tcp_dec_quickack_mode()
if the ACK flag is set.  It drops tp->ack.quick until
it hits zero, at which time we deflate the ATO value.

When doing TSO, we are emitting multiple packets with
ACK set, so we should decrement tp->ack.quick that many
segments.

Note that, unlike this case, tcp_enter_cwr() should not
take the tcp_skb_pcount(skb) into consideration.  That
function, one time, readjusts tp->snd_cwnd and moves
into TCP_CA_CWR state.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:17:45 -07:00
David S. Miller c65f7f00c5 [TCP]: Simplify SKB data portion allocation with NETIF_F_SG.
The ideal and most optimal layout for an SKB when doing
scatter-gather is to put all the headers at skb->data, and
all the user data in the page array.

This makes SKB splitting and combining extremely simple,
especially before a packet goes onto the wire the first
time.

So, when sk_stream_alloc_pskb() is given a zero size, make
sure there is no skb_tailroom().  This is achieved by applying
SKB_DATA_ALIGN() to the header length used here.

Next, make select_size() in TCP output segmentation use a
length of zero when NETIF_F_SG is true on the outgoing
interface.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:17:25 -07:00
Robert Olsson 2f36895aa7 [IPV4]: More broken memory allocation fixes for fib_trie
Below a patch to preallocate memory when doing resize of trie (inflate halve)
If preallocations fails it just skips the resize of this tnode for this time.

The oops we got when killing bgpd (with full routing) is now gone. 
Patrick memory patch is also used.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:02:40 -07:00
Eric Dumazet bb1d23b026 [IPV4]: Bug fix in rt_check_expire()
- rt_check_expire() fixes (an overflow occured if size of the hash
  was >= 65536)

reminder of the bugfix:

The rt_check_expire() has a serious problem on machines with large
route caches, and a standard HZ value of 1000.

With default values, ie ip_rt_gc_interval = 60*HZ = 60000 ;

the loop count :

     for (t = ip_rt_gc_interval << rt_hash_log; t >= 0;


overflows (t is a 31 bit value) as soon rt_hash_log is >= 16  (65536
slots in route cache hash table).

In this case, rt_check_expire() does nothing at all

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:00:32 -07:00
Eric Dumazet 424c4b70cc [IPV4]: Use the fancy alloc_large_system_hash() function for route hash table
- rt hash table allocated using alloc_large_system_hash() function.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:58:19 -07:00
Eric Dumazet 22c047ccbc [NET]: Hashed spinlocks in net/ipv4/route.c
- Locking abstraction
- Spinlocks moved out of rt hash table : Less memory (50%) used by rt 
  hash table. it's a win even on UP.
- Sizing of spinlocks table depends on NR_CPUS

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:55:24 -07:00
Patrick McHardy f0e36f8cee [IPV4]: Handle large allocations in fib_trie
Inflating a node a couple of times makes it exceed the 128k kmalloc limit.
Use __get_free_pages for allocations > PAGE_SIZE, as in fib_hash.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Robert Olsson <Robert.Olsson@data.slu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:44:55 -07:00
Herbert Xu 30e224d76f [IPV4]: Fix crash in ip_rcv while booting related to netconsole
Makes IPv4 ip_rcv registration happen last in af_inet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:40:10 -07:00
Thomas Graf e176fe8954 [NET]: Remove unused security member in sk_buff
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:12:44 -07:00
Patrick McHardy 9666dae510 [NETFILTER]: Fix connection tracking bug in 2.6.12
In 2.6.12 we started dropping the conntrack reference when a packet
leaves the IP layer. This broke connection tracking on a bridge,
because bridge-netfilter defers calling some NF_IP_* hooks to the bridge
layer for locally generated packets going out a bridge, where the
conntrack reference is no longer available. This patch keeps the
reference in this case as a temporary solution, long term we will
remove the defered hook calling. No attempt is made to drop the
reference in the bridge-code when it is no longer needed, tc actions
could already have sent the packet anywhere.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 16:04:44 -07:00
Neil Horman fb3d89498d [IPVS]: Close race conditions on ip_vs_conn_tab list modification
In an smp system, it is possible for an connection timer to expire, calling
ip_vs_conn_expire while the connection table is being flushed, before
ct_write_lock_bh is acquired.

Since the list iterator loop in ip_vs_con_flush releases and re-acquires the
spinlock (even though it doesn't re-enable softirqs), it is possible for the
expiration function to modify the connection list, while it is being traversed
in ip_vs_conn_flush.

The result is that the next pointer gets set to NULL, and subsequently
dereferenced, resulting in an oops.

Signed-off-by: Neil Horman <nhorman@redhat.com>
Acked-by: JulianAnastasov
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 15:40:02 -07:00
Robert Olsson f835e471b5 [IPV4]: Broken memory allocation in fib_trie
This should help up the insertion... but the resize is more crucial.
and complex and needs some thinking. 

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 15:00:39 -07:00
Maxime Bizon 7a1af5d7bb [IPV4]: ipconfig.c: fix dhcp timeout behaviour
I think there is a small bug in ipconfig.c in case IPCONFIG_DHCP is set
and dhcp is used.

When a DHCPOFFER is received, ip address is kept until we get DHCPACK.
If no ack is received, ic_dynamic() returns negatively, but leaves the
offered ip address in ic_myaddr.

This makes the main loop in ip_auto_config() break and uses the maybe
incomplete configuration.

Not sure if it's the best way to do, but the following trivial patch
correct this. 

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 13:21:12 -07:00
Dietmar Eggemann 2c2910a401 [IPV4]: Snmpv2 Mib IP counter ipInAddrErrors support
I followed Thomas' proposal to see every martian destination as a case
where the ipInAddrErrors counter has to be incremented. There are
two advantages by doing so: (1) The relation between the ipInReceive
counter and all the other ipInXXX counters is more accurate in the
case the RTN_UNICAST code check fails and (2) it makes the code in
ip_route_input_slow easier.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 13:06:23 -07:00
Patrick McHardy 9ef1d4c7c7 [NETLINK]: Missing initializations in dumped data
Mostly missing initialization of padding fields of 1 or 2 bytes length,
two instances of uninitialized nlmsgerr->msg of 16 bytes length.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 12:55:30 -07:00
Harald Welte 4095ebf1e6 [NETFILTER]: ipt_CLUSTERIP: fix ARP mangling
This patch adds mangling of ARP requests (in addition to replies),
since ARP caches are made from snooping both requests and replies.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 12:49:30 -07:00
pageexec 4da62fc70d [IPVS]: Fix for overflows
From: <pageexec@freemail.hu>

$subject was fixed in 2.4 already, 2.6 needs it as well.

The impact of the bugs is a kernel stack overflow and privilege escalation
from CAP_NET_ADMIN via the IP_VS_SO_SET_STARTDAEMON/IP_VS_SO_GET_DAEMON
ioctls.  People running with 'root=all caps' (i.e., most users) are not
really affected (there's nothing to escalate), but SELinux and similar
users should take it seriously if they grant CAP_NET_ADMIN to other users.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26 16:00:19 -07:00
Adrian Bunk 60fe740320 [TCP]: Let TCP_CONG_ADVANCED default to n
It doesn't seem to make much sense to let an "If unsure, say N." option 
default to y.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26 15:21:15 -07:00
David S. Miller 6c3607676c [IPV4]: Fix thinko in TCP_CONG_BIC default.
Since it is tristate when we offer it as a choice, we should
definte it also as tristate when forcing it as the default.
Otherwise kconfig warns.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26 15:20:20 -07:00
David S. Miller a6484045fd [TCP]: Do not present confusing congestion control options by default.
Create TCP_CONG_ADVANCED option, akin to IP_ADVANCED_ROUTER, which
when disabled will bypass all of the congestion control Kconfig
options and leave the user with a safe default.

That safe default is currently BIC-TCP with new Reno as a fallback.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24 18:07:51 -07:00
David S. Miller bb298ca3ce [IPV4]: Move FIB lookup algorithm choice under IP_ADVANCED_ROUTING
Most users need not be concerned with a complex choice of what
FIB lookup algorithm to use.  So give them the safe default of
IP_FIB_HASH if IP_ADVANCED_ROUTING is disabled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24 17:50:53 -07:00
Stephen Hemminger 5f8ef48d24 [TCP]: Allow choosing TCP congestion control via sockopt.
Allow using setsockopt to set TCP congestion control to use on a per
socket basis.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 20:37:36 -07:00
John Heffner 0e57976b63 [TCP]: Add Scalable TCP congestion control module.
This patch implements Tom Kelly's Scalable TCP congestion control algorithm 
for the modular framework.

The algorithm has some nice scaling properties, and has been used a fair bit 
in research, though is known to have significant fairness issues, so it's not 
really suitable for general purpose use.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:29:07 -07:00
Baruch Even a7868ea68d [TCP]: Add H-TCP congestion control module.
H-TCP is a congestion control algorithm developed at the Hamilton Institute, by
Douglas Leith and Robert Shorten. It is extending the standard Reno algorithm
with mode switching is thus a relatively simple modification.

H-TCP is defined in a layered manner as it is still a research platform. The
basic form includes the modification of beta according to the ratio of maxRTT
to min RTT and the alpha=2*factor*(1-beta) relation, where factor is dependant
on the time since last congestion.

The other layers improve convergence by adding appropriate factors to alpha.

The following patch implements the H-TCP algorithm in it's basic form.

Signed-Off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:28:11 -07:00
Stephen Hemminger b87d8561d8 [TCP]: Add TCP Vegas congestion control module.
TCP Vegas code modified for the new TCP infrastructure.  
Vegas now uses microsecond resolution timestamps for 
better estimation of performance over higher speed links.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:27:19 -07:00