redonkable/alistair23-linux

Author	SHA1	Message	Date
Ilpo Järvinen	95eacd27e2	[TCP]: fix comments that got messed up during code move Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:59 -07:00
Ilpo Järvinen	dc86967b54	[TCP]: No fackets_out/highest_sack tuning when SACK isn't enabled This was found due to bug report from Cedric Le Goater though it turned this turned out to be unrelated bug. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:58 -07:00
Patrick McHardy	f73e924cdd	[NETFILTER]: ctnetlink: use netlink policy Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:35 -07:00
Patrick McHardy	fdf708322d	[NETFILTER]: nfnetlink: rename functions containing 'nfattr' There is no struct nfattr anymore, rename functions to 'nlattr'. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:32 -07:00
Patrick McHardy	df6fb868d6	[NETFILTER]: nfnetlink: convert to generic netlink attribute functions Get rid of the duplicated rtnetlink macros and use the generic netlink attribute functions. The old duplicated stuff is moved to a new header file that exists just for userspace. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:31 -07:00
Stephen Hemminger	3b04ddde02	[NET]: Move hardware header operations out of netdevice. Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:52 -07:00
Stephen Hemminger	b95cce3576	[NET]: Wrap hard_header_parse Wrap the hard_header_parse function to simplify next step of header_ops conversion. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:51 -07:00
Stephen Hemminger	0c4e85813d	[NET]: Wrap netdevice hardware header creation. Add inline for common usage of hardware header creation, and fix bug in IPV6 mcast where the assumption about negative return is an errno. Negative return from hard_header means not enough space was available,(ie -N bytes). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:50 -07:00
Eric W. Biederman	2774c7aba6	[NET]: Make the loopback device per network namespace. This patch makes loopback_dev per network namespace. Adding code to create a different loopback device for each network namespace and adding the code to free a loopback device when a network namespace exits. This patch modifies all users the loopback_dev so they access it as init_net.loopback_dev, keeping all of the code compiling and working. A later pass will be needed to update the users to use something other than the initial network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:49 -07:00
Eric W. Biederman	0cc217e16c	[IPV4]: When possible test for IFF_LOOPBACK and not dev == loopback_dev Now that multiple loopback devices are becoming possible it makes the code a little cleaner and more maintainable to test if a deivice is th a loopback device by testing dev->flags & IFF_LOOPBACK instead of dev == loopback_dev. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:48 -07:00
Eric W. Biederman	5967789dbc	[IPV4]: Remove unnecessary test for the loopback device from inetdev_destroy Currently we never call unregister_netdev for the loopback device so it is impossible for us to reach inetdev_destroy with the loopback device. So the test in inetdev_destroy is unnecessary. Further when testing with my network namespace patches removing unregistering the loopback device and calling inetdev_destroy works fine so there appears to be no reason for avoiding unregistering the loopback device. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:48 -07:00
Ilpo Järvinen	912d8f0b1f	[TCP] MIB: Count FRTO's successfully detected spurious RTOs Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:39 -07:00
Ilpo Järvinen	93e6802029	[TCP]: Reordered ACK's (old) SACKs not included to discarded MIB In case of ACK reordering, the SACK block might be valid in it's time but is already obsoleted since we've received another kind of confirmation about arrival of the segments through snd_una advancement of an earlier packet. I didn't bother to build distinguishing of valid and invalid SACK blocks but simply made reordered SACK blocks that are too old always not counted regardless of their "real" validity which could be determined by using the ack field of the reordered packet (won't be significant IMHO). DSACKs can very well be considered useful even in this situation, so won't do any of this for them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:38 -07:00
Ilpo Järvinen	a6963a6b3d	[TCP]: Re-place highest_sack check to a more robust position I previously added checking to position that is rather poor as state has already been adjusted quite a bit. Re-placing it above all state changes should be more robust though the return should never ever get executed regardless of its place :-). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:38 -07:00
Daniel Lezcano	de3cb747ff	[NET]: Dynamically allocate the loopback device, part 1. This patch replaces all occurences to the static variable loopback_dev to a pointer loopback_dev. That provides the mindless, trivial, uninteressting change part for the dynamic allocation for the loopback. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-By: Kirill Korotaev <dev@sw.ru> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:14 -07:00
Ilpo Järvinen	b76892051c	[TCP]: Avoid clearing sacktag hint in trivial situations There's no reason to clear the sacktag skb hint when small part of the rexmit queue changes. Account changes (if any) instead when fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so no need to discard SACK tag hint at all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:12 -07:00
Ilpo Järvinen	c96fd3d461	[TCP]: Enable SACK enhanced FRTO (RFC4138) by default Most of the description that follows comes from my mail to netdev (some editing done): Main obstacle to FRTO use is its deployment as it has to be on the sender side where as wireless link is often the receiver's access link. Take initiative on behalf of unlucky receivers and enable it by default in future Linux TCP senders. Also IETF seems to interested in advancing FRTO from experimental [1]. How does FRTO help? =================== FRTO detects spurious RTOs and avoids a number of unnecessary retransmissions and a couple of other problems that can arise due to incorrect guess made at RTO (i.e., that segments were lost when they actually got delayed which is likely to occur e.g. in wireless environments with link-layer retransmission). Though FRTO cannot prevent the first (potentially unnecessary) retransmission at RTO, I suspect that it won't cost that much even if you have to pay for each bit (won't be that high percentage out of all packets after all :-)). However, usually when you have a spurious RTO, not only the first segment unnecessarily retransmitted but the whole window. It goes like this: all cumulative ACKs got delayed due to in-order delivery, then TCP will actually send 1.5*original cwnd worth of data in the RTO's slow-start when the delayed ACKs arrive (basically the original cwnd worth of it unnecessarily). In case one is interested in minimizing unnecessary retransmissions e.g. due to cost, those rexmissions must never see daylight. Besides, in the worst case the generated burst overloads the bottleneck buffers which is likely to significantly delay the further progress of the flow. In case of ll rexmissions, ACK compression often occurs at the same time making the burst very "sharp edged" (in that case TCP often loses most of the segments above high_seq => very bad performance too). When FRTO is enabled, those unnecessary retransmissions are fully avoided except for the first segment and the cwnd behavior after detected spurious RTO is determined by the response (one can tune that by sysctl). Basic version (non-SACK enhanced one), FRTO can fail to detect spurious RTO as spurious and falls back to conservative behavior. ACK lossage is much less significant than reordering, usually the FRTO can detect spurious RTO if at least 2 cumulative ACKs from original window are preserved (excluding the ACK that advances to high_seq). With SACK-enhanced version, the detection is quite robust. FRTO should remove the need to set a high lower bound for the RTO estimator due to delay spikes that occur relatively common in some environments (esp. in wireless/cellular ones). [1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.html Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:12 -07:00
Ilpo Järvinen	009a2e3e4e	[TCP] FRTO: Improve interoperability with other undo_marker users Basically this change enables it, previously other undo_marker users were left with nothing. Reverse undo_marker logic completely to get it set right in CA_Loss. On the other hand, when spurious RTO is detected, clear it. Clearing might be too heavy for some scenarios but seems safe enough starting point for now and shouldn't have much effect except in majority of cases (if in any). By adding a new FLAG_ we avoid looping through write_queue when RTO occurs. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:11 -07:00
Ilpo Järvinen	7c46a03e67	[TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue Implements following cleanups: - Comment re-placement (CodingStyle) - tcp_tso_acked() local (wrapper-like) variable removal (readability) - __-types removed (IMHO they make local variables jumpy looking and just was space) - acked -> flag (naming conventions elsewhere in TCP code) - linebreak adjustments (readability) - nested if()s combined (reduced indentation) - clarifying newlines added Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:10 -07:00
Ilpo Järvinen	13fcf850cc	[TCP]: Move accounting from tso_acked to clean_rtx_queue The accounting code is pretty much the same, so it's a shame we do it in two places. I'm not too sure if added fully_acked check in MTU probing is really what we want perhaps the added end_seq could be used in the after() comparison. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:09 -07:00
Ilpo Järvinen	5af4ec236f	[TCP]: clear_all_retrans_hints prefixed by tcp_ In addition, fix its function comment spacing. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>	2007-10-10 16:52:09 -07:00
Ilpo Järvinen	91fed7a15c	[TCP]: Make fackets_out accurate Substraction for fackets_out is unconditional when snd_una advances, thus there's no need to do it inside the loop. Just make sure correct bounds are honored. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:08 -07:00
Ilpo Järvinen	0dde7b5404	[TCP]: Maintain highest_sack accurately to the highest skb In general, it should not be necessary to call tcp_fragment for already SACKed skbs, but it's better to be safe than sorry. And indeed, it can be called from sacktag when a DSACK arrives or some ACK (with SACK) reordering occurs (sacktag could be made to avoid the call in the latter case though I'm not sure if it's worth of the trouble and added complexity to cover such marginal case). The collapse case has return for SACKED_ACKED case earlier, so just WARN_ON if internal inconsistency is detected for some reason. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:08 -07:00
Rick Jones	5ee3afba88	[TCP]: Return useful listenq info in tcp_info and INET_DIAG_INFO. Return some useful information such as the maximum listen backlog and the current listen backlog in the tcp_info structure and INET_DIAG_INFO. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:35 -07:00
David L Stevens	96793b4825	[IPV4]: Add ICMPMsgStats MIB (RFC 4293) Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches "remove" (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for "Icmp" to get the named data from new counters. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:28 -07:00
Denis Cheng	c40f6fff40	[IPV4] af_inet.c: use ARRAY_SIZE macro from kernel.h instead Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:26 -07:00
Herbert Xu	0cfad07555	[NETLINK]: Avoid pointer in netlink_run_queue I was looking at Patrick's fix to inet_diag and it occured to me that we're using a pointer argument to return values unnecessarily in netlink_run_queue. Changing it to return the value will allow the compiler to generate better code since the value won't have to be memory-backed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:24 -07:00
Denis V. Lunev	76c72d4f44	[IPV4/IPV6/DECNET]: Small cleanup for fib rules. This patch slightly cleanups FIB rules framework. rules_list as a pointer on struct fib_rules_ops is useless. It is always assigned with a static per/subsystem list in IPv4, IPv6 and DecNet. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:22 -07:00
Paul Moore	63d804eade	[CIPSO]: remove duplicated code in the cipso_v4_*_getattr() functions The bulk of the CIPSO option parsing/processing in the cipso_v4_sock_getattr() and cipso_v4_skb_getattr() functions are identical, the only real difference being where the functions obtain the CIPSO option itself. This patch creates a new function, cipso_v4_getattr(), which contains the common CIPSO option parsing/processing code and modifies the existing functions to call this new helper function. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:17 -07:00
Ralf Baechle	10d024c1b2	[NET]: Nuke SET_MODULE_OWNER macro. It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:13 -07:00
Eric Dumazet	39c90ece75	[IPV4]: Convert rt_check_expire() from softirq processing to workqueue. On loaded/big hosts, rt_check_expire() if of litle use, because it generally breaks out of its main loop because of a jiffies change. It can take a long time (read : timer invocations) to actually scan the whole hash table, freeing unused entries. Converting it to use a workqueue instead of softirq is a nice move because we can allow rt_check_expire() to do the scan it is supposed to do, without hogging the CPU. This has an impact on the average number of entries in cache, reducing ram usage. Cache is more responsive to parameter changes (/proc/sys/net/ipv4/route/gc_timeout and /proc/sys/net/ipv4/route/gc_interval) Note: Maybe the default value of gc_interval (60 seconds) is too high, since this means we actually need 5 (300/60) invocations to scan the whole table. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:25 -07:00
Thomas Graf	8f4c1f9b04	[NETLINK]: Introduce nested and byteorder flag to netlink attribute This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced. The byte-order flag is yet unused, it's intended use is to allow automatic byte order convertions for all atomic types. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:16 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
Eric W. Biederman	b4b510290b	[NET]: Support multiple network namespaces with netlink Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Eric W. Biederman	e9dc865340	[NET]: Make device event notification network namespace safe Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Eric W. Biederman	e730c15519	[NET]: Make packet reception network namespace safe This patch modifies every packet receive function registered with dev_add_pack() to drop packets if they are not from the initial network namespace. This should ensure that the various network stacks do not receive packets in a anything but the initial network namespace until the code has been converted and is ready for them. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:08 -07:00
Eric W. Biederman	1b8d7ae42d	[NET]: Make socket creation namespace safe. This patch passes in the namespace a new socket should be created in and has the socket code do the appropriate reference counting. By virtue of this all socket create methods are touched. In addition the socket create methods are modified so that they will fail if you attempt to create a socket in a non-default network namespace. Failing if we attempt to create a socket outside of the default network namespace ensures that as we incrementally make the network stack network namespace aware we will not export functionality that someone has not audited and made certain is network namespace safe. Allowing us to partially enable network namespaces before all of the exotic protocols are supported. Any protocol layers I have missed will fail to compile because I now pass an extra parameter into the socket creation code. [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:07 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
Ilpo Järvinen	172589ccdd	[NET]: DIV_ROUND_UP cleanup (part two) Hopefully captured all single statement cases under net/. I'm not too sure if there is some policy about #includes that are "guaranteed" (ie., in the current tree) to be available through some other #included header, so I just added linux/kernel.h to each changed file that didn't #include it previously. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:37 -07:00
Masahide NAKAMURA	3b26a9a655	[IPV4] IPSEC: Omit redirect for tunnelled packet. IPv4 IPsec tunnel gateway incorrectly sends redirect to sender if it is onlink host when network device the IPsec tunnelled packet is arrived is the same as the one the decapsulated packet is sent. With this patch, it omits to send the redirect when the forwarding skbuff carries secpath, since such skbuff should be assumed as a decapsulated packet from IPsec tunnel by own. Request for comments: Alternatively we'd have another way to change net/ipv4/route.c (__mkroute_input) to use RTCF_DOREDIRECT flag unless skbuff has no secpath. It is better than this patch at performance point of view because IPv4 redirect judgement is done at routing slow-path. However, it should be taken care of resource changes between SAD(XFRM states) and routing table. In other words, When IPv4 SAD is changed does the related routing entry go to its slow-path? If not, it is reasonable to apply this patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:33 -07:00
Stephen Hemminger	32c1da7081	[UDP]: Randomize port selection. This patch causes UDP port allocation to be randomized like TCP. The earlier code would always choose same port (ie first empty list). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:31 -07:00
Ilpo Järvinen	356f89e12e	[NET] Cleanup: DIV_ROUND_UP Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:30 -07:00
Ilpo Järvinen	18f02545a9	[TCP] MIB: Add counters for discarded SACK blocks In DSACK case, some events are not extraordinary, such as packet duplication generated DSACK. They can arrive easily below snd_una when undo_marker is not set (TCP being in CA_Open), counting such DSACKs amoung SACK discards will likely just mislead if they occur in some scenario when there are other problems as well. Similarly, excessively delayed packets could cause "normal" DSACKs. Therefore, separate counters are allocated for DSACK events. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:30 -07:00
Ilpo Järvinen	5b3c98821a	[TCP]: Discard fuzzy SACK blocks SACK processing code has been a sort of russian roulette as no validation of SACK blocks is previously attempted. Besides, it is not very clear what all kinds of broken SACK blocks really mean (e.g., one that has start and end sequence numbers reversed). So now close the roulette once and for all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:29 -07:00
Ilpo Järvinen	6728e7dc3e	[TCP]: Rename tcp_ack_packets_out -> tcp_rearm_rto Only thing that tiny function does is rearming the RTO (if necessary), name it accordingly. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:28 -07:00
Ilpo Järvinen	6ff03ac355	[TCP]: tcp_packets_out_inc to tcp_output.c (no callers elsewhere) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:28 -07:00
Ilpo Järvinen	e9144bd8da	[TCP]: Remove unnecessary wrapper tcp_packets_out_dec Makes caller side more obvious, there's no need to have a wrapper for this oneliner! Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:27 -07:00
Stephen Hemminger	ab66b4a7a3	[IPV4] fib_trie: macro cleanup This patch converts the messy macro for MASK_PFX to inline function and expands TKEY_GET_MASK in the one place it is used. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:01 -07:00
Stephen Hemminger	0680191642	[IPV4] fib_trie: cleanup Try this out: * replace macro's with inlines * get rid of places doing multiple evaluations of NODE_PARENT [akpm@linux-foundation.org: rcu_dereference wants an lval] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:01 -07:00
Ilpo Järvinen	e60402d0a9	[TCP]: Move sack_ok access to obviously named funcs & cleanup Previously code had IsReno/IsFack defined as macros that were local to tcp_input.c though sack_ok field has user elsewhere too for the same purpose. This changes them to static inlines as preferred according the current coding style and unifies the access to sack_ok across multiple files. Magic bitops of sack_ok for FACK and DSACK are also abstracted to functions with appropriate names. Note: - One sack_ok = 1 remains but that's self explanary, i.e., it enables sack - Couple of !IsReno cases are changed to tcp_is_sack - There were no users for IsDSack => I dropped it Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:00 -07:00
Ilpo Järvinen	1b6d427bb7	[TCP]: Reduce sacked_out with reno when purging write_queue Previously TCP had a transitional state during which reno counted segments that are already below the current window into sacked_out, which is now prevented. In addition, re-try now the unconditional S+L skb catching. This approach conservatively calls just remove_sack and leaves reset_sack() calls alone. The best solution to the whole problem would be to first calculate the new sacked_out fully (this patch does not move reno_sack_reset calls from original sites and thus does not implement this). However, that would require very invasive change to fastretrans_alert (perhaps even slicing it to two halves). Alternatively, all callers of tcp_packets_in_flight (i.e., users that depend on sacked_out) should be postponed until the new sacked_out has been calculated but it isn't any simpler alternative. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:58 -07:00
Ilpo Järvinen	d02596e329	[TCP]: Keep state in Disorder also if only lost_out > 0 This happens rather infrequently and is only possible during FRTO. We must not allow TCP to slip to Open state because tcp_fastretrans_alert might then not be called on it's time when FRTO has exited. This become a problem when left_out got removed and was replaced by just sacked_out. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:58 -07:00
Ilpo Järvinen	86426c22d2	[TCP]: Restore over-zealous tcp_sync_left_out-like removals tcp_verify_left_out is useful for verifying S+L condition, so add it back to couple of places in where the code was not calling to tcp_sync_left_out but used own ad-hoc solution (before the tcp_sync_left_out got removed). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:57 -07:00
Ilpo Järvinen	005903bc3a	[TCP]: Left out sync->verify (the new meaning of it) & definify Left_out was dropped a while ago, thus leaving verifying consistency of the "left out" as only task for the function in question. Thus make it's name more appropriate. In addition, it is intentionally converted to #define instead of static inline because the location of the invariant failure is the most important thing to have if this ever triggers. I think it would have been helpful e.g. in this case where the location of the failure point had to be based on some quesswork: http://lkml.org/lkml/2007/5/2/464 ...Luckily the guesswork seems to have proved to be correct. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:57 -07:00
Ilpo Järvinen	83ae40885f	[TCP]: Add tcp_left_out(tp) "back" to get cleaner looking lines tp->left_out got removed but nothing came to replace it back then (users just did addition by themselves), so add function for users now. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:56 -07:00
Ilpo Järvinen	b5860bbac7	[TCP]: Tighten tcp_sock's belt, drop left_out It is easily calculable when needed and user are not that many after all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:55 -07:00
Ilpo Järvinen	35e8694198	[TCP]: Remove num_acked>0 checks from cong.ctrl mods pkts_acked There is no need for such check in pkts_acked because the callback is not invoked unless at least one segment got fully ACKed (i.e., the snd_una moved past skb's end_seq) by the cumulative ACK's snd_una advancement. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:55 -07:00
Ilpo Järvinen	af610b4ca1	[TCP]: Add tcp_dec_pcount_approx int variant Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:54 -07:00
Ilpo Järvinen	bdf1ee5d3b	[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it No other users exist for tcp_ecn.h. Very few things remain in tcp.h, for most TCP ECN functions callers reside within a single .c file and can be placed there. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:54 -07:00
Ilpo Järvinen	539d243fdd	[TCP]: Access to highest_sack obsoletes forward_cnt_hint In addition, added a reference about the purpose of the loop. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:53 -07:00
Ilpo Järvinen	9bff40fda0	[TCP] FRTO: remove unnecessary fackets/sacked_out recounting F-RTO does not touch SACKED_ACKED bits at all, so there is no need to recount them in tcp_enter_frto_loss. After removal of the else branch, nested ifs can be combined. This must also reset sacked_out when SACK is not in use as TCP could have received some duplicate ACKs prior RTO. To achieve that in a sane manner, tcp_reset_reno_sack was re-placed by the previous patch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:53 -07:00
Ilpo Järvinen	4ddf66769d	[TCP]: Move Reno SACKed_out counter functions earlier Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:52 -07:00
David S. Miller	d06e021d71	[TCP]: Extract DSACK detection code from tcp_sacktag_write_queue(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:51 -07:00
Ilpo Järvinen	19b2b48658	[TCP]: Rexmit hint must be cleared instead of setting it Stupid error from my side. Even though now that I noticed this, I hoped it would have been an optimization but no, the counter hint is then incorrect. Thus clearing is necessary for now (I still suspect though that this path is never executed). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:51 -07:00
Ilpo Järvinen	d8f4f2235a	[TCP]: Extracted rexmit hint clearing from the LOST marking code Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:50 -07:00
Ilpo Järvinen	d738cd8fca	[TCP]: Add highest_sack seqno, points to globally highest SACK It is guaranteed to be valid only when !tp->sacked_out. In most cases this seqno is available in the last ACK but there is no guarantee for that. The new fast recovery loss marking algorithm needs this as entry point. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:50 -07:00
Jan-Bernd Themann	71c87e0ced	[NET]: Generic Large Receive Offload for TCP traffic This patch provides generic Large Receive Offload (LRO) functionality for IPv4/TCP traffic. LRO combines received tcp packets to a single larger tcp packet and passes them then to the network stack in order to increase performance (throughput). The interface supports two modes: Drivers can either pass SKBs or fragment lists to the LRO engine. Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:46 -07:00
Ilpo Järvinen	48611c47d0	[TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed When only GSO skb was partially ACKed, no hints are reset, therefore fastpath_cnt_hint must be tweaked too or else it can corrupt fackets_out. The corruption to occur, one must have non-trivial ACK/SACK sequence, so this bug is not very often that harmful. There's a fackets_out state reset in TCP because fackets_out is known to be inaccurate and that fixes the issue eventually anyway. In case there was also at least one skb that got fully ACKed, the fastpath_skb_hint is set to NULL which causes a recount for fastpath_cnt_hint (the old value won't be accessed anymore), thus it can safely be decremented without additional checking. Reported by Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-07 23:43:10 -07:00
David S. Miller	f8ab18d2d9	[TCP]: Fix MD5 signature handling on big-endian. Based upon a report and initial patch by Peter Lieven. tcp4_md5sig_key and tcp6_md5sig_key need to start with the exact same members as tcp_md5sig_key. Because they are both cast to that type by tcp_v{4,6}_md5_do_lookup(). Unfortunately tcp{4,6}_md5sig_key use a u16 for the key length instead of a u8, which is what tcp_md5sig_key uses. This just so happens to work by accident on little-endian, but on big-endian it doesn't. Instead of casting, just place tcp_md5sig_key as the first member of the address-family specific structures, adjust the access sites, and kill off the ugly casts. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-28 15:18:35 -07:00
YOSHIFUJI Hideaki	2a0c6c980d	[IPV4]: Just increment OutDatagrams once per a datagram. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-14 17:15:19 -07:00
Patrick McHardy	0a9c730144	[INET_DIAG]: Fix oops in netlink_rcv_skb netlink_run_queue() doesn't handle multiple processes processing the queue concurrently. Serialize queue processing in inet_diag to fix a oops in netlink_rcv_skb caused by netlink_run_queue passing a NULL for the skb. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000054 [349587.500454] printing eip: [349587.500457] c03318ae [349587.500459] *pde = 00000000 [349587.500464] Oops: 0000 [#1] [349587.500466] PREEMPT SMP [349587.500474] Modules linked in: w83627hf hwmon_vid i2c_isa [349587.500483] CPU: 0 [349587.500485] EIP: 0060:[<c03318ae>] Not tainted VLI [349587.500487] EFLAGS: 00010246 (2.6.22.3 #1) [349587.500499] EIP is at netlink_rcv_skb+0xa/0x7e [349587.500506] eax: 00000000 ebx: 00000000 ecx: c148d2a0 edx: c0398819 [349587.500510] esi: 00000000 edi: c0398819 ebp: c7a21c8c esp: c7a21c80 [349587.500517] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [349587.500521] Process oidentd (pid: 17943, ti=c7a20000 task=cee231c0 task.ti=c7a20000) [349587.500527] Stack: 00000000 c7a21cac f7c8ba78 c7a21ca4 c0331962 c0398819 f7c8ba00 0000004c [349587.500542] f736f000 c7a21cb4 c03988e3 00000001 f7c8ba00 c7a21cc4 c03312a5 0000004c [349587.500558] f7c8ba00 c7a21cd4 c0330681 f7c8ba00 e4695280 c7a21d00 c03307c6 7fffffff [349587.500578] Call Trace: [349587.500581] [<c010361a>] show_trace_log_lvl+0x1c/0x33 [349587.500591] [<c01036d4>] show_stack_log_lvl+0x8d/0xaa [349587.500595] [<c010390e>] show_registers+0x1cb/0x321 [349587.500604] [<c0103bff>] die+0x112/0x1e1 [349587.500607] [<c01132d2>] do_page_fault+0x229/0x565 [349587.500618] [<c03c8d3a>] error_code+0x72/0x78 [349587.500625] [<c0331962>] netlink_run_queue+0x40/0x76 [349587.500632] [<c03988e3>] inet_diag_rcv+0x1f/0x2c [349587.500639] [<c03312a5>] netlink_data_ready+0x57/0x59 [349587.500643] [<c0330681>] netlink_sendskb+0x24/0x45 [349587.500651] [<c03307c6>] netlink_unicast+0x100/0x116 [349587.500656] [<c0330f83>] netlink_sendmsg+0x1c2/0x280 [349587.500664] [<c02fcce9>] sock_sendmsg+0xba/0xd5 [349587.500671] [<c02fe4d1>] sys_sendmsg+0x17b/0x1e8 [349587.500676] [<c02fe92d>] sys_socketcall+0x230/0x24d [349587.500684] [<c01028d2>] syscall_call+0x7/0xb [349587.500691] ======================= [349587.500693] Code: f0 ff 4e 18 0f 94 c0 84 c0 0f 84 66 ff ff ff 89 f0 e8 86 e2 fc ff e9 5a ff ff ff f0 ff 40 10 eb be 55 89 e5 57 89 d7 56 89 c6 53 <8b> 50 54 83 fa 10 72 55 8b 9e 9c 00 00 00 31 c9 8b 03 83 f8 0f Reported by Athanasius <link@miggy.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-11 11:33:28 +02:00
Neil Horman	16fcec35e7	[NETFILTER]: Fix/improve deadlock condition on module removal netfilter So I've had a deadlock reported to me. I've found that the sequence of events goes like this: 1) process A (modprobe) runs to remove ip_tables.ko 2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket, increasing the ip_tables socket_ops use count 3) process A acquires a file lock on the file ip_tables.ko, calls remove_module in the kernel, which in turn executes the ip_tables module cleanup routine, which calls nf_unregister_sockopt 4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the calling process into uninterruptible sleep, expecting the process using the socket option code to wake it up when it exits the kernel 4) the user of the socket option code (process B) in do_ipt_get_ctl, calls ipt_find_table_lock, which in this case calls request_module to load ip_tables_nat.ko 5) request_module forks a copy of modprobe (process C) to load the module and blocks until modprobe exits. 6) Process C. forked by request_module process the dependencies of ip_tables_nat.ko, of which ip_tables.ko is one. 7) Process C attempts to lock the request module and all its dependencies, it blocks when it attempts to lock ip_tables.ko (which was previously locked in step 3) Theres not really any great permanent solution to this that I can see, but I've developed a two part solution that corrects the problem Part 1) Modifies the nf_sockopt registration code so that, instead of using a use counter internal to the nf_sockopt_ops structure, we instead use a pointer to the registering modules owner to do module reference counting when nf_sockopt calls a modules set/get routine. This prevents the deadlock by preventing set 4 from happening. Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking remove operations (the same way rmmod does), and add an option to explicity request blocking operation. So if you select blocking operation in modprobe you can still cause the above deadlock, but only if you explicity try (and since root can do any old stupid thing it would like.... :) ). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-11 11:28:26 +02:00
Patrick McHardy	0fb9670137	[NETFILTER]: nf_conntrack_ipv4: fix "Frag of proto ..." messages Since we're now using a generic tuple decoding function in ICMP connection tracking, ipv4_get_l4proto() might get called with a fragmented packet from within an ICMP error. Remove the error message we used to print when this happens. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-11 11:27:01 +02:00
Stephen Hemminger	596e415095	[IPV4] devinet: show all addresses assigned to interface Bug: http://bugzilla.kernel.org/show_bug.cgi?id=8876 Not all ips are shown by "ip addr show" command when IPs number assigned to an interface is more than 60-80 (in fact it depends on broadcast/label etc presence on each address). Steps to reproduce: It's terribly simple to reproduce: # for i in $(seq 1 100); do ip ad add 10.0.$i.1/24 dev eth10 ; done # ip addr show this will _not_ show all IPs. Looks like the problem is in netlink/ipv4 message processing. This is fix from bug submitter, it looks correct. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-11 10:41:04 +02:00
David S. Miller	5c127c58ae	[TCP]: 'dst' can be NULL in tcp_rto_min() Reported by Rick Jones. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-31 14:39:44 -07:00
David S. Miller	05bb1fad1c	[TCP]: Allow minimum RTO to be configurable via routing metrics. Cell phone networks do link layer retransmissions and other things that cause unnecessary timeout retransmits. So allow the minimum RTO to be inflated per-route to deal with this. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-30 22:10:28 -07:00
David S. Miller	26722873a4	[TCP]: Describe tcp_init_cwnd() thoroughly in a comment. People often get tripped up by this function and think that it does not implemented the prescribed algorithms from RFC2414 and RFC3390, even though it does. So add a comment to head off such misunderstandings in the future. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-26 18:35:36 -07:00
Flavio Leitner	a96fb49be3	[NET]: Fix IP_ADD/DROP_MEMBERSHIP to handle only connectionless Fix IP[V6]_ADD_MEMBERSHIP and IP[V6]_DROP_MEMBERSHIP to return -EPROTO for connection oriented sockets. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-26 18:35:35 -07:00
Nick Bowler	96fe1c0237	[IPSEC] AH4: Update IPv4 options handling to conform to RFC 4302. In testing our ESP/AH offload hardware, I discovered an issue with how AH handles mutable fields in IPv4. RFC 4302 (AH) states the following on the subject: For IPv4, the entire option is viewed as a unit; so even though the type and length fields within most options are immutable in transit, if an option is classified as mutable, the entire option is zeroed for ICV computation purposes. The current implementation does not zero the type and length fields, resulting in authentication failures when communicating with hosts that do (i.e. FreeBSD). I have tested record route and timestamp options (ping -R and ping -T) on a small network involving Windows XP, FreeBSD 6.2, and Linux hosts, with one router. In the presence of these options, the FreeBSD and Linux hosts (with the patch or with the hardware) can communicate. The Windows XP host simply fails to accept these packets with or without the patch. I have also been trying to test source routing options (using traceroute -g), but haven't had much luck getting this option to work without AH, let alone with. Signed-off-by: Nick Bowler <nbowler@ellipticsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-26 18:35:33 -07:00
Patrick McHardy	45241a7a07	[NETFILTER]: nf_nat_sip: don't drop short packets Don't drop packets shorter than "SIP/2.0", just ignore them. Keep-alives can validly be shorter for example. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-14 13:14:58 -07:00
Heiko Carstens	cae7ca3d3d	[IPVS]: Use IP_VS_WAIT_WHILE when encessary. For architectures that don't have a volatile atomic_ts constructs like while (atomic_read(&something)); might result in endless loops since a barrier() is missing which forces the compiler to generate code that actually reads memory contents. Fix this in ipvs by using the IP_VS_WAIT_WHILE macro which resolves to while (expr) { cpu_relax(); } (why isn't this open coded btw?) Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-13 22:52:15 -07:00
Jesper Juhl	f49f9967b2	[IPV4]: Clean up duplicate includes in net/ipv4/ This patch cleans up duplicate includes in net/ipv4/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-13 22:52:02 -07:00
Joakim Tjernlund	dcbdc93c6c	[IPCONFIG]: ip_auto_config fix The following commandline: root=/dev/mtdblock6 rw rootfstype=jffs2 ip=192.168.1.10:::255.255.255.0:localhost.localdomain:eth1:off console=ttyS0,115200 makes ip_auto_config fall back to DHCP and complain "IP-Config: Incomplete network configuration information." depending on if CONFIG_IP_PNP_DHCP is set or not. The only way I can make ip_auto_config accept my IP config is to add an entry for the server IP: ip=192.168.1.10:192.168.1.15::255.255.255.0:localhost.localdomain:eth1:off I think this is a bug since I am not using a NFS root FS. The following patch fixes the above problem. From: Andrew Morton <akpm@linux-foundation.org> Davem said (in February!): Well, first of all the change in question is not in 2.4.x either. I just checked the current 2.4.x GIT tree and the test is exactly: if (ic_myaddr == INADDR_NONE \|\| #ifdef CONFIG_ROOT_NFS (MAJOR(ROOT_DEV) == UNNAMED_MAJOR && root_server_addr == INADDR_NONE && ic_servaddr == INADDR_NONE) \|\| #endif ic_first_dev->next) { which matches 2.6.x I even checked 2.4.x when it was branched for 2.5.x and the test was the same at the point in time too. Looking at the proposed change a bit it appears that it is probably correct, as it's trying to check that ROOT_DEV is nfs root. But if it is correct then the UNNAMED_MAJOR comparison in the same code block should be removed as it becomes superfluous. I'm happy to apply this patch with that modification made. Signed-off-by: Joakim Tjernlund <joakim.tjernlund@transmode.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-13 22:51:59 -07:00
Stephen Hemminger	f34d1955df	[TCP]: H-TCP maxRTT estimation at startup Small patch to H-TCP from Douglas Leith. Fix estimation of maxRTT. The original code ignores rtt measurements during slow start (via the check tp->snd_ssthresh < 0xFFFF) yet this is probably a good time to try to estimate max rtt as delayed acking is disabled and slow start will only exit on a loss which presumably corresponds to a maxrtt measurement. Second, the original code (via the check htcp_ccount(ca) > 3) ignores rtt data during what it estimates to be the first 3 round-trip times. This seems like an unnecessary check now that the RCV timestamp are no longer used for rtt estimation. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-07 18:29:05 -07:00
Patrick McHardy	591e620693	[NETFILTER]: nf_nat: add symbolic dependency on IPv4 conntrack Loading nf_nat causes the conntrack core to be loaded, but we need IPv4 as well. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-07 18:12:01 -07:00
Jesper Juhl	3af8e31cf5	[NETFILTER]: ipt_recent: avoid a possible NULL pointer deref in recent_seq_open() If the call to seq_open() returns != 0 then the code calls kfree(st) but then on the very next line proceeds to dereference the pointer - not good. Problem spotted by the Coverity checker. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-07 18:10:54 -07:00
Ilpo Järvinen	49ff4bb4cd	[TCP]: DSACK signals data receival, be conservative In case a DSACK is received, it's better to lower cwnd as it's a sign of data receival. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:47:59 -07:00
Ilpo Järvinen	2e6052941a	[TCP]: Also handle snd_una changes in tcp_cwnd_down tcp_cwnd_down must check for it too as it should be conservative in case of collapse stuff and also when receiver is trying to lie (though that wouldn't be very successful/useful anyway). Note: - Separated also is_dupack and do_lost in fast_retransalert * Much cleaner look-and-feel now * This time it really fixes cumulative ACK with many new SACK blocks recovery entry (I claimed this fixes with last patch but it wasn't). TCP will now call tcp_update_scoreboard regardless of is_dupack when in recovery as long as there is enough fackets_out. - Introduce FLAG_SND_UNA_ADVANCED * Some prior_snd_una arguments are unnecessary after it - Added helper FLAG_ANY_PROGRESS to avoid long FLAG...\|FLAG... constructs Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:46:58 -07:00
David S. Miller	3516ffb0fe	[TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg(). As discovered by Evegniy Polyakov, if we try to sendmsg after a connection reset, we can do incredibly stupid things. The core issue is that inet_sendmsg() tries to autobind the socket, but we should never do that for TCP. Instead we should just go straight into TCP's sendmsg() code which will do all of the necessary state and pending socket error checks. TCP's sendpage already directly vectors to tcp_sendpage(), so this merely brings sendmsg() in line with that. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:42:28 -07:00
Mariusz Kozlowski	1bcabbdb0b	[IPV4] route.c: mostly kmalloc + memset conversion to k[cz]alloc Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:42:27 -07:00
Mariusz Kozlowski	4487b2f657	[IPV4] raw.c: kmalloc + memset conversion to kzalloc Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:42:26 -07:00
Mariusz Kozlowski	8adc546552	[NETFILTER] nf_conntrack_l3proto_ipv4_compat.c: kmalloc + memset conversion to kzalloc Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:42:24 -07:00
Mariusz Kozlowski	376407039c	[IPV4] ip_options.c: kmalloc + memset conversion to kzalloc Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 14:06:45 -07:00
Ilpo Järvinen	b8ed601cef	[TCP]: Bidir flow must not disregard SACK blocks for lost marking It's possible that new SACK blocks that should trigger new LOST markings arrive with new data (which previously made is_dupack false). In addition, I think this fixes a case where we get a cumulative ACK with enough SACK blocks to trigger the fast recovery (is_dupack would be false there too). I'm not completely pleased with this solution because readability of the code is somewhat questionable as 'is_dupack' in SACK case is no longer about dupacks only but would mean something like 'lost_marker_work_todo' too... But because of Eifel stuff done in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed to the if statement which seems attractive solution. Nevertheless, I didn't like adding another variable just for that either... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:31 -07:00
Ilpo Järvinen	1e757f9996	[TCP]: Fix ratehalving with bidirectional flows Actually, the ratehalving seems to work too well, as cwnd is reduced on every second ACK even though the packets in flight remains unchanged. Recoveries in a bidirectional flows suffer quite badly because of this, both NewReno and SACK are affected. After this patch, rate halving is performed for ACK only if packets in flight was supposedly changed too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:30 -07:00
Herbert Xu	b217d616a1	[IPV4/IPV6]: Fail registration if inet device construction fails Now that netdev notifications can fail, we can use this to signal errors during registration for IPv4/IPv6. In particular, if we fail to allocate memory for the inet device, we can fail the netdev registration. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:16 -07:00
Herbert Xu	ccc7911fbd	[IPVS]: Use skb_forward_csum As a path that forwards packets, IPVS should be using skb_forward_csum instead of directly setting ip_summed to CHECKSUM_NONE. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:11 -07:00
Stephen Hemminger	113bbbd8d2	[TCP]: htcp - use measured rtt Change HTCP to use measured RTT rather than smooth RTT. Srtt is computed using the TCP receive timestamp options, so it is vulnerable to hostile receivers. To avoid any problems this might cause use the measured RTT instead. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:27:59 -07:00
Stephen Hemminger	e7d0c88586	[TCP]: cubic - eliminate use of receive time stamp Remove use of received timestamp option value from RTT calculation in Cubic. A hostile receiver may be returning a larger timestamp option than the original value. This would cause the sender to believe the malevolent receiver had a larger RTT and because Cubic tries to provide some RTT friendliness, the sender would then favor the liar. Instead, use the jiffie resolutionRTT value already computed and passed back after ack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:27:58 -07:00
Stephen Hemminger	30cfd0baf0	[TCP]: congestion control API pass RTT in microseconds This patch changes the API for the callback that is done after an ACK is received. It solves a couple of issues: * Some congestion controls want higher resolution value of RTT (controlled by TCP_CONG_RTT_SAMPLE flag). These don't really want a ktime, but all compute a RTT in microseconds. * Other congestion control could use RTT at jiffies resolution. To keep API consistent the units should be the same for both cases, just the resolution should change. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:27:57 -07:00

1 2 3 4 5 ...

1686 commits