alistair23-linux

redonkable

Author	SHA1	Message	Date
Pavel Emelyanov	32f0c4cbe4	[NETNS]: Don't memset() netns to zero manually The newly created net namespace is set to 0 with memset() in setup_net(). The setup_net() is also called for the init_net_ns(), which is zeroed naturally as a global var. So remove this memset and allocate new nets with the kmem_cache_zalloc(). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:59 -07:00
Pavel Emelyanov	4665079cbb	[NETNS]: Move some code into __init section when CONFIG_NET_NS=n With the net namespaces many code leaved the __init section, thus making the kernel occupy more memory than it did before. Since we have a config option that prohibits the namespace creation, the functions that initialize/finalize some netns stuff are simply not needed and can be freed after the boot. Currently, this is almost not noticeable, since few calls are no longer in __init, but when the namespaces will be merged it will be possible to free more code. I propose to use the __net_init, __net_exit and __net_initdata "attributes" for functions/variables that are not used if the CONFIG_NET_NS is not set to save more space in memory. The exiting functions cannot just reside in the __exit section, as noticed by David, since the init section will have references on it and the compilation will fail due to modpost checks. These references can exist, since the init namespace never dies and the exit callbacks are never called. So I introduce the __exit_refok attribute just like it is already done with the __init_refok. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:58 -07:00
Jeff Garzik	14e3e07979	[NET]: split dev_ifsioc() according to locking This always bugged me: dev_ioctl() called dev_ifsioc() either inside read_lock(dev_base_lock) or rtnl_lock(), depending on the ioctl being executed. This change moves the ioctls executed inside dev_base_lock to a new function, dev_ifsioc_locked(). Now the locking context is completely clear to the reader. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:49 -07:00
Stephen Hemminger	cfcabdcc2d	[NET]: sparse warning fixes Fix a bunch of sparse warnings. Mostly about 0 used as NULL pointer, and shadowed variable declarations. One notable case was that hash size should have been unsigned. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:48 -07:00
Eric W. Biederman	f4618d39a3	[NETNS]: Simplify the network namespace list locking rules. Denis V. Lunev <den@sw.ru> noticed that the locking rules for the network namespace list are over complicated and broken. In particular the current register_netdev_notifier currently does not take any lock making the for_each_net iteration racy with network namespace creation and destruction. Oops. The fact that we need to use for_each_net in rtnl_unlock() when the rtnetlink support becomes per network namespace makes designing the proper locking tricky. In addition we need to be able to call rtnl_lock() and rtnl_unlock() when we have the net_mutex held. After thinking about it and looking at the alternatives carefully it looks like the simplest and most maintainable solution is to remove net_list_mutex altogether, and to use the rtnl_mutex instead. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:55 -07:00
Stephen Hemminger	3b04ddde02	[NET]: Move hardware header operations out of netdevice. Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:52 -07:00
Stephen Hemminger	0c4e85813d	[NET]: Wrap netdevice hardware header creation. Add inline for common usage of hardware header creation, and fix bug in IPV6 mcast where the assumption about negative return is an errno. Negative return from hard_header means not enough space was available,(ie -N bytes). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:50 -07:00
Eric W. Biederman	2774c7aba6	[NET]: Make the loopback device per network namespace. This patch makes loopback_dev per network namespace. Adding code to create a different loopback device for each network namespace and adding the code to free a loopback device when a network namespace exits. This patch modifies all users the loopback_dev so they access it as init_net.loopback_dev, keeping all of the code compiling and working. A later pass will be needed to update the users to use something other than the initial network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:49 -07:00
Eric W. Biederman	9dd776b6d7	[NET]: Add network namespace clone & unshare support. This patch allows you to create a new network namespace using sys_clone, or sys_unshare. As the network namespace is still experimental and under development clone and unshare support is only made available when CONFIG_NET_NS is selected at compile time. As this patch introduces network namespace support into code paths that exist when the CONFIG_NET is not selected there are a few additions made to net_namespace.h to allow a few more functions to be used when the networking stack is not compiled in. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:46 -07:00
Eric W. Biederman	8b41d1887d	[NET]: Fix running without sysfs When sysfs support is compiled out the kernel still keeps and maintains the kobject tree. So it is not safe to skip our kobject reference counting or to avoid becoming members of the kobject tree. It is safe to not add the networking specific sysfs attributes. This patch removes the sysfs special cases from net/core/dev.c renames functions from netdev_sysfs_xxxx to netdev_kobject_xxxx and always compiles in net-sysfs.c net-sysfs.c is modified with a CONFIG_SYSFS guard around the parts that are actually sysfs specific. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:46 -07:00
Daniel Lezcano	de3cb747ff	[NET]: Dynamically allocate the loopback device, part 1. This patch replaces all occurences to the static variable loopback_dev to a pointer loopback_dev. That provides the mindless, trivial, uninteressting change part for the dynamic allocation for the loopback. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-By: Kirill Korotaev <dev@sw.ru> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:14 -07:00
Joe Perches	0795af5729	[NET]: Introduce and use print_mac() and DECLARE_MAC_BUF() This is nicer than the MAC_FMT stuff. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:42 -07:00
Pavel Emelyanov	768f3591e2	[NETNS]: Cleanup list walking in setup_net and cleanup_net I proposed introducing a list_for_each_entry_continue_reverse macro to be used in setup_net() when unrolling the failed ->init callback. Here is the macro and some more cleanup in the setup_net() itself to remove one variable from the stack :) The same thing is for the cleanup_net() - the existing list_for_each_entry_reverse() is used. Minor, but the code looks nicer. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:35 -07:00
Herbert Xu	52886051ff	[SKBUFF]: Fix up csum_start when head room changes Thanks for noticing the bug where csum_start is not updated when the head room changes. This patch fixes that. It also moves the csum/ip_summed copying into copy_skb_header so that skb_copy_expand gets it too. I've checked its callers and no one should be upset by this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:24 -07:00
Herbert Xu	0cfad07555	[NETLINK]: Avoid pointer in netlink_run_queue I was looking at Patrick's fix to inet_diag and it occured to me that we're using a pointer argument to return values unnecessarily in netlink_run_queue. Changing it to return the value will allow the compiler to generate better code since the value won't have to be memory-backed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:24 -07:00
Denis V. Lunev	76c72d4f44	[IPV4/IPV6/DECNET]: Small cleanup for fib rules. This patch slightly cleanups FIB rules framework. rules_list as a pointer on struct fib_rules_ops is useless. It is always assigned with a static per/subsystem list in IPv4, IPv6 and DecNet. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:22 -07:00
Pavel Emelyanov	056925ab31	[NET]: Cleanup calling netdev notifiers. The call_netdev_notifiers routine can successfully be used in the net/core_dev.c itself. This will save 6 lines of code and 62 ;) bytes of .text section. 62 is rather small, but I have one more patch saving ~30 bytes from netns code (sent to Eric), so altogether they can save some more noticeable amount. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:21 -07:00
Pavel Emelyanov	30d97d3585	[NETNS]: Consolidate hashes creation in netdev_init() The dev_name_hash and the dev_index_hash are now booth kmalloc-ed (and each element is properly initialized as usually) so I think it's worth consolidating this code making it look nicer (and saving 28 bytes of .text section ;) ) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:21 -07:00
Eric W. Biederman	ad7379d494	[NET]: Fix the prototype of call_netdevice_notifiers. This replaces the void * parameter with a struct net_device * which is what is actually required. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:20 -07:00
Jamal Hadi Salim	22dd749501	[NET]: migrate HARD_TX_LOCK to header file HARD_TX_LOCK micro is a nice aggregation that could be used in other spots. move it to netdevice.h Also makes sure the previously superflous cpu arguement is used. Thanks to DaveM for the suggestions. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:20 -07:00
Jeff Garzik	88d3aafdae	[ETHTOOL] Provide default behaviors for a few ethtool sub-ioctls For the operations get-tx-csum get-sg get-tso get-ufo the default ethtool_op_xxx behavior is fine for all drivers, so we permit op==NULL to imply the default behavior. This provides a more uniform behavior across all drivers, eliminating ethtool(8) "ioctl not supported" errors on older drivers that had not been updated for the latest sub-ioctls. The ethtool_op_xxx() functions are left exported, in case anyone wishes to call them directly from a driver-private implementation -- a not-uncommon case. Should an ethtool_op_xxx() helper remain unused for a while, except by net/core/ethtool.c, we can un-export it at a later date. [ Resolved conflicts with set/get value ethtool patch... -DaveM ] Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:17 -07:00
Eric W. Biederman	077130c0cf	[NET]: Fix race when opening a proc file while a network namespace is exiting. The problem: proc_net files remember which network namespace the are against but do not remember hold a reference count (as that would pin the network namespace). So we currently have a small window where the reference count on a network namespace may be incremented when opening a /proc file when it has already gone to zero. To fix this introduce maybe_get_net and get_proc_net. maybe_get_net increments the network namespace reference count only if it is greater then zero, ensuring we don't increment a reference count after it has gone to zero. get_proc_net handles all of the magic to go from a proc inode to the network namespace instance and call maybe_get_net on it. PROC_NET the old accessor is removed so that we don't get confused and use the wrong helper function. Then I fix up the callers to use get_proc_net and handle the case case where get_proc_net returns NULL. In that case I return -ENXIO because effectively the network namespace has already gone away so the files we are trying to access don't exist anymore. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:22 -07:00
David S. Miller	9d5010db7e	[NET]: Add a might_sleep() to dev_close(). Requested by Johannes Berg. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:15 -07:00
Eric Dumazet	86bba269d0	[PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue When the periodic IP route cache flush is done (every 600 seconds on default configuration), some hosts suffer a lot and eventually trigger the "soft lockup" message. dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, while holding the dst_lock spinlock for the whole scan. Then it rearms a timer to redo the full thing 1/10 s later... The slowdown can last one minute or so, depending on how active are the tcp sessions. This second version of the patch converts the processing from a softirq based one to a workqueue. Even if the list of entries in garbage_list is huge, host is still responsive to softirqs and can make progress. Instead of resetting gc timer to 0.1 second if one entry was freed in a gc run, we do this if more than 10% of entries were freed. Before patch : Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0! Aug 16 06:21:37 SRV1 kernel: Aug 16 06:21:37 SRV1 kernel: Call Trace: Aug 16 06:21:37 SRV1 kernel: <IRQ> [<ffffffff802286f0>] wake_up_process+0x10/0x20 Aug 16 06:21:37 SRV1 kernel: [<ffffffff80251e09>] softlockup_tick+0xe9/0x110 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd380>] dst_run_gc+0x0/0x140 Aug 16 06:21:37 SRV1 kernel: [<ffffffff802376f3>] run_local_timers+0x13/0x20 Aug 16 06:21:37 SRV1 kernel: [<ffffffff802379c7>] update_process_times+0x57/0x90 Aug 16 06:21:37 SRV1 kernel: [<ffffffff80216034>] smp_local_timer_interrupt+0x34/0x60 Aug 16 06:21:37 SRV1 kernel: [<ffffffff802165cc>] smp_apic_timer_interrupt+0x5c/0x80 Aug 16 06:21:37 SRV1 kernel: [<ffffffff8020a816>] apic_timer_interrupt+0x66/0x70 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd3d3>] dst_run_gc+0x53/0x140 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd3c6>] dst_run_gc+0x46/0x140 Aug 16 06:21:37 SRV1 kernel: [<ffffffff80237148>] run_timer_softirq+0x148/0x1c0 Aug 16 06:21:37 SRV1 kernel: [<ffffffff8023340c>] __do_softirq+0x6c/0xe0 Aug 16 06:21:37 SRV1 kernel: [<ffffffff8020ad6c>] call_softirq+0x1c/0x30 Aug 16 06:21:37 SRV1 kernel: <EOI> [<ffffffff8020cb34>] do_softirq+0x34/0x90 Aug 16 06:21:37 SRV1 kernel: [<ffffffff802331cf>] local_bh_enable_ip+0x3f/0x60 Aug 16 06:21:37 SRV1 kernel: [<ffffffff80422913>] _spin_unlock_bh+0x13/0x20 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803dfde8>] rt_garbage_collect+0x1d8/0x320 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd4dd>] dst_alloc+0x1d/0xa0 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803e1433>] __ip_route_output_key+0x573/0x800 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803c02e2>] sock_common_recvmsg+0x32/0x50 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803e16dc>] ip_route_output_flow+0x1c/0x60 Aug 16 06:21:37 SRV1 kernel: [<ffffffff80400160>] tcp_v4_connect+0x150/0x610 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803ebf07>] inet_bind_bucket_create+0x17/0x60 Aug 16 06:21:37 SRV1 kernel: [<ffffffff8040cd16>] inet_stream_connect+0xa6/0x2c0 Aug 16 06:21:37 SRV1 kernel: [<ffffffff80422981>] _spin_lock_bh+0x11/0x30 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803c0bdf>] lock_sock_nested+0xcf/0xe0 Aug 16 06:21:37 SRV1 kernel: [<ffffffff80422981>] _spin_lock_bh+0x11/0x30 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803be551>] sys_connect+0x71/0xa0 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803eee3f>] tcp_setsockopt+0x1f/0x30 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803c030f>] sock_common_setsockopt+0xf/0x20 Aug 16 06:21:37 SRV1 kernel: [<ffffffff803be4bd>] sys_setsockopt+0x9d/0xc0 Aug 16 06:21:37 SRV1 kernel: [<ffffffff8028881e>] sys_ioctl+0x5e/0x80 Aug 16 06:21:37 SRV1 kernel: [<ffffffff80209c4e>] system_call+0x7e/0x83 After patch : (RT_CACHE_DEBUG set to 2 to get following traces) dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us dst_total: 154522 delayed: 58316 work_perf: 6395 expires: 25 elapsed: 11402 us dst_total: 154957 delayed: 58252 work_perf: 64 expires: 150 elapsed: 6148 us dst_total: 157377 delayed: 57843 work_perf: 409 expires: 400 elapsed: 6350 us dst_total: 163745 delayed: 56679 work_perf: 1164 expires: 775 elapsed: 7051 us dst_total: 176577 delayed: 53965 work_perf: 2714 expires: 1389 elapsed: 8120 us dst_total: 198993 delayed: 49627 work_perf: 4338 expires: 1997 elapsed: 8909 us dst_total: 226638 delayed: 46865 work_perf: 2762 expires: 2748 elapsed: 7351 us I successfully reduced the IP route cache of many hosts by a four factor thanks to this patch. Previously, I had to disable "ip route flush cache" to avoid crashes. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:15 -07:00
David S. Miller	678aa8e4eb	[NET]: #if 0 out net_alloc() for now. We will undo this once it is actually used. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:14 -07:00
Eric W. Biederman	d8a5ec6727	[NET]: netlink support for moving devices between network namespaces. The simplest thing to implement is moving network devices between namespaces. However with the same attribute IFLA_NET_NS_PID we can easily implement creating devices in the destination network namespace as well. However that is a little bit trickier so this patch sticks to what is simple and easy. A pid is used to identify a process that happens to be a member of the network namespace we want to move the network device to. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:13 -07:00
Eric W. Biederman	ce286d3273	[NET]: Implement network device movement between namespaces This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate a network device is local to a single network namespace and should never be moved. Useful for pseudo devices that we need an instance in each network namespace (like the loopback device) and for any device we find that cannot handle multiple network namespaces so we may trap them in the initial network namespace. This patch introduces the function dev_change_net_namespace a function used to move a network device from one network namespace to another. To the network device nothing special appears to happen, to the components of the network stack it appears as if the network device was unregistered in the network namespace it is in, and a new device was registered in the network namespace the device was moved to. This patch sets up a namespace device destructor that upon the exit of a network namespace moves all of the movable network devices to the initial network namespace so they are not lost. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:12 -07:00
Eric W. Biederman	b267b17964	[NET]: Factor out __dev_alloc_name from dev_alloc_name When forcibly changing the network namespace of a device I need something that can generate a name for the device in the new namespace without overwriting the old name. __dev_alloc_name provides me that functionality. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:11 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
Eric W. Biederman	b4b510290b	[NET]: Support multiple network namespaces with netlink Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Eric W. Biederman	e9dc865340	[NET]: Make device event notification network namespace safe Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Eric W. Biederman	6d34b1c27a	[NET]: Initialize the network namespace of network devices. Except for carefully selected pseudo devices all network interfaces should start out in the initial network namespace. Ultimately it will be register_netdev that examines what dev->nd_net is set to and places a device in a network namespace. This patch modifies alloc_netdev to initialize the network namespace a device is in with the initial network namespace. This gets it right for the vast majority of devices so their drivers need not be modified and for those few pseudo devices that need something different they can change this parameter before calling register_netdevice. The network namespace parameter on a network device is not reference counted as the devices are inside of a network namespace and cannot remain in that namespace past the lifetime of the network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:07 -07:00
Eric W. Biederman	1b8d7ae42d	[NET]: Make socket creation namespace safe. This patch passes in the namespace a new socket should be created in and has the socket code do the appropriate reference counting. By virtue of this all socket create methods are touched. In addition the socket create methods are modified so that they will fail if you attempt to create a socket in a non-default network namespace. Failing if we attempt to create a socket outside of the default network namespace ensures that as we incrementally make the network stack network namespace aware we will not export functionality that someone has not audited and made certain is network namespace safe. Allowing us to partially enable network namespaces before all of the exotic protocols are supported. Any protocol layers I have missed will fail to compile because I now pass an extra parameter into the socket creation code. [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:07 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
Eric W. Biederman	5f256becd8	[NET]: Basic network namespace infrastructure. This is the basic infrastructure needed to support network namespaces. This infrastructure is: - Registration functions to support initializing per network namespace data when a network namespaces is created or destroyed. - struct net. The network namespace data structure. This structure will grow as variables are made per network namespace but this is the minimal starting point. - Functions to grab a reference to the network namespace. I provide both get/put functions that keep a network namespace from being freed. And hold/release functions serve as weak references and will warn if their count is not zero when the data structure is freed. Useful for dealing with more complicated data structures like the ipv4 route cache. - A list of all of the network namespaces so we can iterate over them. - A slab for the network namespace data structure allowing leaks to be spotted. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:03 -07:00
John Heffner	d2e9117c7a	[NET]: Change type of owner in sock_lock_t to int, rename The type of owner in sock_lock_t is currently (struct sock_iocb ), presumably for historical reasons. It is never used as this type, only tested as NULL or set to (void )1. For clarity, this changes it to type int, and renames to owned, to avoid any possible type casting errors. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:01 -07:00
Robert Olsson	b163911f8a	[PKTGEN]: Remove softirq scheduling. It's not a job for pktgen. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:36 -07:00
Robert Olsson	45b270f880	[PKTGEN]: Multiqueue support. Below some pktgen support to send into different TX queues. This can of course be feed into input queues on other machines Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:35 -07:00
Jeff Garzik	13c99b248f	[ETHTOOL]: Internal cleanup of ethtool_value-related handlers Several get/set functions can be handled by a passing the ethtool_op function pointer directly to a generic function. This permits deletion of a fair bit of redundant code. Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:09 -07:00
Jeff Garzik	339bf02475	[ETHTOOL]: Introduce ->{get,set}_priv_flags, ETHTOOL_[GS]PFLAGS Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:08 -07:00
Jeff Garzik	ff03d49f0c	[ETHTOOL]: Introduce get_sset_count. Obsolete get_stats_count, self_test_count Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:08 -07:00
Jeff Garzik	3ae7c0b2e3	[ETHTOOL]: Add ETHTOOL_[GS]FLAGS sub-ioctls Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:07 -07:00
Satyam Sharma	0bcc181618	[NET] netconsole: Support dynamic reconfiguration using configfs Based upon initial work by Keiichi Kii <k-keiichi@bx.jp.nec.com>. This patch introduces support for dynamic reconfiguration (adding, removing and/or modifying parameters of netconsole targets at runtime) using a userspace interface exported via configfs. Documentation is also updated accordingly. Issues and brief design overview: (1) Kernel-initiated creation / destruction of kernel objects is not possible with configfs -- the lifetimes of the "config items" is managed exclusively from userspace. But netconsole must support boot/module params too, and these are parsed in kernel and hence netpolls must be setup from the kernel. Joel Becker suggested to separately manage the lifetimes of the two kinds of netconsole_target objects -- those created via configfs mkdir(2) from userspace and those specified from the boot/module option string. This adds complexity and some redundancy here and also means that boot/module param-created targets are not exposed through the configfs namespace (and hence cannot be updated / destroyed dynamically). However, this saves us from locking / refcounting complexities that would need to be introduced in configfs to support kernel-initiated item creation / destroy there. (2) In configfs, item creation takes place in the call chain of the mkdir(2) syscall in the driver subsystem. If we used an ioctl(2) to create / destroy objects from userspace, the special userspace program is able to fill out the structure to be passed into the ioctl and hence specify attributes such as local interface that are required at the time we set up the netpoll. For configfs, this information is not available at the time of mkdir(2). So, we keep all newly-created targets (via configfs) disabled by default. The user is expected to set various attributes appropriately (including the local network interface if required) and then write(2) "1" to the "enabled" attribute. Thus, netpoll_setup() is then called on the set parameters in the context of _this_ write(2) on the "enabled" attribute itself. This design enables the user to reconfigure existing netconsole targets at runtime to be attached to newly-come-up interfaces that may not have existed when netconsole was loaded or when the targets were actually created. All this effectively enables us to get rid of custom ioctls. (3) Ultra-paranoid configfs attribute show() and store() operations, with sanity and input range checking, using only safe string primitives, and compliant with the recommendations in Documentation/filesystems/sysfs.txt. (4) A new function netpoll_print_options() is created in the netpoll API, that just prints out the configured parameters for a netpoll structure. netpoll_parse_options() is modified to use that and it is also exported to be used from netconsole. Signed-off-by: Satyam Sharma <satyam@infradead.org> Acked-by: Keiichi Kii <k-keiichi@bx.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:06 -07:00
Thomas Graf	d961db358f	[NEIGH]: Netlink notifications Currently neighbour event notifications are limited to update notifications and only sent if the ARP daemon is enabled. This patch extends the existing notification code by also reporting neighbours being removed due to gc or administratively and removes the dependency on the ARP daemon. This allows to keep track of neighbour states without periodically fetching the complete neighbour table. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:49 -07:00
Thomas Graf	4f494554f9	[NEIGH]: Combine neighbour cleanup and release Introduces neigh_cleanup_and_release() to be used after a neighbour has been removed from its neighbour table. Serves as preparation to add event notifications. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:48 -07:00
Pavel Emelianov	e71992889e	[RTNETLINK]: Introduce generic rtnl_create_link(). This routine gets the parsed rtnl attributes and creates a new link with generic info (IFLA_LINKINFO policy). Its intention is to help the drivers, that need to create several links at once (like VETH). This is nothing but a copy-paste-ed part of rtnl_newlink() function that is responsible for creation of new device. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:45 -07:00
Stephen Hemminger	bea3348eef	[NET]: Make NAPI polling independent of struct net_device objects. Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device dev, int budget) to int foo_poll(struct napi_struct napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:45 -07:00
Adit Ranadive	ce5d0b47f1	[PKTGEN]: srcmac fix From: Adit Ranadive <adit.262@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-16 14:52:15 -07:00
David S. Miller	4878809f71	[NET]: Fix two issues wrt. SO_BINDTODEVICE. 1) Comments suggest that setting optlen to zero will unbind the socket from whatever device it might be attached to. This hasn't been the case since at least 2.2.x because the first thing this function does is return -EINVAL if 'optlen' is less than sizeof(int). This check also means that passing in a two byte string doesn't work so well. It's almost as if this code was testing with "eth?" patterned strings and nothing else :-) Fix this by breaking the logic of this facility out into a seperate function which validates optlen more appropriately. The optlen==0 and small string cases now work properly. 2) We should reset the cached route of the socket after we have made the device binding changes, not before. Reported by Ben Greear. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-14 16:41:03 -07:00
Herbert Xu	ef8aef55ce	[NET]: Do not dereference iov if length is zero When msg_iovlen is zero we shouldn't try to dereference msg_iov. Right now the only thing that tries to do so is skb_copy_and_csum_datagram_iovec. Since the total length should also be zero if msg_iovlen is zero, it's sufficient to check the total length there and simply return if it's zero. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-11 10:29:07 +02:00

1 2 3 4 5 ...

696 Commits (1c1e87edb9aa38b9e1ae807a6d88fcfd350a55c0)