Commit graph

20729 commits

Author SHA1 Message Date
Eric Dumazet 0ad92ad03a udp: fix a race in encap_rcv handling
udp_queue_rcv_skb() has a possible race in encap_rcv handling, since
this pointer can be changed anytime.

We should use ACCESS_ONCE() to close the race.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-02 00:51:27 -04:00
Dave Jones 501e89d3ae x25: Fix NULL dereference in x25_recvmsg
commit cb101ed2 in 3.0 introduced a bug in x25_recvmsg()
When passed bogus junk from userspace, x25->neighbour can be NULL,
as shown in this oops..

BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
IP: [<ffffffffa05482bd>] x25_recvmsg+0x4d/0x280 [x25]
PGD 1015f3067 PUD 105072067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 0
Pid: 27928, comm: iknowthis Not tainted 3.1.0+ #2 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
RIP: 0010:[<ffffffffa05482bd>]  [<ffffffffa05482bd>] x25_recvmsg+0x4d/0x280 [x25]
RSP: 0018:ffff88010c0b7cc8  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88010c0b7d78 RCX: 0000000000000c02
RDX: ffff88010c0b7d78 RSI: ffff88011c93dc00 RDI: ffff880103f667b0
RBP: ffff88010c0b7d18 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880103f667b0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f479ce7f700(0000) GS:ffff88012a600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000001c CR3: 000000010529e000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process iknowthis (pid: 27928, threadinfo ffff88010c0b6000, task ffff880103faa4f0)
Stack:
 0000000000000c02 0000000000000c02 ffff88010c0b7d18 ffffff958153cb37
 ffffffff8153cb60 0000000000000c02 ffff88011c93dc00 0000000000000000
 0000000000000c02 ffff88010c0b7e10 ffff88010c0b7de8 ffffffff815372c2
Call Trace:
 [<ffffffff8153cb60>] ? sock_update_classid+0xb0/0x180
 [<ffffffff815372c2>] sock_aio_read.part.10+0x142/0x150
 [<ffffffff812d6752>] ? inode_has_perm+0x62/0xa0
 [<ffffffff815372fd>] sock_aio_read+0x2d/0x40
 [<ffffffff811b05e2>] do_sync_read+0xd2/0x110
 [<ffffffff812d3796>] ? security_file_permission+0x96/0xb0
 [<ffffffff811b0a91>] ? rw_verify_area+0x61/0x100
 [<ffffffff811b103d>] vfs_read+0x16d/0x180
 [<ffffffff811b109d>] sys_read+0x4d/0x90
 [<ffffffff81657282>] system_call_fastpath+0x16/0x1b
Code: 8b 66 20 4c 8b 32 48 89 d3 48 89 4d b8 45 89 c7 c7 45 cc 95 ff ff ff 4d 85 e4 0f 84 ed 01 00 00 49 8b 84 24 18 05 00 00 4c 89 e7
 78 1c 01 45 19 ed 31 f6 e8 d5 37 ff e0 41 0f b6 44 24 0e 41

Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-02 00:49:49 -04:00
Arjan van de Ven 73cb88ecb9 net: make the tcp and udp file_operations for the /proc stuff const
the tcp and udp code creates a set of struct file_operations at runtime
while it can also be done at compile time, with the added benefit of then
having these file operations be const.

the trickiest part was to get the "THIS_MODULE" reference right; the naive
method of declaring a struct in the place of registration would not work
for this reason.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01 17:56:14 -04:00
Matthijs Kooijman deede2fabe vlan: Don't propagate flag changes on down interfaces.
When (de)configuring a vlan interface, the IFF_ALLMULTI ans IFF_PROMISC
flags are cleared or set on the underlying interface. So, if these flags
are changed on a vlan interface that is not up, the flags underlying
interface might be set or cleared twice.

Only propagating flag changes when a device is up makes sure this does
not happen. It also makes sure that an underlying device is not set to
promiscuous or allmulti mode for a vlan device that is down.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01 17:51:03 -04:00
David S. Miller 045f7b3b00 neigh: Kill bogus SMP protected debugging message.
Whatever situations make this state legitimate when SMP
also would be legitimate when !SMP and f.e. preemption is
enabled.

This is dubious enough that we should just delete it entirely.  If we
want to add debugging for neigh timer races, better more thorough
mechanisms are needed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01 17:45:55 -04:00
Florian Westphal 563e123264 netfilter: do not propagate nf_queue errors in nf_hook_slow
commit f158508618
(netfilter: nfnetlink_queue: return error number to caller)
erronously assigns the return value of nf_queue() to the "ret" value.

This can cause bogus return values if we encounter QUEUE verdict
when bypassing is enabled, the listener does not exist and the
next hook returns NF_STOLEN.

In this case nf_hook_slow returned -ESRCH instead of 0.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:57:21 +01:00
Florian Westphal 2dad81adf2 netfilter: ipv6: fix afinfo->route refcnt leak on error
Several callers (h323 conntrack, xt_addrtype) assume that the
returned **dst only needs to be released if the function returns 0.

This is true for the ipv4 implementation, but not for the ipv6 one.

Instead of changing the users, change the ipv6 implementation
to behave like the ipv4 version by only providing the dst_entry result
in the success case.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:20:07 +01:00
Krzysztof Wilczynski ad542ced25 ipvs: Remove unused variable "cs" from ip_vs_leave function.
This is to address the following warning during compilation time:

  net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_leave’:
  net/netfilter/ipvs/ip_vs_core.c:532: warning: unused variable ‘cs’

This variable is indeed no longer in use.

Signed-off-by: Krzysztof Wilczynski <krzysztof.wilczynski@linux.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-11-01 09:19:57 +01:00
Joe Perches 0a9ee81349 netfilter: Remove unnecessary OOM logging messages
Site specific OOM messages are duplications of a generic MM
out of memory message and aren't really useful, so just
delete them.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:49 +01:00
Simon Horman b6338b55bd ipvs: Removed unused variables
ipvs is not used in ip_vs_genl_set_cmd() or ip_vs_genl_get_cmd()

Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:37 +01:00
Simon Horman 4a516f1108 ipvs: Remove unused return value of protocol state transitions
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:33 +01:00
Simon Horman 3c2de2ae02 ipvs: Remove unused parameter from ip_vs_confirm_conntrack()
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:29 +01:00
Krzysztof Wilczynski 52669dfa83 ipvs: Expose ip_vs_ftp module parameters via sysfs.
This is to expose "ports" parameter via sysfs so it can be read
at any time in order to determine what port or ports were passed
to the module at the point when it was loaded.

Signed-off-by: Krzysztof Wilczynski <krzysztof.wilczynski@linux.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:21 +01:00
Joe Perches b9075fa968 treewide: use __printf not __attribute__((format(printf,...)))
Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.

Done via script and a little typing.

$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
  grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
  xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:54 -07:00
Paul Gortmaker 79bb1ee46a net: fix implicit kmod.h usage in bridge/br_stp_if.c
To fix this, once the implicit presence of module.h is removed:

net/bridge/br_stp_if.c: In function ‘br_stp_start’:
net/bridge/br_stp_if.c:131: error: implicit declaration of function ‘call_usermodehelper’
net/bridge/br_stp_if.c:131: error: ‘UMH_WAIT_PROC’ undeclared (first use in this function)

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:30 -04:00
Paul Gortmaker bc3b2d7fb9 net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules
These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:30 -04:00
Paul Gortmaker d9b9384215 net: add moduleparam.h for users of module_param/MODULE_PARM_DESC
These files were getting access to these two via the implicit
presence of module.h everywhere.  They aren't modules, so they
don't need the full module.h inclusion though.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:29 -04:00
Paul Gortmaker 3a9a231d97 net: Fix files explicitly needing to include module.h
With calls to modular infrastructure, these files really
needs the full module.h header.  Call it out so some of the
cleanups of implicit and unrequired includes elsewhere can be
cleaned up.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:28 -04:00
Linus Torvalds 1a4ceab195 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  vlan: allow nested vlan_do_receive()
  ipv6: fix route lookup in addrconf_prefix_rcv()
  bonding: eliminate bond_close race conditions
  qlcnic: fix beacon and LED test.
  qlcnic: Updated License file
  qlcnic: updated reset sequence
  qlcnic: reset loopback mode if promiscous mode setting fails.
  qlcnic: skip IDC ack check in fw reset path.
  i825xx: Fix incorrect dependency for BVME6000_NET
  ipv6: fix route error binding peer in func icmp6_dst_alloc
  ipv6: fix error propagation in ip6_ufo_append_data()
  stmmac: update normal descriptor structure (v2)
  stmmac: fix NULL pointer dereference in capabilities fixup (v2)
  stmmac: fix a bug while checking the HW cap reg (v2)
  be2net: Changing MAC Address of a VF was broken.
  be2net: Refactored be_cmds.c file.
  bnx2x: update driver version to 1.70.30-0
  bnx2x: use FW 7.0.29.0
  bnx2x: Enable changing speed when port type is PORT_DA
  bnx2x: Fix 54618se LED behavior
  ...
2011-10-31 15:22:44 -07:00
Johan Hedberg dafbde395e Bluetooth: Set HCI_MGMT flag only in read_controller_info
The HCI_MGMT flag should only be set when user space requests the full
controller information. This way we avoid potential issues with setting
change events ariving before the actual read_controller_info command
finishes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-10-31 17:31:02 -02:00
Szymon Janc e1b6eb3ccb Bluetooth: Increase HCI reset timeout in hci_dev_do_close
I've noticed that my CSR usb dongle was not working if it was plugged in when
PC was booting. It looks like I get two HCI reset command complete events (see
hcidump logs below).
The root cause is reset called from off_timer. Timeout for this reset to
complete is set to 250ms and my bt dongle requires more time for replying with
command complete event. After that, chip seems to reply with reset command
complete event for next non-reset command.

Attached patch increase mentioned timeout to HCI_INIT_TIMEOUT, this value is
already used for timeouting hci_reset_req in hci_dev_reset().

This might also be related to BT not working after suspend that was reported
here some time ago.

Hcidump log:

2011-09-12 23:13:27.379465 < HCI Command: Reset (0x03|0x0003) plen 0
2011-09-12 23:13:27.380797 > HCI Event: Command Complete (0x0e) plen 4
    Reset (0x03|0x0003) ncmd 1
    status 0x00
2011-09-12 23:13:27.380859 < HCI Command: Read Local Supported Features (0x04|0x000
3) plen 0
2011-09-12 23:13:27.760789 > HCI Event: Command Complete (0x0e) plen 4
    Reset (0x03|0x0003) ncmd 1
    status 0x00
2011-09-12 23:13:27.760831 < HCI Command: Read Local Version Information (0x04|0x00
01) plen 0
2011-09-12 23:13:27.764780 > HCI Event: Command Complete (0x0e) plen 12
    Read Local Version Information (0x04|0x0001) ncmd 1
    status 0x00
    HCI Version: 1.1 (0x1) HCI Revision: 0x36f
    LMP Version: 1.1 (0x1) LMP Subversion: 0x36f
    Manufacturer: Cambridge Silicon Radio (10)

Signed-off-by: Szymon Janc <szymon@janc.net.pl>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-10-31 17:10:18 -02:00
Eric Dumazet 6a32e4f9dd vlan: allow nested vlan_do_receive()
commit 2425717b27 (net: allow vlan traffic to be received under bond)
broke ARP processing on vlan on top of bonding.

       +-------+
eth0 --| bond0 |---bond0.103
eth1 --|       |
       +-------+

52870.115435: skb_gro_reset_offset <-napi_gro_receive
52870.115435: dev_gro_receive <-napi_gro_receive
52870.115435: napi_skb_finish <-napi_gro_receive
52870.115435: netif_receive_skb <-napi_skb_finish
52870.115435: get_rps_cpu <-netif_receive_skb
52870.115435: __netif_receive_skb <-netif_receive_skb
52870.115436: vlan_do_receive <-__netif_receive_skb
52870.115436: bond_handle_frame <-__netif_receive_skb
52870.115436: vlan_do_receive <-__netif_receive_skb
52870.115436: arp_rcv <-__netif_receive_skb
52870.115436: kfree_skb <-arp_rcv

Packet is dropped in arp_rcv() because its pkt_type was set to
PACKET_OTHERHOST in the first vlan_do_receive() call, since no eth0.103
exists.

We really need to change pkt_type only if no more rx_handler is about to
be called for the packet.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-30 04:43:30 -04:00
Andreas Hofmeister 14ef37b6d0 ipv6: fix route lookup in addrconf_prefix_rcv()
The route lookup to find a previously auto-configured route for a prefixes used
to use rt6_lookup(), with the prefix from the RA used as an address. However,
that kind of lookup ignores routing tables, the prefix length and route flags,
so when there were other matching routes, even in different tables and/or with
a different prefix length, the wrong route would be manipulated.

Now, a new function "addrconf_get_prefix_route()" is used for the route lookup,
which searches in RT6_TABLE_PREFIX and takes the prefix-length and route flags
into account.

Signed-off-by: Andreas Hofmeister <andi@collax.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-30 04:12:36 -04:00
David S. Miller 9eeebb5bc8 Merge branch 'batman-adv/maint' of git://git.open-mesh.org/linux-merge 2011-10-30 03:05:07 -04:00
Linus Torvalds 97d2eb13a0 Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client
* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
  libceph: fix double-free of page vector
  ceph: fix 32-bit ino numbers
  libceph: force resend of osd requests if we skip an osdmap
  ceph: use kernel DNS resolver
  ceph: fix ceph_monc_init memory leak
  ceph: let the set_layout ioctl set single traits
  Revert "ceph: don't truncate dirty pages in invalidate work thread"
  ceph: replace leading spaces with tabs
  libceph: warn on msg allocation failures
  libceph: don't complain on msgpool alloc failures
  libceph: always preallocate mon connection
  libceph: create messenger with client
  ceph: document ioctls
  ceph: implement (optional) max read size
  ceph: rename rsize -> rasize
  ceph: make readpages fully async
2011-10-28 16:42:18 -07:00
Gao feng 7011687f0f ipv6: fix route error binding peer in func icmp6_dst_alloc
in func icmp6_dst_alloc,dst_metric_set call ipv6_cow_metrics to set metric.
ipv6_cow_metrics may will call rt6_bind_peer to set rt6_info->rt6i_peer.
So,we should move ipv6_addr_copy before dst_metric_set to make sure rt6_bind_peer success.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-28 16:36:07 -04:00
Zheng Yan 504744e4ed ipv6: fix error propagation in ip6_ufo_append_data()
We should return errcode from sock_alloc_send_skb()

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-28 00:26:00 -04:00
Eric Dumazet b903d324be ipv6: tcp: fix TCLASS value in ACK messages sent from TIME_WAIT
commit 66b13d99d9 (ipv4: tcp: fix TOS value in ACK messages sent from
TIME_WAIT) fixed IPv4 only.

This part is for the IPv6 side, adding a tclass param to ip6_xmit()

We alias tw_tclass and tw_tos, if socket family is INET6.

[ if sockets is ipv4-mapped, only IP_TOS socket option is used to fill
TOS field, TCLASS is not taken into account ]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-27 00:44:35 -04:00
Linus Torvalds 37d96c28ec Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  caif: Fix BUG() with network namespaces
  net: make bonding slaves honour master's skb->priority
  net: Unlock sock before calling sk_free()
2011-10-26 16:08:52 +02:00
Linus Torvalds efb8d21b2c Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits)
  TTY: serial_core: Fix crash if DCD drop during suspend
  tty/serial: atmel_serial: bootconsole removed from auto-enumerates
  Revert "TTY: call tty_driver_lookup_tty unconditionally"
  tty/serial: atmel_serial: add device tree support
  tty/serial: atmel_serial: auto-enumerate ports
  tty/serial: atmel_serial: whitespace and braces modifications
  tty/serial: atmel_serial: change platform_data variable name
  tty/serial: RS485 bindings for device tree
  TTY: call tty_driver_lookup_tty unconditionally
  TTY: pty, release tty in all ptmx_open fail paths
  TTY: make tty_add_file non-failing
  TTY: drop driver reference in tty_open fail path
  8250_pci: Fix kernel panic when pch_uart is disabled
  h8300: drivers/serial/Kconfig was moved
  parport_pc: release IO region properly if unsupported ITE887x card is found
  tty: Support compat_ioctl get/set termios_locked
  hvc_console: display printk messages on console.
  TTY: snyclinkmp: forever loop in tx_load_dma_buffer()
  tty/n_gsm: avoid fifo overflow in gsm_dlci_data_output
  tty/n_gsm: fix a bug in gsm_dlci_data_output (adaption = 2 case)
  ...

Fix up Conflicts in:
 - drivers/tty/serial/8250_pci.c
	Trivial conflict with removed duplicate device ID
 - drivers/tty/serial/atmel_serial.c
	Annoying silly conflict between "specify the port num via
	platform_data" and other changes to atmel_console_init
2011-10-26 15:11:09 +02:00
Linus Torvalds e33bae14fd Merge branch 'for-linus' of git://github.com/ericvh/linux
* 'for-linus' of git://github.com/ericvh/linux:
  9p: fix 9p.txt to advertise msize instead of maxdata
  net/9p: Convert net/9p protocol dumps to tracepoints
  fs/9p: change an int to unsigned int
  fs/9p: Cleanup option parsing in 9p
  9p: move dereference after NULL check
  fs/9p: inode file operation is properly initialized init_special_inode
  fs/9p: Update zero-copy implementation in 9p
2011-10-26 14:20:53 +02:00
David Woodhouse 08613e4626 caif: Fix BUG() with network namespaces
The caif code will register its own pernet_operations, and then register
a netdevice_notifier. Each time the netdevice_notifier is triggered,
it'll do some stuff... including a lookup of its own pernet stuff with
net_generic().

If the net_generic() call ever returns NULL, the caif code will BUG().
That doesn't seem *so* unreasonable, I suppose — it does seem like it
should never happen.

However, it *does* happen. When we clone a network namespace,
setup_net() runs through all the pernet_operations one at a time. It
gets to loopback before it gets to caif. And loopback_net_init()
registers a netdevice... while caif hasn't been initialised. So the caif
netdevice notifier triggers, and immediately goes BUG().

We could imagine a complex and overengineered solution to this generic
class of problems, but this patch takes the simple approach. It just
makes caif_device_notify() *not* go looking for its pernet data
structures if the device it's being notified about isn't a caif device
in the first place.

Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-25 19:22:23 -04:00
Thomas Gleixner b0691c8ee7 net: Unlock sock before calling sk_free()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-25 19:17:25 -04:00
Sage Weil 38d6453ca3 libceph: force resend of osd requests if we skip an osdmap
If we skip over one or more map epochs, we need to resend all osd requests
because it is possible they remapped to other servers and then back.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:17 -07:00
Noah Watkins ee3b56f265 ceph: use kernel DNS resolver
Change ceph_parse_ips to take either names given as
IP addresses or standard hostnames (e.g. localhost).
The DNS lookup is done using the dns_resolver facility
similar to its use in AFS, NFS, and CIFS.

This patch defines CONFIG_CEPH_LIB_USE_DNS_RESOLVER
that controls if this feature is on or off.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:16 -07:00
Noah Watkins 49d9224c04 ceph: fix ceph_monc_init memory leak
failure clean up does not consider ceph_auth_init.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:16 -07:00
Sage Weil f0ed1b7cef libceph: warn on msg allocation failures
Any non-masked msg allocation failure should generate a warning and stack
trace to the console.  All of these need to eventually be replaced by
safe preallocation or msgpools.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:16 -07:00
Sage Weil b61c27636f libceph: don't complain on msgpool alloc failures
The pool allocation failures are masked by the pool; there is no need to
spam the console about them.  (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Sage Weil f6a2f5be07 libceph: always preallocate mon connection
Allocate the mon connection on init.  We already reuse it across
reconnects.  Remove now unnecessary (and incomplete) NULL checks.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Sage Weil 6ab00d465a libceph: create messenger with client
This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Linus Torvalds ef78cc75f1 Merge branch 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (26 commits)
  Check validity of cl_rpcclient in nfs_server_list_show
  NFS: Get rid of the nfs_rdata_mempool
  NFS: Don't rely on PageError in nfs_readpage_release_partial
  NFS: Get rid of unnecessary calls to ClearPageError() in read code
  NFS: Get rid of nfs_restart_rpc()
  NFS: Get rid of the unused nfs_write_data->flags field
  NFS: Get rid of the unused nfs_read_data->flags field
  NFSv4: Translate NFS4ERR_BADNAME into ENOENT when applied to a lookup
  NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()
  NFS: Use the inode->i_version to cache NFSv4 change attribute information
  SUNRPC: Remove unnecessary export of rpc_sockaddr2uaddr
  SUNRPC: Fix rpc_sockaddr2uaddr
  nfs/super.c: local functions should be static
  pnfsblock: fix writeback deadlock
  pnfsblock: fix NULL pointer dereference
  pnfs: recoalesce when ld read pagelist fails
  pnfs: recoalesce when ld write pagelist fails
  pnfs: make _set_lo_fail generic
  pnfsblock: add missing rpc_put_mount and path_put
  SUNRPC/NFS: make rpc pipe upcall generic
  ...
2011-10-25 15:44:06 +02:00
Linus Torvalds 1442d1678c Merge branch 'for-3.2' of git://linux-nfs.org/~bfields/linux
* 'for-3.2' of git://linux-nfs.org/~bfields/linux: (103 commits)
  nfs41: implement DESTROY_CLIENTID operation
  nfsd4: typo logical vs bitwise negate for want_mask
  nfsd4: allow NFS4_SHARE_SIGNAL_DELEG_WHEN_RESRC_AVAIL | NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED
  nfsd4: seq->status_flags may be used unitialized
  nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid
  nfsd4: implement new 4.1 open reclaim types
  nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaround
  nfsd4: warn on open failure after create
  nfsd4: preallocate open stateid in process_open1()
  nfsd4: do idr preallocation with stateid allocation
  nfsd4: preallocate nfs4_file in process_open1()
  nfsd4: clean up open owners on OPEN failure
  nfsd4: simplify process_open1 logic
  nfsd4: make is_open_owner boolean
  nfsd4: centralize renew_client() calls
  nfsd4: typo logical vs bitwise negate
  nfs: fix bug about IPv6 address scope checking
  nfsd4: more robust ignoring of WANT bits in OPEN
  nfsd4: move name-length checks to xdr
  nfsd4: move access/deny validity checks to xdr code
  ...
2011-10-25 15:42:01 +02:00
Linus Torvalds 7e0bb71e75 Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (63 commits)
  PM / Clocks: Remove redundant NULL checks before kfree()
  PM / Documentation: Update docs about suspend and CPU hotplug
  ACPI / PM: Add Sony VGN-FW21E to nonvs blacklist.
  ARM: mach-shmobile: sh7372 A4R support (v4)
  ARM: mach-shmobile: sh7372 A3SP support (v4)
  PM / Sleep: Mark devices involved in wakeup signaling during suspend
  PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image
  PM / Hibernate: Do not initialize static and extern variables to 0
  PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too
  PM / Hibernate: Add resumedelay kernel param in addition to resumewait
  MAINTAINERS: Update linux-pm list address
  PM / ACPI: Blacklist Vaio VGN-FW520F machine known to require acpi_sleep=nonvs
  PM / ACPI: Blacklist Sony Vaio known to require acpi_sleep=nonvs
  PM / Hibernate: Add resumewait param to support MMC-like devices as resume file
  PM / Hibernate: Fix typo in a kerneldoc comment
  PM / Hibernate: Freeze kernel threads after preallocating memory
  PM: Update the policy on default wakeup settings
  PM / VT: Cleanup #if defined uglyness and fix compile error
  PM / Suspend: Off by one in pm_suspend()
  PM / Hibernate: Include storage keys in hibernation image on s390
  ...
2011-10-25 15:18:39 +02:00
Antonio Quartulli 93840ac40b batman-adv: unify hash_entry field position in tt_local/global_entry
Function tt_response_fill_table() actually uses a tt_local_entry pointer to
iterate either over the local or the global table entries (it depends on the
what hash table is passed as argument). To iterate over such entries the
hlist_for_each_entry_rcu() macro has to access their "hash_entry" field which
MUST be at the same position in both the tt_global/local_entry structures.

Reported-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-10-25 14:23:05 +02:00
Simon Wunderlich 6e8014947d batman-adv: add sanity check when removing global tts
After removing the batman-adv module, the hash may be already gone
when tt_global_del_orig() tries to clean the hash. This patch adds
a sanity check to avoid this.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Tested-by: Alexey Fisher <bug-track@fisher-privat.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-10-25 14:23:04 +02:00
Simon Wunderlich 531027fcdd batman-adv: remove references for global tt entries
struct tt_global_entry holds a reference to an orig_node which must be
decremented before deallocating the structure.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Tested-by: Alexey Fisher <bug-track@fisher-privat.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-10-25 14:23:04 +02:00
Linus Torvalds 8a9ea3237e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
  dp83640: free packet queues on remove
  dp83640: use proper function to free transmit time stamping packets
  ipv6: Do not use routes from locally generated RAs
  |PATCH net-next] tg3: add tx_dropped counter
  be2net: don't create multiple RX/TX rings in multi channel mode
  be2net: don't create multiple TXQs in BE2
  be2net: refactor VF setup/teardown code into be_vf_setup/clear()
  be2net: add vlan/rx-mode/flow-control config to be_setup()
  net_sched: cls_flow: use skb_header_pointer()
  ipv4: avoid useless call of the function check_peer_pmtu
  TCP: remove TCP_DEBUG
  net: Fix driver name for mdio-gpio.c
  ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
  rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
  ipv4: fix ipsec forward performance regression
  jme: fix irq storm after suspend/resume
  route: fix ICMP redirect validation
  net: hold sock reference while processing tx timestamps
  tcp: md5: add more const attributes
  Add ethtool -g support to virtio_net
  ...

Fix up conflicts in:
 - drivers/net/Kconfig:
	The split-up generated a trivial conflict with removal of a
	stale reference to Documentation/networking/net-modules.txt.
	Remove it from the new location instead.
 - fs/sysfs/dir.c:
	Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
	with Eric Biederman's changes for tagged directories.
2011-10-25 13:25:22 +02:00
Stanislav Kinsbursky e20de37757 SUNRPC: remove rpcbind clients destruction on module cleanup
Rpcbind clients destruction during SUNRPC module removing is obsolete since now
those clients are destroying during last RPC service shutdown.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-25 13:20:50 +02:00
Stanislav Kinsbursky 0f0c01da44 SUNRPC: remove rpcbind clients creation during service registering
We don't need this code since rpcbind clients are creating during RPC service
creation.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-25 13:20:35 +02:00
Stanislav Kinsbursky 16d0587090 NFSd: call svc rpcbind cleanup explicitly
We have to call svc_rpcb_cleanup() explicitly from nfsd_last_thread() since
this function is registered as service shutdown callback and thus nobody else
will done it for us.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-25 13:19:40 +02:00
Stanislav Kinsbursky 8e356b1e2a SUNRPC: cleanup service destruction
svc_unregister() call have to be removed from svc_destroy() since it will be
called in sv_shutdown callback.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-25 13:19:13 +02:00
Stanislav Kinsbursky e40f5e29ef SUNRPC: setup rpcbind clients if service requires it
New function ("svc_uses_rpcbind") will be used to detect, that new service will
send portmapper register calls. For such services we will create rpcbind
clients and remove all stale portmap registrations.
Also, svc_rpcb_cleanup() will be set as sv_shutdown callback for such services
in case of this field wasn't initialized earlier. This will allow to destroy
rpcbind clients when no other users of them left.

Note: Currently, any creating service will be detected as portmap user.
Probably, this is wrong. But now it depends on program versions "vs_hidden"
flag.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-25 13:18:42 +02:00
Stanislav Kinsbursky d99085605c SUNRPC: introduce svc helpers for prepairing rpcbind infrastructure
This helpers will be used only for those services, that will send portmapper
registration calls.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-25 13:18:05 +02:00
Stanislav Kinsbursky 253fb070e7 SUNRPC: use rpcbind reference counting helpers
All is simple: we just increase users counter if rpcbind clients has been
created already. Otherwise we create them and set users counter to 1.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-25 13:16:36 +02:00
Stanislav Kinsbursky 914edb1bb2 SUNRPC: introduce helpers for reference counted rpcbind clients
v6:
1) added write memory barrier to rpcb_set_local to make sure, that rpcbind
clients become valid before rpcb_users assignment
2) explicitly set rpcb_users to 1 instead of incrementing it (looks clearer from
my pow).

v5: fixed races with rpcb_users in rpcb_get_local()

This helpers will be used for dynamical creation and destruction of rpcbind
clients.
Variable rpcb_users is actually a counter of lauched RPC services. If rpcbind
clients has been created already, then we just increase rpcb_users.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-25 13:15:48 +02:00
Linus Torvalds 2d03423b23 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
  mm: memory hotplug: Check if pages are correctly reserved on a per-section basis
  Revert "memory hotplug: Correct page reservation checking"
  Update email address for stable patch submission
  dynamic_debug: fix undefined reference to `__netdev_printk'
  dynamic_debug: use a single printk() to emit messages
  dynamic_debug: remove num_enabled accounting
  dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions
  uio: Support physical addresses >32 bits on 32-bit systems
  sysfs: add unsigned long cast to prevent compile warning
  drivers: base: print rejected matches with DEBUG_DRIVER
  memory hotplug: Correct page reservation checking
  memory hotplug: Refuse to add unaligned memory regions
  remove the messy code file Documentation/zh_CN/SubmitChecklist
  ARM: mxc: convert device creation to use platform_device_register_full
  new helper to create platform devices with dma mask
  docs/driver-model: Update device class docs
  docs/driver-model: Document device.groups
  kobj_uevent: Ignore if some listeners cannot handle message
  dynamic_debug: make netif_dbg() call __netdev_printk()
  dynamic_debug: make netdev_dbg() call __netdev_printk()
  ...
2011-10-25 12:13:59 +02:00
Linus Torvalds 59e5253417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
  MAINTAINERS: linux-m32r is moderated for non-subscribers
  linux@lists.openrisc.net is moderated for non-subscribers
  Drop default from "DM365 codec select" choice
  parisc: Kconfig: cleanup Kernel page size default
  Kconfig: remove redundant CONFIG_ prefix on two symbols
  cris: remove arch/cris/arch-v32/lib/nand_init.S
  microblaze: add missing CONFIG_ prefixes
  h8300: drop puzzling Kconfig dependencies
  MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
  tty: drop superfluous dependency in Kconfig
  ARM: mxc: fix Kconfig typo 'i.MX51'
  Fix file references in Kconfig files
  aic7xxx: fix Kconfig references to READMEs
  Fix file references in drivers/ide/
  thinkpad_acpi: Fix printk typo 'bluestooth'
  bcmring: drop commented out line in Kconfig
  btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
  doc: raw1394: Trivial typo fix
  CIFS: Don't free volume_info->UNC until we are entirely done with it.
  treewide: Correct spelling of successfully in comments
  ...
2011-10-25 12:11:02 +02:00
NeilBrown dc6f55e9f8 NFS/sunrpc: don't use a credential with extra groups.
The sunrpc layer keeps a cache of recently used credentials and
'unx_match' is used to find the credential which matches the current
process.

However unx_match allows a match when the cached credential has extra
groups at the end of uc_gids list which are not in the process group list.

So if a process with a list of (say) 4 group accesses a file and gains
access because of the last group in the list, then another process
with the same uid and gid, and a gid list being the first tree of the
gids of the original process tries to access the file, it will be
granted access even though it shouldn't as the wrong rpc credential
will be used.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2011-10-25 11:20:58 +02:00
Andreas Hofmeister 9f56220fad ipv6: Do not use routes from locally generated RAs
When hybrid mode is enabled (accept_ra == 2), the kernel also sees RAs
generated locally. This is useful since it allows the kernel to auto-configure
its own interface addresses.

However, if 'accept_ra_defrtr' and/or 'accept_ra_rtr_pref' are set and the
locally generated RAs announce the default route and/or other route information,
the kernel happily inserts bogus routes with its own address as gateway.

With this patch, adding routes from an RA will be skiped when the RAs source
address matches any local address, just as if 'accept_ra_defrtr' and
'accept_ra_rtr_pref' were set to 0.

Signed-off-by: Andreas Hofmeister <andi@collax.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 19:13:15 -04:00
Eric Dumazet 859c20123a net_sched: cls_flow: use skb_header_pointer()
Dan Siemon would like to add tunnelling support to cls_flow

This preliminary patch introduces use of skb_header_pointer() to help
this task, while avoiding skb head reallocation because of deep packet
inspection.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 18:40:14 -04:00
Gao feng 59445b6b1f ipv4: avoid useless call of the function check_peer_pmtu
In func ipv4_dst_check,check_peer_pmtu should be called only when peer is updated.
So,if the peer is not updated in ip_rt_frag_needed,we can not inc __rt_peer_genid.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 18:30:07 -04:00
David S. Miller 1805b2f048 Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2011-10-24 18:18:09 -04:00
Flavio Leitner 78d81d15b7 TCP: remove TCP_DEBUG
It was enabled by default and the messages guarded
by the define are useful.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 17:36:08 -04:00
Aneesh Kumar K.V 348b59012e net/9p: Convert net/9p protocol dumps to tracepoints
This helps in more control over debugging.
root@qemu-img-64:~# ls /pass/123
ls: cannot access /pass/123: No such file or directory
root@qemu-img-64:~# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
              ls-1536  [001]    70.928584: 9p_protocol_dump: clnt 18446612132784021504 P9_TWALK(tag = 1)
000: 16 00 00 00 6e 01 00 01 00 00 00 02 00 00 00 01
010: 00 03 00 31 32 33 00 00 00 ff ff ff ff 00 00 00

              ls-1536  [001]    70.928587: <stack trace>
 => trace_9p_protocol_dump
 => p9pdu_finalize
 => p9_client_rpc
 => p9_client_walk
 => v9fs_vfs_lookup
 => d_alloc_and_lookup
 => walk_component
 => path_lookupat
              ls-1536  [000]    70.929696: 9p_protocol_dump: clnt 18446612132784021504 P9_RLERROR(tag = 1)
000: 0b 00 00 00 07 01 00 02 00 00 00 4e 03 00 02 00
010: 00 00 00 00 03 00 02 00 00 00 00 00 ff 43 00 00

              ls-1536  [000]    70.929697: <stack trace>
 => trace_9p_protocol_dump
 => p9_client_rpc
 => p9_client_walk
 => v9fs_vfs_lookup
 => d_alloc_and_lookup
 => walk_component
 => path_lookupat
 => do_path_lookup

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-10-24 11:13:12 -05:00
Dan Carpenter ef6b0807e2 fs/9p: change an int to unsigned int
Without this msize=4294967295 will result in a crash

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-10-24 11:13:12 -05:00
Aneesh Kumar K.V 4d5077f1b2 fs/9p: Cleanup option parsing in 9p
Instead of saying all integer argument option should be listed in the beginning
move integer parsing to each option type.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-10-24 11:13:12 -05:00
Dan Carpenter 5635fd0ccf 9p: move dereference after NULL check
We dereferenced "req->tc" and "req->rc" before checking for NULL.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-10-24 11:13:11 -05:00
Aneesh Kumar K.V abfa034e4b fs/9p: Update zero-copy implementation in 9p
* remove lot of update to different data structure
* add a seperate callback for zero copy request.
* above makes non zero copy code path simpler
* remove conditionalizing TREAD/TREADDIR/TWRITE in the zero copy path
* Fix the dotu p9_check_errors with zero copy. Add sufficient doc around
* Add support for both in and output buffers in zero copy callback
* pin and unpin pages in the same context
* use helpers instead of defining page offset and rest of page ourself
* Fix mem leak in p9_check_errors
* Remove 'E' and 'F' in p9pdu_vwritef

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-10-24 11:13:11 -05:00
Eric Dumazet 66b13d99d9 ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
There is a long standing bug in linux tcp stack, about ACK messages sent
on behalf of TIME_WAIT sockets.

In the IP header of the ACK message, we choose to reflect TOS field of
incoming message, and this might break some setups.

Example of things that were broken :
  - Routing using TOS as a selector
  - Firewalls
  - Trafic classification / shaping

We now remember in timewait structure the inet tos field and use it in
ACK generation, and route lookup.

Notes :
 - We still reflect incoming TOS in RST messages.
 - We could extend MuraliRaja Muniraju patch to report TOS value in
netlink messages for TIME_WAIT sockets.
 - A patch is needed for IPv6

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 03:06:21 -04:00
Eric W. Biederman d2237d3574 rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
Renato Westphal noticed that since commit a2835763e1
"rtnetlink: handle rtnl_link netlink notifications manually" was merged
we no longer send a netlink message when a networking device is moved
from one network namespace to another.

Fix this by adding the missing manual notification in dev_change_net_namespaces.

Since all network devices that are processed by dev_change_net_namspaces are
in the initialized state the complicated tests that guard the manual
rtmsg_ifinfo calls in rollback_registered and register_netdevice are
unnecessary and we can just perform a plain notification.

Cc: stable@kernel.org
Tested-by: Renato Westphal <renatowestphal@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 03:03:37 -04:00
Yan, Zheng b73233960a ipv4: fix ipsec forward performance regression
There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
It makes xfrm4_fill_dst() modify wrong data structure.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reported-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 03:01:22 -04:00
Flavio Leitner 7cc9150ebe route: fix ICMP redirect validation
The commit f39925dbde
(ipv4: Cache learned redirect information in inetpeer.)
removed some ICMP packet validations which are required by
RFC 1122, section 3.2.2.2:
...
  A Redirect message SHOULD be silently discarded if the new
  gateway address it specifies is not on the same connected
  (sub-) net through which the Redirect arrived [INTRO:2,
  Appendix A], or if the source of the Redirect is not the
  current first-hop gateway for the specified destination (see
  Section 3.3.1).

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:56:38 -04:00
Richard Cochran da92b194cc net: hold sock reference while processing tx timestamps
The pair of functions,

 * skb_clone_tx_timestamp()
 * skb_complete_tx_timestamp()

were designed to allow timestamping in PHY devices. The first
function, called during the MAC driver's hard_xmit method, identifies
PTP protocol packets, clones them, and gives them to the PHY device
driver. The PHY driver may hold onto the packet and deliver it at a
later time using the second function, which adds the packet to the
socket's error queue.

As pointed out by Johannes, nothing prevents the socket from
disappearing while the cloned packet is sitting in the PHY driver
awaiting a timestamp. This patch fixes the issue by taking a reference
on the socket for each such packet. In addition, the comments
regarding the usage of these function are expanded to highlight the
rule that PHY drivers must use skb_complete_tx_timestamp() to release
the packet, in order to release the socket reference, too.

These functions first appeared in v2.6.36.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@vger.kernel.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:54:50 -04:00
Eric Dumazet 318cf7aaa0 tcp: md5: add more const attributes
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more
const attributes to callers.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:46:04 -04:00
Eric Dumazet ca35a0ef85 tcp: md5: dont write skb head in tcp_md5_hash_header()
tcp_md5_hash_header() writes into skb header a temporary zero value,
this might confuse other users of this area.

Since tcphdr is small (20 bytes), copy it in a temporary variable and
make the change in the copy.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 01:52:35 -04:00
Maciej Żenczykowski 2c67e9acb6 net: use INET_ECN_MASK instead of hardcoded 3
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-22 00:07:47 -04:00
Eric Dumazet cf533ea53e tcp: add const qualifiers where possible
Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.

For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 05:22:42 -04:00
Mihai Maruseac f04565ddf5 dev: use name hash for dev_seq_ops
Instead of using the dev->next chain and trying to resync at each call to
dev_seq_start, use the name hash, keeping the bucket and the offset in
seq->private field.

Tests revealed the following results for ifconfig > /dev/null
	* 1000 interfaces:
		* 0.114s without patch
		* 0.089s with patch
	* 3000 interfaces:
		* 0.489s without patch
		* 0.110s with patch
	* 5000 interfaces:
		* 1.363s without patch
		* 0.250s with patch
	* 128000 interfaces (other setup):
		* ~100s without patch
		* ~30s with patch

Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:54:38 -04:00
Ian Campbell a8605c6063 net: add opaque struct around skb frag page
I've split this bit out of the skb frag destructor patch since it helps enforce
the use of the fragment API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:52:53 -04:00
Maciej Żenczykowski 6cc7a765c2 net: allow CAP_NET_RAW to set socket options IP{,V6}_TRANSPARENT
Up till now the IP{,V6}_TRANSPARENT socket options (which actually set
the same bit in the socket struct) have required CAP_NET_ADMIN
privileges to set or clear the option.

- we make clearing the bit not require any privileges.
- we allow CAP_NET_ADMIN to set the bit (as before this change)
- we allow CAP_NET_RAW to set this bit, because raw
  sockets already pretty much effectively allow you
  to emulate socket transparency.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 18:21:36 -04:00
Eric Dumazet 20c4cb792d tcp: remove unused tcp_fin() parameters
tcp_fin() only needs socket pointer, we can remove skb and th params.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 17:44:03 -04:00
David S. Miller 580043a27d Merge branch 'batman-adv/maint' of git://git.open-mesh.org/linux-merge 2011-10-20 17:40:43 -04:00
Eric Dumazet 33136d12be pktgen: remove ndelay() call
Daniel Turull reported inaccuracies in pktgen when using low packet
rates, because we call ndelay(val) with values bigger than 20000.

Instead of calling ndelay() for delays < 100us, we can instead loop
calling ktime_now() only.

Reported-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 17:00:21 -04:00
Eric Dumazet e9266a02b7 tcp: use TCP_DEFAULT_INIT_RCVWND in tcp_fixup_rcvbuf()
Since commit 356f039822 (TCP: increase default initial receive
window.), we allow sender to send 10 (TCP_DEFAULT_INIT_RCVWND) segments.

Change tcp_fixup_rcvbuf() to reflect this change, even if no real change
is expected, since sysctl_tcp_rmem[1] = 87380 and this value
is bigger than tcp_fixup_rcvbuf() computed rcvmem (~23720)

Note: Since commit 356f039822 limited default window to maximum of
10*1460 and 2*MSS, we use same heuristic in this patch.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 16:54:51 -04:00
Eric Dumazet 113ab386c7 ip_gre: dont increase dev->needed_headroom on a live device
It seems ip_gre is able to change dev->needed_headroom on the fly.

Its is not legal unfortunately and triggers a BUG in raw_sendmsg()

skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev)

< another cpu change dev->needed_headromm (making it bigger)

...
skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));

We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE()
-> we crash later because skb head is exhausted.

Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route
header_len in max_headroom calculation)

Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Timo Teräs <timo.teras@iki.fi>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 16:20:30 -04:00
Ian Campbell a0bec1cd8f net: do not take an additional reference in skb_frag_set_page
I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.

In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:

6a930b9f16 cxgb3: convert to SKB paged frag API.
5dc3e196ea myri10ge: convert to SKB paged frag API.
0e0634d20d vmxnet3: convert to SKB paged frag API.
86ee8130a4 virtionet: convert to SKB paged frag API.
4a22c4c919 sfc: convert to SKB paged frag API.
18324d690d cassini: convert to SKB paged frag API.
b061b39e3a benet: convert to SKB paged frag API.
b7b6a688d2 bnx2: convert to SKB paged frag API.
804cf14ea5 net: xfrm: convert to SKB frag APIs
ea2ab69379 net: convert core to skb paged frag APIs

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:40:39 -04:00
roy.qing.li@gmail.com e049f28883 neigh: fix rcu splat in neigh_update()
when use dst_get_neighbour to get neighbour, we need
rcu_read_lock to protect, since dst_get_neighbour uses
rcu_dereference.

The bug was reported by Ari Savolainen <ari.m.savolainen@gmail.com>

[  105.612095]
[  105.612096] ===================================================
[  105.612100] [ INFO: suspicious rcu_dereference_check() usage. ]
[  105.612101] ---------------------------------------------------
[  105.612103] include/net/dst.h:91 invoked rcu_dereference_check()
without protection!
[  105.612105]
[  105.612106] other info that might help us debug this:
[  105.612106]
[  105.612108]
[  105.612108] rcu_scheduler_active = 1, debug_locks = 0
[  105.612110] 1 lock held by dnsmasq/2618:
[  105.612111]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815df8c7>]
rtnl_lock+0x17/0x20
[  105.612120]
[  105.612121] stack backtrace:
[  105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41
[  105.612125] Call Trace:
[  105.612129]  [<ffffffff810ccdcb>] lockdep_rcu_dereference+0xbb/0xc0
[  105.612132]  [<ffffffff815dc5a9>] neigh_update+0x4f9/0x5f0
[  105.612135]  [<ffffffff815da001>] ? neigh_lookup+0xe1/0x220
[  105.612139]  [<ffffffff81639298>] arp_req_set+0xb8/0x230
[  105.612142]  [<ffffffff8163a59f>] arp_ioctl+0x1bf/0x310
[  105.612146]  [<ffffffff810baa40>] ? lock_hrtimer_base.isra.26+0x30/0x60
[  105.612150]  [<ffffffff8163fb75>] inet_ioctl+0x85/0x90
[  105.612154]  [<ffffffff815b5520>] sock_do_ioctl+0x30/0x70
[  105.612157]  [<ffffffff815b55d3>] sock_ioctl+0x73/0x280
[  105.612162]  [<ffffffff811b7698>] do_vfs_ioctl+0x98/0x570
[  105.612165]  [<ffffffff811a5c40>] ? fget_light+0x340/0x3a0
[  105.612168]  [<ffffffff811b7bbf>] sys_ioctl+0x4f/0x80
[  105.612172]  [<ffffffff816fdcab>] system_call_fastpath+0x16/0x1b

Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:38:51 -04:00
Dan Carpenter 4f25af2782 filter: use unsigned int to silence static checker warning
This is just a cleanup.

My testing version of Smatch warns about this:
net/core/filter.c +380 check_load_and_stores(6)
	warn: check 'flen' for negative values

flen comes from the user.  We try to clamp the values here between 1
and BPF_MAXINSNS but the clamp doesn't work because it could be
negative.  This is a bug, but it's not exploitable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:35:51 -04:00
Kevin Wilson 25c8295b5b cleanup: remove unnecessary include.
This cleanup patch removes unnecessary include from net/ipv6/ip6_fib.c.

Signed-off-by: Kevin Wilson <wkevils@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:26:16 -04:00
Gerrit Renker 686dc6b64b ipv4: compat_ioctl is local to af_inet.c, make it static
ipv4: compat_ioctl is local to af_inet.c, make it static

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:24:39 -04:00
Yan, Zheng afaef734e5 fib_rules: fix unresolved_rules counting
we should decrease ops->unresolved_rules when deleting a unresolved rule.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:17:41 -04:00
Richard Cochran 4dc360c5e7 net: validate HWTSTAMP ioctl parameters
This patch adds a sanity check on the values provided by user space for
the hardware time stamping configuration. If the values lie outside of
the absolute limits, then the ioctl request will be denied.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 17:00:35 -04:00
Eric W. Biederman 850a545bd8 net: Move rcu_barrier from rollback_registered_many to netdev_run_todo.
This patch moves the rcu_barrier from rollback_registered_many
(inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock).
This allows us to gain the full benefit of sychronize_net calling
synchronize_rcu_expedited when the rtnl_lock is held.

The rcu_barrier in rollback_registered_many was originally a synchronize_net
but was promoted to be a rcu_barrier() when it was found that people were
unnecessarily hitting the 250ms wait in netdev_wait_allrefs().  Changing
the rcu_barrier back to a synchronize_net is therefore safe.

Since we only care about waiting for the rcu callbacks before we get
to netdev_wait_allrefs() it is also safe to move the wait into
netdev_run_todo.

This was tested by creating and destroying 1000 tap devices and observing
/proc/lock_stat.  /proc/lock_stat reports this change reduces the hold
times of the rtnl_lock by a factor of 10.  There was no observable
difference in the amount of time it takes to destroy a network device.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 16:59:42 -04:00
Trond Myklebust d00c5d4386 NFS: Get rid of nfs_restart_rpc()
It can trivially be replaced with rpc_restart_call_prepare.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:58:30 -07:00
Eric Dumazet 06a59ecb92 tcp: use TCP_INIT_CWND in tcp_fixup_sndbuf()
Initial cwnd being 10 (TCP_INIT_CWND) instead of 3, change
tcp_fixup_sndbuf() to get more than 16384 bytes (sysctl_tcp_wmem[1]) in
initial sk_sndbuf

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 16:53:30 -04:00
Andy Fleming 3d153a7c8b net: Allow skb_recycle_check to be done in stages
skb_recycle_check resets the skb if it's eligible for recycling.
However, there are times when a driver might want to optionally
manipulate the skb data with the skb before resetting the skb,
but after it has determined eligibility.  We do this by splitting the
eligibility check from the skb reset, creating two inline functions to
accomplish that task.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 15:59:45 -04:00
KOVACS Krisztian 58af19e387 tproxy: copy transparent flag when creating a time wait
The transparent socket option setting was not copied to the time wait
socket when an inet socket was being replaced by a time wait socket. This
broke the --transparent option of the socket match and may have caused
that FIN packets belonging to sockets in FIN_WAIT2 or TIME_WAIT state
were being dropped by the packet filter.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:21:35 -04:00
Eric Dumazet 9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
Steffen Klassert dd767856a3 xfrm6: Don't call icmpv6_send on local error
Calling icmpv6_send() on a local message size error leads to
an incorrect update of the path mtu. So use xfrm6_local_rxpmtu()
to notify about the pmtu if the IPV6_DONTFRAG socket option is
set on an udp or raw socket, according RFC 3542 and use
ipv6_local_error() otherwise.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-18 23:53:10 -04:00
Steffen Klassert 299b076764 ipv6: Fix IPsec slowpath fragmentation problem
ip6_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path).
On IPsec the effective mtu is lower because we need to add the protocol
headers and trailers later when we do the IPsec transformations. So after
the IPsec transformations the packet might be too big, which leads to a
slowpath fragmentation then. This patch fixes this by building the packets
based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr
handling to this.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-18 23:53:10 -04:00