1
0
Fork 0
Commit Graph

29383 Commits (e05e90702b2638a39b5ae9d22740f3a1607c54a0)

Author SHA1 Message Date
Daniel Borkmann 3f96a53211 net: sctp: rfc4443: do not report ICMP redirects to user space
Adapt the same behaviour for SCTP as present in TCP for ICMP redirect
messages. For IPv6, RFC4443, section 2.4. says:

  ...
  (e) An ICMPv6 error message MUST NOT be originated as a result of
      receiving the following:
  ...
       (e.2) An ICMPv6 redirect message [IPv6-DISC].
  ...

Therefore, do not report an error to user space, just invoke dst's redirect
callback and leave, same for IPv4 as done in TCP as well. The implication
w/o having this patch could be that the reception of such packets would
generate a poll notification and in worst case it could even tear down the
whole connection. Therefore, stop updating sk_err on redirects.

Reported-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-16 21:40:15 -04:00
Ding Zhi 0d2ede929f ip6_tunnels: raddr and laddr are inverted in nl msg
IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted.

Introduced by c075b13098 (ip6tnl: advertise tunnel param via rtnl).

Signed-off-by: Ding Zhi <zhi.ding@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-16 21:36:12 -04:00
Hong Zhiguo 716ec052d2 bridge: fix NULL pointer deref of br_port_get_rcu
The NULL deref happens when br_handle_frame is called between these
2 lines of del_nbp:
	dev->priv_flags &= ~IFF_BRIDGE_PORT;
	/* --> br_handle_frame is called at this time */
	netdev_rx_handler_unregister(dev);

In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced
without check but br_port_get_rcu(dev) returns NULL if:
	!(dev->priv_flags & IFF_BRIDGE_PORT)

Eric Dumazet pointed out the testing of IFF_BRIDGE_PORT is not necessary
here since we're in rcu_read_lock and we have synchronize_net() in
netdev_rx_handler_unregister. So remove the testing of IFF_BRIDGE_PORT
and by the previous patch, make sure br_port_get_rcu is called in
bridging code.

Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-15 22:03:33 -04:00
Hong Zhiguo 1fb1754a8c bridge: use br_port_get_rtnl within rtnl lock
current br_port_get_rcu is problematic in bridging path
(NULL deref). Change these calls in netlink path first.

Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-15 22:03:33 -04:00
Herbert Xu be4f154d5e bridge: Clamp forward_delay when enabling STP
At some point limits were added to forward_delay.  However, the
limits are only enforced when STP is enabled.  This created a
scenario where you could have a value outside the allowed range
while STP is disabled, which then stuck around even after STP
is enabled.

This patch fixes this by clamping the value when we enable STP.

I had to move the locking around a bit to ensure that there is
no window where someone could insert a value outside the range
while we're in the middle of enabling STP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12 23:32:14 -04:00
Chris Healy 9a0620133c resubmit bridge: fix message_age_timer calculation
This changes the message_age_timer calculation to use the BPDU's max age as
opposed to the local bridge's max age.  This is in accordance with section
8.6.2.3.2 Step 2 of the 802.1D-1998 sprecification.

With the current implementation, when running with very large bridge
diameters, convergance will not always occur even if a root bridge is
configured to have a longer max age.

Tested successfully on bridge diameters of ~200.

Signed-off-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12 23:30:37 -04:00
Daniel Borkmann 95ee62083c net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmit
Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not
being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport
does not seem to have the desired effect:

SCTP + IPv4:

  22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116)
    192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72
  22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340)
    192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1):

SCTP + IPv6:

  22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364)
    fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp
    1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10]

Moreover, Alan says:

  This problem was seen with both Racoon and Racoon2. Other people have seen
  this with OpenSwan. When IPsec is configured to encrypt all upper layer
  protocols the SCTP connection does not initialize. After using Wireshark to
  follow packets, this is because the SCTP packet leaves Box A unencrypted and
  Box B believes all upper layer protocols are to be encrypted so it drops
  this packet, causing the SCTP connection to fail to initialize. When IPsec
  is configured to encrypt just SCTP, the SCTP packets are observed unencrypted.

In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext"
string on the other end, results in cleartext on the wire where SCTP eventually
does not report any errors, thus in the latter case that Alan reports, the
non-paranoid user might think he's communicating over an encrypted transport on
SCTP although he's not (tcpdump ... -X):

  ...
  0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000  ]p.......}.l....
  0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000  ....plaintext...

Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the
receiver side. Initial follow-up analysis from Alan's bug report was done by
Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this.

SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit().
This has the implication that it probably never really got updated along with
changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers.

SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since
a call to inet6_csk_xmit() would solve this problem, but result in unecessary
route lookups, let us just use the cached flowi6 instead that we got through
sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(),
we do the route lookup / flow caching in sctp_transport_route(), hold it in
tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in
sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect
of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst()
instead to get the correct source routed dst entry, which we assign to the skb.

Also source address routing example from 625034113 ("sctp: fix sctp to work with
ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095
it is actually 'recommended' to not use that anyway due to traffic amplification [1].
So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if
we overwrite the flow destination here, the lower IPv6 layer will be unable to
put the correct destination address into IP header, as routing header is added in
ipv6_push_nfrag_opts() but then probably with wrong final destination. Things aside,
result of this patch is that we do not have any XfrmInTmplMismatch increase plus on
the wire with this patch it now looks like:

SCTP + IPv6:

  08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba:
    AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72
  08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a:
    AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296

This fixes Kernel Bugzilla 24412. This security issue seems to be present since
2.6.18 kernels. Lets just hope some big passive adversary in the wild didn't have
its fun with that. lksctp-tools IPv6 regression test suite passes as well with
this patch.

 [1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf

Reported-by: Alan Chester <alan.chester@tekelec.com>
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12 17:24:43 -04:00
Sonic Zhang b0dd663b60 netpoll: Should handle ETH_P_ARP other than ETH_P_IP in netpoll_neigh_reply
The received ARP request type in the Ethernet packet head is ETH_P_ARP other than ETH_P_IP.

[ Bug introduced by commit b7394d2429
  ("netpoll: prepare for ipv6") ]

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12 17:19:14 -04:00
Linus Torvalds c2d95729e3 Merge branch 'akpm' (patches from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 - Some pidns/fork/exec tweaks
 - OCFS2 updates
 - Most of MM - there remain quite a few memcg parts which depend on
   pending core cgroups changes.  Which might have been already merged -
   I'll check tomorrow...
 - Various misc stuff all over the place
 - A few block bits which I never got around to sending to Jens -
   relatively minor things.
 - MAINTAINERS maintenance
 - A small number of lib/ updates
 - checkpatch updates
 - epoll
 - firmware/dmi-scan
 - Some kprobes work for S390
 - drivers/rtc updates
 - hfsplus feature work
 - vmcore feature work
 - rbtree upgrades
 - AOE updates
 - pktcdvd cleanups
 - PPS
 - memstick
 - w1
 - New "inittmpfs" feature, which does the obvious
 - More IPC work from Davidlohr.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (303 commits)
  lz4: fix compression/decompression signedness mismatch
  ipc: drop ipc_lock_check
  ipc, shm: drop shm_lock_check
  ipc: drop ipc_lock_by_ptr
  ipc, shm: guard against non-existant vma in shmdt(2)
  ipc: document general ipc locking scheme
  ipc,msg: drop msg_unlock
  ipc: rename ids->rw_mutex
  ipc,shm: shorten critical region for shmat
  ipc,shm: cleanup do_shmat pasta
  ipc,shm: shorten critical region for shmctl
  ipc,shm: make shmctl_nolock lockless
  ipc,shm: introduce shmctl_nolock
  ipc: drop ipcctl_pre_down
  ipc,shm: shorten critical region in shmctl_down
  ipc,shm: introduce lockless functions to obtain the ipc object
  initmpfs: use initramfs if rootfstype= or root= specified
  initmpfs: make rootfs use tmpfs when CONFIG_TMPFS enabled
  initmpfs: move rootfs code from fs/ramfs/ to init/
  initmpfs: move bdi setup from init_rootfs to init_ramfs
  ...
2013-09-11 16:08:54 -07:00
Mathieu Desnoyers 3ddc5b46a8 kernel-wide: fix missing validations on __get/__put/__copy_to/__copy_from_user()
I found the following pattern that leads in to interesting findings:

  grep -r "ret.*|=.*__put_user" *
  grep -r "ret.*|=.*__get_user" *
  grep -r "ret.*|=.*__copy" *

The __put_user() calls in compat_ioctl.c, ptrace compat, signal compat,
since those appear in compat code, we could probably expect the kernel
addresses not to be reachable in the lower 32-bit range, so I think they
might not be exploitable.

For the "__get_user" cases, I don't think those are exploitable: the worse
that can happen is that the kernel will copy kernel memory into in-kernel
buffers, and will fail immediately afterward.

The alpha csum_partial_copy_from_user() seems to be missing the
access_ok() check entirely.  The fix is inspired from x86.  This could
lead to information leak on alpha.  I also noticed that many architectures
map csum_partial_copy_from_user() to csum_partial_copy_generic(), but I
wonder if the latter is performing the access checks on every
architectures.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:18 -07:00
Linus Torvalds bbda1baeeb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Brown paper bag fix in HTB scheduler, class options set incorrectly
    due to a typoe.  Fix from Vimalkumar.

 2) It's possible for the ipv6 FIB garbage collector to run before all
    the necessary datastructure are setup during init, defer the
    notifier registry to avoid this problem.  Fix from Michal Kubecek.

 3) New i40e ethernet driver from the Intel folks.

 4) Add new qmi wwan device IDs, from Bjørn Mork.

 5) Doorbell lock in bnx2x driver is not initialized properly in some
    configurations, fix from Ariel Elior.

 6) Revert an ipv6 packet option padding change that broke standardized
    ipv6 implementation test suites.  From Jiri Pirko.

 7) Fix synchronization of ARP information in bonding layer, from
    Nikolay Aleksandrov.

 8) Fix missing error return resulting in illegal memory accesses in
    openvswitch, from Daniel Borkmann.

 9) SCTP doesn't signal poll events properly due to mistaken operator
    precedence, fix also from Daniel Borkmann.

10) __netdev_pick_tx() passes wrong index to sk_tx_queue_set() which
    essentially disables caching of TX queue in sockets :-/ Fix from
    Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
  net_sched: htb: fix a typo in htb_change_class()
  net: qmi_wwan: add new Qualcomm devices
  ipv6: don't call fib6_run_gc() until routing is ready
  net: tilegx driver: avoid compiler warning
  fib6_rules: fix indentation
  irda: vlsi_ir: Remove casting the return value which is a void pointer
  irda: donauboe: Remove casting the return value which is a void pointer
  net: fix multiqueue selection
  net: sctp: fix smatch warning in sctp_send_asconf_del_ip
  net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE
  net: fib: fib6_add: fix potential NULL pointer dereference
  net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs
  bcm63xx_enet: remove deprecated IRQF_DISABLED
  net: korina: remove deprecated IRQF_DISABLED
  macvlan: Move skb_clone check closer to call
  qlcnic: Fix warning reported by kbuild test robot.
  bonding: fix bond_arp_rcv setting and arp validate desync state
  bonding: fix store_arp_validate race with mode change
  ipv6/exthdrs: accept tlv which includes only padding
  bnx2x: avoid atomic allocations during initialization
  ...
2013-09-11 14:33:16 -07:00
Vimalkumar f3ad857e3d net_sched: htb: fix a typo in htb_change_class()
Fix a typo added in commit 56b765b79 ("htb: improved accuracy at high
rates")

cbuffer should not be a copy of buffer.

Signed-off-by: Vimalkumar <j.vimal@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 17:16:22 -04:00
Michal Kubeček 2c861cc65e ipv6: don't call fib6_run_gc() until routing is ready
When loading the ipv6 module, ndisc_init() is called before
ip6_route_init(). As the former registers a handler calling
fib6_run_gc(), this opens a window to run the garbage collector
before necessary data structures are initialized. If a network
device is initialized in this window, adding MAC address to it
triggers a NETDEV_CHANGEADDR event, leading to a crash in
fib6_clean_all().

Take the event handler registration out of ndisc_init() into a
separate function ndisc_late_init() and move it after
ip6_route_init().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 17:04:09 -04:00
Stefan Tomanek 04f0888da2 fib6_rules: fix indentation
This change just removes two tabs from the source file.

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 16:16:29 -04:00
Eric Dumazet 50d1784ee4 net: fix multiqueue selection
commit 416186fbf8 ("net: Split core bits of netdev_pick_tx
into __netdev_pick_tx") added a bug that disables caching of queue
index in the socket.

This is the source of packet reorders for TCP flows, and
again this is happening more often when using FQ pacing.

Old code was doing

if (queue_index != old_index)
	sk_tx_queue_set(sk, queue_index);

Alexander renamed the variables but forgot to change sk_tx_queue_set()
2nd parameter.

if (queue_index != new_index)
	sk_tx_queue_set(sk, queue_index);

This means we store -1 over and over in sk->sk_tx_queue_mapping

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 16:10:00 -04:00
Daniel Borkmann 88362ad8f9 net: sctp: fix smatch warning in sctp_send_asconf_del_ip
This was originally reported in [1] and posted by Neil Horman [2], he said:

  Fix up a missed null pointer check in the asconf code. If we don't find
  a local address, but we pass in an address length of more than 1, we may
  dereference a NULL laddr pointer. Currently this can't happen, as the only
  users of the function pass in the value 1 as the addrcnt parameter, but
  its not hot path, and it doesn't hurt to check for NULL should that ever
  be the case.

The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered
from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1
while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR
so that we do *not* find a single address in the association's bind address
list that is not in the packed array of addresses. If this happens when we
have an established association with ASCONF-capable peers, then we could get
a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1
and call later sctp_make_asconf_update_ip() with NULL laddr.

BUT: this actually won't happen as sctp_bindx_rem() will catch such a case
and return with an error earlier. As this is incredably unintuitive and error
prone, add a check to catch at least future bugs here. As Neil says, its not
hot path. Introduced by 8a07eb0a5 ("sctp: Add ASCONF operation on the
single-homed host").

 [1] http://www.spinics.net/lists/linux-sctp/msg02132.html
 [2] http://www.spinics.net/lists/linux-sctp/msg02133.html

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Michio Honda <micchie@sfc.wide.ad.jp>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 16:10:00 -04:00
Daniel Borkmann a0fb05d1ae net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE
If we do not add braces around ...

  mask |= POLLERR |
          sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? POLLPRI : 0;

... then this condition always evaluates to true as POLLERR is
defined as 8 and binary or'd with whatever result comes out of
sock_flag(). Hence instead of (X | Y) ? A : B, transform it into
X | (Y ? A : B). Unfortunatelty, commit 8facd5fb73 ("net: fix
smatch warnings inside datagram_poll") forgot about SCTP. :-(

Introduced by 7d4c04fc17 ("net: add option to enable error queue
packets waking select").

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 16:09:59 -04:00
Daniel Borkmann ae7b4e1f21 net: fib: fib6_add: fix potential NULL pointer dereference
When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return
with an error in fn = fib6_add_1(), then error codes are encoded into
the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we
write the error code into err and jump to out, hence enter the if(err)
condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for:

  if (pn != fn && pn->leaf == rt)
    ...
  if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO))
    ...

Since pn is NULL and fn is f.e. ERR_PTR(-ENOENT), then pn != fn
evaluates to true and causes a NULL-pointer dereference on further
checks on pn. Fix it, by setting both NULL in error case, so that
pn != fn already evaluates to false and no further dereference
takes place.

This was first correctly implemented in 4a287eba2 ("IPv6 routing,
NLM_F_* flag support: REPLACE and EXCL flags support, warn about
missing CREATE flag"), but the bug got later on introduced by
188c517a0 ("ipv6: return errno pointers consistently for fib6_add_1()").

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Lin Ming <mlin@ss.pku.edu.cn>
Cc: Matti Vaittinen <matti.vaittinen@nsn.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Matti Vaittinen <matti.vaittinen@nsn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 16:09:59 -04:00
Daniel Borkmann 3bf4b5b11d net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs
In function __parse_flow_nlattrs(), we check for condition
(type > OVS_KEY_ATTR_MAX) and if true, print an error, but we do
not return from this function as in other checks. It seems this
has been forgotten, as otherwise, we could access beyond the
memory of ovs_key_lens, which is of ovs_key_lens[OVS_KEY_ATTR_MAX + 1].
Hence, a maliciously prepared nla_type from user space could access
beyond this upper limit.

Introduced by 03f0d916a ("openvswitch: Mega flow implementation").

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 16:09:58 -04:00
Jiri Pirko 8112b1fe07 ipv6/exthdrs: accept tlv which includes only padding
In rfc4942 and rfc2460 I cannot find anything which would implicate to
drop packets which have only padding in tlv.

Current behaviour breaks TAHI Test v6LC.1.2.6.

Problem was intruduced in:
9b905fe684 "ipv6/exthdrs: strict Pad1 and PadN check"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 15:52:27 -04:00
Linus Torvalds 2b76db6a0f for-linus-3.12-merge minor 9p fixes and tweaks for 3.12 merge window
The first fixes namespace issues which causes a kernel
 NULL pointer dereference, the second fixes uevent
 handling to work better with udev, and the third
 switches some code to use srlcpy instead of strncpy
 in order to be safer.
 
 All changes have been baking in for-next for at least
 2 weeks.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJSMJjZAAoJEDZk62b0Tg6x81sQAKa60QStBKhnL65bvG+ooIsS
 mhwfmFyaWOKw1ezwY2Vk0+JnmKDBpKmqjjwyL3nLP18TcRZStPiFdcJBKWl+czge
 FTv14t54CcjysYPbYN7+gUap4F5mfg0mcHaR0UGow505dNyjwd7mqkZhy1IqhdvP
 Ue/h0RE46GeNtdirxrKBdEfW/7TAL0tcoRgjKu0ev1V2sXCJZywuXgkzWjByRXwT
 JOg04gGnYThuek0/KUPRhf0KxB0CyKrZiics7LGb40HkYYxs7ahADACttLyiDr8l
 GntfHXLgvVlU5QcSbKRfLp0zNbi7AxWmJrwYsEwpas4tUw1Q+pVJ2EE2Ameuq5G+
 LrMGmRVQCVYw8UN+OYUO7glhXEJcCPJj6vxgm+NVXx24yaQyGI1aTsIEjHwZ/hkm
 wlQHC47z6/fIypkXpsU6pYWF/r3GwXHokYReejATQWEPIzIxvHeThe0jjqMLth7F
 zmsHZTpmECqtti1fizy5wBZD25wAIxdf+rf8nKy1VvcSN4s08ESSlC/kV/siNeko
 efFnL8xbjP5SPEVoBtXM6eTDHrQ0S+ACSGWtp0FGXKOW4PKzS60ve2Stp+FYZgQc
 WgXI7+NBU6Z9z+cZ9bsY0hrGwK1YZiR4F3KJ5ofTuxAO6n7zd+N3fGBuQJ2tiW9P
 pKtIXNozWqnAU9Wx4rGa
 =YbFT
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p updates from Eric Van Hensbergen:
 "Minor 9p fixes and tweaks for 3.12 merge window

  The first fixes namespace issues which causes a kernel NULL pointer
  dereference, the second fixes uevent handling to work better with
  udev, and the third switches some code to use srlcpy instead of
  strncpy in order to be safer.

  All changes have been baking in for-next for at least 2 weeks"

* tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: avoid accessing utsname after namespace has been torn down
  9p: send uevent after adding/removing mount_tag attribute
  fs: 9p: use strlcpy instead of strncpy
2013-09-11 12:34:13 -07:00
Linus Torvalds cf596766fc Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "This was a very quiet cycle! Just a few bugfixes and some cleanup"

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
  rpc: let xdr layer allocate gssproxy receieve pages
  rpc: fix huge kmalloc's in gss-proxy
  rpc: comment on linux_cred encoding, treat all as unsigned
  rpc: clean up decoding of gssproxy linux creds
  svcrpc: remove unused rq_resused
  nfsd4: nfsd4_create_clid_dir prints uninitialized data
  nfsd4: fix leak of inode reference on delegation failure
  Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR"
  sunrpc: prepare NFS for 2038
  nfsd4: fix setlease error return
  nfsd: nfs4_file_get_access: need to be more careful with O_RDWR
2013-09-10 20:04:59 -07:00
Linus Torvalds bf97293eb8 NFS client updates for Linux 3.12
Highlights include:
 
 - Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as
   lease loss due to a network partition, where doing so may result in data
   corruption. Add a kernel parameter to control choice of legacy behaviour
   or not.
 - Performance improvements when 2 processes are writing to the same file.
 - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
 - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
   NFS clients from being able to manipulate our lease and file lockingr
   state.
 - Allow sharing of RPCSEC_GSS caches between different rpc clients
 - Fix the broken NFSv4 security auto-negotiation between client and server
 - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
 - Add a tracepoint framework for debugging NFSv4 state recovery issues.
 - Add tracing to the generic NFS layer.
 - Add tracing for the SUNRPC socket connection state.
 - Clean up the rpc_pipefs mount/umount event management.
 - Merge more patches from Chuck in preparation for NFSv4 migration support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSLelVAAoJEGcL54qWCgDyo2IQAKOfRJyZVnf4ipxi3xLNl1QF
 w/70DVSIF1S1djWN7G3vgkxj/R8KCvJ8CcvkAD2BEgRDeZJ9TtyKAdM/jYLZ+W05
 7k2QKk8fkwZmc1Y2qDqFwKHzP5ZgP5L2nGx7FNhi/99wEAe47yFG3qd3rUWKrcOf
 mnd863zgGDE2Q10slhoq/bywwMJo6tKZNeaIE8kPjgFbBEh/jslpAWr8dSA4QgvJ
 nZ8VB5XU8L+XJ0GpHHdjYm9LvQ51DbQ6omOF+0P4fI093azKmf4ZsrjMDWT8+iu3
 XkXlnQmKLGTi7yB43hHtn2NiRqwGzCcZ1Amo9PpCFaHUt1RP9cc37UhG1T+x1xWJ
 STEKDbvCdQ3FU9FvbgrGEwBR0e8fNS4fZY3ToDBflIcfwre0aWs5RCodZMUD0nUI
 4wY5J9NsQR/bL+v8KeUR4V4cXK8YrgL0zB4u4WYzH5Npxr5KD0NEKDNqRPhrB9l2
 LLF9Haql8j76Ff0ek6UGFIZjDE0h6Fs71wLBpLj+ZWArOJ7vBuLMBSOVqNpld9+9
 f2fEG7qoGF4FGTY4myH/eakMPaWnk9Ol4Ls/svSIapJ9+rePD+a93e/qnmdofIMf
 4TuEYk6ERib1qXgaeDRQuCsm2YE1Co5skGMaOsRFWgReE1c12QoJQVst2nMtEKp3
 uV2w8LgX18aZOZXJVkCM
 =ZuW+
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Fix NFSv4 recovery so that it doesn't recover lost locks in cases
     such as lease loss due to a network partition, where doing so may
     result in data corruption.  Add a kernel parameter to control
     choice of legacy behaviour or not.
   - Performance improvements when 2 processes are writing to the same
     file.
   - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
   - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
     NFS clients from being able to manipulate our lease and file
     locking state.
   - Allow sharing of RPCSEC_GSS caches between different rpc clients.
   - Fix the broken NFSv4 security auto-negotiation between client and
     server.
   - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
   - Add a tracepoint framework for debugging NFSv4 state recovery
     issues.
   - Add tracing to the generic NFS layer.
   - Add tracing for the SUNRPC socket connection state.
   - Clean up the rpc_pipefs mount/umount event management.
   - Merge more patches from Chuck in preparation for NFSv4 migration
     support"

* tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
  NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
  NFSv4: Allow security autonegotiation for submounts
  NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
  NFSv4: Fix security auto-negotiation
  NFS: Clean up nfs_parse_security_flavors()
  NFS: Clean up the auth flavour array mess
  NFSv4.1 Use MDS auth flavor for data server connection
  NFS: Don't check lock owner compatability unless file is locked (part 2)
  NFS: Don't check lock owner compatibility in writes unless file is locked
  nfs4: Map NFS4ERR_WRONG_CRED to EPERM
  nfs4.1: Add SP4_MACH_CRED write and commit support
  nfs4.1: Add SP4_MACH_CRED stateid support
  nfs4.1: Add SP4_MACH_CRED secinfo support
  nfs4.1: Add SP4_MACH_CRED cleanup support
  nfs4.1: Add state protection handler
  nfs4.1: Minimal SP4_MACH_CRED implementation
  SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
  SUNRPC: Add an identifier for struct rpc_clnt
  SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
  ...
2013-09-09 09:19:15 -07:00
Linus Torvalds 6cccc7d301 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "This includes both the first pile of Ceph patches (which I sent to
  torvalds@vger, sigh) and a few new patches that add support for
  fscache for Ceph.  That includes a few fscache core fixes that David
  Howells asked go through the Ceph tree.  (Thanks go to Milosz Tanski
  for putting this feature together)

  This first batch of patches (included here) had (has) several
  important RBD bug fixes, hole punch support, several different
  cleanups in the page cache interactions, improvements in the truncate
  code (new truncate mutex to avoid shenanigans with i_mutex), and a
  series of fixes in the synchronous striping read/write code.

  On top of that is a random collection of small fixes all across the
  tree (error code checks and error path cleanup, obsolete wq flags,
  etc)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits)
  ceph: use d_invalidate() to invalidate aliases
  ceph: remove ceph_lookup_inode()
  ceph: trivial buildbot warnings fix
  ceph: Do not do invalidate if the filesystem is mounted nofsc
  ceph: page still marked private_2
  ceph: ceph_readpage_to_fscache didn't check if marked
  ceph: clean PgPrivate2 on returning from readpages
  ceph: use fscache as a local presisent cache
  fscache: Netfs function for cleanup post readpages
  FS-Cache: Fix heading in documentation
  CacheFiles: Implement interface to check cache consistency
  FS-Cache: Add interface to check consistency of a cached object
  rbd: fix null dereference in dout
  rbd: fix buffer size for writes to images with snapshots
  libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc
  rbd: fix I/O error propagation for reads
  ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
  ceph: allow sync_read/write return partial successed size of read/write.
  ceph: fix bugs about handling short-read for sync read mode.
  ceph: remove useless variable revoked_rdcache
  ...
2013-09-09 09:13:22 -07:00
Linus Torvalds c7c4591db6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace changes from Eric Biederman:
 "This is an assorted mishmash of small cleanups, enhancements and bug
  fixes.

  The major theme is user namespace mount restrictions.  nsown_capable
  is killed as it encourages not thinking about details that need to be
  considered.  A very hard to hit pid namespace exiting bug was finally
  tracked and fixed.  A couple of cleanups to the basic namespace
  infrastructure.

  Finally there is an enhancement that makes per user namespace
  capabilities usable as capabilities, and an enhancement that allows
  the per userns root to nice other processes in the user namespace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns:  Kill nsown_capable it makes the wrong thing easy
  capabilities: allow nice if we are privileged
  pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  userns: Allow PR_CAPBSET_DROP in a user namespace.
  namespaces: Simplify copy_namespaces so it is clear what is going on.
  pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
  sysfs: Restrict mounting sysfs
  userns: Better restrictions on when proc and sysfs can be mounted
  vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
  kernel/nsproxy.c: Improving a snippet of code.
  proc: Restrict mounting the proc filesystem
  vfs: Lock in place mounts from more privileged users
2013-09-07 14:35:32 -07:00
Linus Torvalds 0ffb01d9de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A quick set of fixes, some to deal with fallout from yesterday's
  net-next merge.

   1) Fix compilation of bnx2x driver with CONFIG_BNX2X_SRIOV disabled,
      from Dmitry Kravkov.

   2) Fix a bnx2x regression caused by one of Dave Jones's mistaken
      braces changes, from Eilon Greenstein.

   3) Add some protective filtering in the netlink tap code, from Daniel
      Borkmann.

   4) Fix TCP congestion window growth regression after timeouts, from
      Yuchung Cheng.

   5) Correctly adjust TCP's rcv_ssthresh for out of order packets, from
      Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  tcp: properly increase rcv_ssthresh for ofo packets
  net: add documentation for BQL helpers
  mlx5: remove unused MLX5_DEBUG param in Kconfig
  bnx2x: Restore a call to config_init
  bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set
  tcp: fix no cwnd growth after timeout
  net: netlink: filter particular protocols from analyzers
2013-09-07 14:27:46 -07:00
Eric Dumazet 4e4f1fc226 tcp: properly increase rcv_ssthresh for ofo packets
TCP receive window handling is multi staged.

A socket has a memory budget, static or dynamic, in sk_rcvbuf.

Because we do not really know how this memory budget translates to
a TCP window (payload), TCP announces a small initial window
(about 20 MSS).

When a packet is received, we increase TCP rcv_win depending
on the payload/truesize ratio of this packet. Good citizen
packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2

This heuristic takes place in tcp_grow_window()

Problem is : We currently call tcp_grow_window() only for in-order
packets.

This means that reorders or packet losses stop proper grow of
rcv_win, and senders are unable to benefit from fast recovery,
or proper reordering level detection.

Really, a packet being stored in OFO queue is not a bad citizen.
It should be part of the game as in-order packets.

In our traces, we very often see sender is limited by linux small
receive windows, even if linux hosts use autotuning (DRS) and should
allow rcv_win to grow to ~3MB.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-06 14:43:49 -04:00
Yuchung Cheng 16edfe7ee0 tcp: fix no cwnd growth after timeout
In commit 0f7cc9a3 "tcp: increase throughput when reordering is high",
it only allows cwnd to increase in Open state. This mistakenly disables
slow start after timeout (CA_Loss). Moreover cwnd won't grow if the
state moves from Disorder to Open later in tcp_fastretrans_alert().

Therefore the correct logic should be to allow cwnd to grow as long
as the data is received in order in Open, Loss, or even Disorder state.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-06 14:43:49 -04:00
Daniel Borkmann 5ffd5cddd4 net: netlink: filter particular protocols from analyzers
Fix finer-grained control and let only a whitelist of allowed netlink
protocols pass, in our case related to networking. If later on, other
subsystems decide they want to add their protocol as well to the list
of allowed protocols they shall simply add it. While at it, we also
need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can
not pick it up (as it's not filled out).

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-06 14:43:48 -04:00
Milosz Tanski cd0a2df681 Patches for Ceph FS-Cache support
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIVAwUAUimQLxOxKuMESys7AQJc+Q/+N3BN+ZWRhfqSKANFyEuXIsUmzueCmmYc
 ZOgdRGTorlYKefqFHTFOHLvPbbxXaT2/HKUhq38yuA8UuIkoPL3rRQpjNcvWR+TX
 s/H17MRqPeZOfaDC/p9y2uL5MbuUvzhlCZ/GTi5w2ZwiNuWBo10gxeyCrXQSOFtH
 YFq0dVuG0AXzWdZuWcM3MtgY0llcMmfnZpIDjF4JDXJidgXY+wtjNUF3ByYZ+33+
 +CmzXrnCaY+3N44Ji2Dn+ci8tym8uht4dnbTZFkQ0I6B+k93V2RkZeHWnDQWqW+c
 THjyG9c+LSf0m8FIh43DNNJSkywbh5dxsBgnqxhQTJMij0dV1ne8wjKptJMgz+0b
 HFUi4rE6oRQtbLdTtJhdjdFFORBGdFj71gW8foBdFAZTP4Amf/fbiAfHeK+33oDt
 s5PMJyfA3BDM90eBFoxDWjCEe+o6YcccHC0SVWM1ZJPQ/U0hXL99O6NSHBrr9iNP
 spBgM+fNTgtUMf6P6MwjJfTQHov5xevBNaLB3boUPAI6/yK8KQt9xoevJ7t902uQ
 /19bXoNgMmwAti1Gd6T5UnWlAHsOWdnIASUu8LVqEoh2PY42T2I4NTWhgs4NFntu
 91MYysF93sTx0sMvGvWdCUSl7zMMeKXCUSUnPvMld+BMqyuK3XzsO3xkMn97t2/U
 p6kQZXZDlwU=
 =s35L
 -----END PGP SIGNATURE-----

Merge tag 'fscache-fixes-for-ceph' into wip-fscache

Patches for Ceph FS-Cache support
2013-09-06 16:41:20 +00:00
Linus Torvalds 2e515bf096 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "The usual trivial updates all over the tree -- mostly typo fixes and
  documentation updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
  doc: Documentation/cputopology.txt fix typo
  treewide: Convert retrun typos to return
  Fix comment typo for init_cma_reserved_pageblock
  Documentation/trace: Correcting and extending tracepoint documentation
  mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
  power: Documentation: Update s2ram link
  doc: fix a typo in Documentation/00-INDEX
  Documentation/printk-formats.txt: No casts needed for u64/s64
  doc: Fix typo "is is" in Documentations
  treewide: Fix printks with 0x%#
  zram: doc fixes
  Documentation/kmemcheck: update kmemcheck documentation
  doc: documentation/hwspinlock.txt fix typo
  PM / Hibernate: add section for resume options
  doc: filesystems : Fix typo in Documentations/filesystems
  scsi/megaraid fixed several typos in comments
  ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
  treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
  page_isolation: Fix a comment typo in test_pages_isolated()
  doc: fix a typo about irq affinity
  ...
2013-09-06 09:36:28 -07:00
Linus Torvalds 22e04f6b4b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
 "Highlights:

   - conversion of HID subsystem to use devm-based resource management,
     from Benjamin Tissoires

   - i2c-hid support for DT bindings, from Benjamin Tissoires

   - much improved support for Win8-multitouch devices, from Benjamin
     Tissoires

   - cleanup of core code using common hidinput_input_event(), from
     David Herrmann

   - fix for bug in implement() access to the bit stream (causing oops)
     that has been present in the code for ages, but devices that are
     able to trigger it have started to appear only now, from Jiri
     Kosina

   - fixes for CVE-2013-2899, CVE-2013-2898, CVE-2013-2896,
     CVE-2013-2892, CVE-2013-2888 (all triggerable only by specially
     crafted malicious HW devices plugged into the system), from Kees
     Cook

   - hidraw oops fix, from Manoj Chourasia

   - various smaller fixes here and there, support for a bunch of new
     devices by various contributors"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (53 commits)
  HID: MAINTAINERS: add roccat drivers
  HID: hid-sensor-hub: change kmalloc + memcpy by kmemdup
  HID: hid-sensor-hub: move to devm_kzalloc
  HID: hid-sensor-hub: fix indentation accross the code
  HID: move HID_REPORT_TYPES closer to the report-definitions
  HID: check for NULL field when setting values
  HID: picolcd_core: validate output report details
  HID: sensor-hub: validate feature report details
  HID: ntrig: validate feature report details
  HID: pantherlord: validate output report details
  HID: hid-wiimote: print small buffers via %*phC
  HID: uhid: improve uhid example client
  HID: Correct the USB IDs for the new Macbook Air 6
  HID: wiimote: add support for Guitar-Hero guitars
  HID: wiimote: add support for Guitar-Hero drums
  Input: introduce BTN/ABS bits for drums and guitars
  HID: battery: don't do DMA from stack
  HID: roccat: add support for KonePureOptical v2
  HID: picolcd: Prevent NULL pointer dereference on _remove()
  HID: usbhid: quirk for N-Trig DuoSense Touch Screen
  ...
2013-09-06 09:30:36 -07:00
J. Bruce Fields d4a516560f rpc: let xdr layer allocate gssproxy receieve pages
In theory the linux cred in a gssproxy reply can include up to
NGROUPS_MAX data, 256K of data.  In the common case we expect it to be
shorter.  So do as the nfsv3 ACL code does and let the xdr code allocate
the pages as they come in, instead of allocating a lot of pages that
won't typically be used.

Tested-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-09-06 11:45:58 -04:00
J. Bruce Fields 9dfd87da1a rpc: fix huge kmalloc's in gss-proxy
The reply to a gssproxy can include up to NGROUPS_MAX gid's, which will
take up more than a page.  We therefore need to allocate an array of
pages to hold the reply instead of trying to allocate a single huge
buffer.

Tested-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-09-06 11:45:58 -04:00
J. Bruce Fields 6a36978e69 rpc: comment on linux_cred encoding, treat all as unsigned
The encoding of linux creds is a bit confusing.

Also: I think in practice it doesn't really matter whether we treat any
of these things as signed or unsigned, but unsigned seems more
straightforward: uid_t/gid_t are unsigned and it simplifies the ngroups
overflow check.

Tested-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-09-06 11:45:57 -04:00
J. Bruce Fields 778e512bb1 rpc: clean up decoding of gssproxy linux creds
We can use the normal coding infrastructure here.

Two minor behavior changes:

	- we're assuming no wasted space at the end of the linux cred.
	  That seems to match gss-proxy's behavior, and I can't see why
	  it would need to do differently in the future.

	- NGROUPS_MAX check added: note groups_alloc doesn't do this,
	  this is the caller's responsibility.

Tested-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-09-06 11:45:56 -04:00
Linus Torvalds cc998ff881 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:
 "Noteworthy changes this time around:

   1) Multicast rejoin support for team driver, from Jiri Pirko.

   2) Centralize and simplify TCP RTT measurement handling in order to
      reduce the impact of bad RTO seeding from SYN/ACKs.  Also, when
      both timestamps and local RTT measurements are available prefer
      the later because there are broken middleware devices which
      scramble the timestamp.

      From Yuchung Cheng.

   3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel
      memory consumed to queue up unsend user data.  From Eric Dumazet.

   4) Add a "physical port ID" abstraction for network devices, from
      Jiri Pirko.

   5) Add a "suppress" operation to influence fib_rules lookups, from
      Stefan Tomanek.

   6) Add a networking development FAQ, from Paul Gortmaker.

   7) Extend the information provided by tcp_probe and add ipv6 support,
      from Daniel Borkmann.

   8) Use RCU locking more extensively in openvswitch data paths, from
      Pravin B Shelar.

   9) Add SCTP support to openvswitch, from Joe Stringer.

  10) Add EF10 chip support to SFC driver, from Ben Hutchings.

  11) Add new SYNPROXY netfilter target, from Patrick McHardy.

  12) Compute a rate approximation for sending in TCP sockets, and use
      this to more intelligently coalesce TSO frames.  Furthermore, add
      a new packet scheduler which takes advantage of this estimate when
      available.  From Eric Dumazet.

  13) Allow AF_PACKET fanouts with random selection, from Daniel
      Borkmann.

  14) Add ipv6 support to vxlan driver, from Cong Wang"

Resolved conflicts as per discussion.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits)
  openvswitch: Fix alignment of struct sw_flow_key.
  netfilter: Fix build errors with xt_socket.c
  tcp: Add missing braces to do_tcp_setsockopt
  caif: Add missing braces to multiline if in cfctrl_linkup_request
  bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize
  vxlan: Fix kernel panic on device delete.
  net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls
  net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
  icplus: Use netif_running to determine device state
  ethernet/arc/arc_emac: Fix huge delays in large file copies
  tuntap: orphan frags before trying to set tx timestamp
  tuntap: purge socket error queue on detach
  qlcnic: use standard NAPI weights
  ipv6:introduce function to find route for redirect
  bnx2x: VF RSS support - VF side
  bnx2x: VF RSS support - PF side
  vxlan: Notify drivers for listening UDP port changes
  net: usbnet: update addr_assign_type if appropriate
  driver/net: enic: update enic maintainers and driver
  driver/net: enic: Exposing symbols for Cisco's low latency driver
  ...
2013-09-05 14:54:29 -07:00
Jesse Gross 0d40f75bda openvswitch: Fix alignment of struct sw_flow_key.
sw_flow_key alignment was declared as " __aligned(__alignof__(long))".
However, this breaks on the m68k architecture where long is 32 bit in
size but 16 bit aligned by default. This aligns to the size of a long to
ensure that we can always do comparsions in full long-sized chunks. It
also adds an additional build check to catch any reduction in alignment.

CC: Andy Zhou <azhou@nicira.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 15:54:37 -04:00
David S. Miller 06c54055be Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	net/bridge/br_multicast.c
	net/ipv6/sit.c

The conflicts were minor:

1) sit.c changes overlap with change to ip_tunnel_xmit() signature.

2) br_multicast.c had an overlap between computing max_delay using
   msecs_to_jiffies and turning MLDV2_MRC() into an inline function
   with a name using lowercase instead of uppercase letters.

3) stmmac had two overlapping changes, one which conditionally allocated
   and hooked up a dma_cfg based upon the presence of the pbl OF property,
   and another one handling store-and-forward DMA made.  The latter of
   which should not go into the new of_find_property() basic block.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 14:58:52 -04:00
David S. Miller 1a5bbfc3d6 netfilter: Fix build errors with xt_socket.c
As reported by Randy Dunlap:

====================
when CONFIG_IPV6=m
and CONFIG_NETFILTER_XT_MATCH_SOCKET=y:

net/built-in.o: In function `socket_mt6_v1_v2':
xt_socket.c:(.text+0x51b55): undefined reference to `udp6_lib_lookup'
net/built-in.o: In function `socket_mt_init':
xt_socket.c:(.init.text+0x1ef8): undefined reference to `nf_defrag_ipv6_enable'
====================

Like several other modules under net/netfilter/ we have to
have a dependency "IPV6 disabled or set compatibly with this
module" clause.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 14:38:03 -04:00
Dave Jones e2e5c4c07c tcp: Add missing braces to do_tcp_setsockopt
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 14:31:02 -04:00
Dave Jones 0c1db731bf caif: Add missing braces to multiline if in cfctrl_linkup_request
The indentation here implies this was meant to be a multi-line if.

Introduced several years back in commit c85c2951d4
("caif: Handle dev_queue_xmit errors.")

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 14:31:02 -04:00
Duan Jiong b55b76b221 ipv6:introduce function to find route for redirect
RFC 4861 says that the IP source address of the Redirect is the
same as the current first-hop router for the specified ICMP
Destination Address, so the gateway should be taken into
consideration when we find the route for redirect.

There was once a check in commit
a6279458c5 ("NDISC: Search over
all possible rules on receipt of redirect.") and the check
went away in commit b94f1c0904
("ipv6: Use icmpv6_notify() to propagate redirect, instead of
rt6_redirect()").

The bug is only "exploitable" on layer-2 because the source
address of the redirect is checked to be a valid link-local
address but it makes spoofing a lot easier in the same L2
domain nonetheless.

Thanks very much for Hannes's help.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 12:44:31 -04:00
Linus Lüssing 3c3769e633 bridge: apply multicast snooping to IPv6 link-local, too
The multicast snooping code should have matured enough to be safely
applicable to IPv6 link-local multicast addresses (excluding the
link-local all nodes address, ff02::1), too.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 12:35:53 -04:00
Linus Lüssing 8fad9c39f3 bridge: prevent flooding IPv6 packets that do not have a listener
Currently if there is no listener for a certain group then IPv6 packets
for that group are flooded on all ports, even though there might be no
host and router interested in it on a port.

With this commit they are only forwarded to ports with a multicast
router.

Just like commit bd4265fe36 ("bridge: Only flood unregistered groups
to routers") did for IPv4, let's do the same for IPv6 with the same
reasoning.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 12:35:41 -04:00
Trond Myklebust 2f048db468 SUNRPC: Add an identifier for struct rpc_clnt
Add an identifier in order to aid debugging.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:13:15 -04:00
Linus Torvalds 27703bb4a6 PTR_RET() is a weird name, and led to some confusing usage. We ended
up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages.
 
 This has been sitting in linux-next for a whole cycle.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJSJo+1AAoJENkgDmzRrbjxIC4QALJK95o8AUXuwUkl+2fmFkUt
 hh2/PJ1vDYgk4Xt0J6hyoK7XMa0H1RkbBrROuDdsBnorMFpEsGcgdkUZte9ufoAS
 97Bg+7N0KPbTB/S8vOwtW1vbERTJIVPN2uf6h1Wqm9Xc2puCh3HbMMr1AWMGu0WQ
 NqY5+Zz8zecy1UOrMhEP6H1CjeQcL1w1DO6YM5ydeqlKNzAz+JMfDXriLPDwiE7+
 XFPDF/O3Vtd2ckA7L70Lio7hfHwxV5U4WwFVfiwls98XB4jcZqDKIoh1r8z4SRgR
 +0Rae2DN3BaOabGMr//5XdrzQVpwJTh5m2w8BAOHJvCJ9HR7Sq29UIN4u+TowZBy
 L2xYo4dvFxkympwu5zEd3c7vHYWKIaqmSq5PIjr4gF/uIo2OeOTrpPIK782ZEYb7
 e+qUgOEM05V9AmQZCrSZeP9u474Sj8ow3sCtWxfdRtwNfoEIcUXsNNJd/zDHlVtW
 cEtXqc2xXIpcuUJQWlSaGp8fmRQjVZPzrLKYLM2m39ZcOOJbf5rzQAYS7hHPosIa
 SK+YVux/+Zzi+Xo/vXq1OlM/SruCr5S7JOgCxLowoQ88vupgXME6uPyC8EO+QQ50
 GsrHes5ZNLbk0uVsfcexIyojkUnyvDmmnDpv+1zdC6RgZLJQn8OXp5yNhHhnhrFT
 BiHX6YFWtDDqRlVv8Q0F
 =LeaW
 -----END PGP SIGNATURE-----

Merge tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull PTR_RET() removal patches from Rusty Russell:
 "PTR_RET() is a weird name, and led to some confusing usage.  We ended
  up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages.

  This has been sitting in linux-next for a whole cycle"

[ There are still some PTR_RET users scattered about, with some of them
  possibly being new, but most of them existing in Rusty's tree too.  We
  have that

      #define PTR_RET(p) PTR_ERR_OR_ZERO(p)

  thing in <linux/err.h>, so they continue to work for now  - Linus ]

* tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO
  Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO
  drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO
  sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO
  dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO
  drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO
  mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR().
  staging/zcache: don't use PTR_RET().
  remoteproc: don't use PTR_RET().
  pinctrl: don't use PTR_RET().
  acpi: Replace weird use of PTR_RET.
  s390: Replace weird use of PTR_RET.
  PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.
  PTR_RET is now PTR_ERR_OR_ZERO
2013-09-04 17:31:11 -07:00
Daniel Borkmann b4af8def5c net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions
We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce
mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it
more readable.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:21 -04:00
Daniel Borkmann 2b7c121f82 net: ipv6: mld: refactor query processing into v1/v2 functions
Make igmp6_event_query() a bit easier to read by refactoring code
parts into mld_process_v1() and mld_process_v2().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:21 -04:00
Daniel Borkmann cc7f7ab758 net: ipv6: mld: similarly to MLDv2 have min max_delay of 1
Similarly as we do in MLDv2 queries, set a forged MLDv1 query with
0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is
eventually done in igmp6_group_queried() anyway, so we can simplify
a check there.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:21 -04:00