Commit graph

33183 commits

Author SHA1 Message Date
Linus Torvalds 5615f9f822 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Bluetooth pairing fixes from Johan Hedberg.

 2) ieee80211_send_auth() doesn't allocate enough tail room for the SKB,
    from Max Stepanov.

 3) New iwlwifi chip IDs, from Oren Givon.

 4) bnx2x driver reads wrong PCI config space MSI register, from Yijing
    Wang.

 5) IPV6 MLD Query validation isn't strong enough, from Hangbin Liu.

 6) Fix double SKB free in openvswitch, from Andy Zhou.

 7) Fix sk_dst_set() being racey with UDP sockets, leading to strange
    crashes, from Eric Dumazet.

 8) Interpret the NAPI budget correctly in the new systemport driver,
    from Florian Fainelli.

 9) VLAN code frees percpu stats in the wrong place, leading to crashes
    in the get stats handler.  From Eric Dumazet.

10) TCP sockets doing a repair can crash with a divide by zero, because
    we invoke tcp_push() with an MSS value of zero.  Just skip that part
    of the sendmsg paths in repair mode.  From Christoph Paasch.

11) IRQ affinity bug fixes in mlx4 driver from Amir Vadai.

12) Don't ignore path MTU icmp messages with a zero mtu, machines out
    there still spit them out, and all of our per-protocol handlers for
    PMTU can cope with it just fine.  From Edward Allcutt.

13) Some NETDEV_CHANGE notifier invocations were not passing in the
    correct kind of cookie as the argument, from Loic Prylli.

14) Fix crashes in long multicast/broadcast reassembly, from Jon Paul
    Maloy.

15) ip_tunnel_lookup() doesn't interpret wildcard keys correctly, fix
    from Dmitry Popov.

16) Fix skb->sk assigned without taking a reference to 'sk' in
    appletalk, from Andrey Utkin.

17) Fix some info leaks in ULP event signalling to userspace in SCTP,
    from Daniel Borkmann.

18) Fix deadlocks in HSO driver, from Olivier Sobrie.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
  hso: fix deadlock when receiving bursts of data
  hso: remove unused workqueue
  net: ppp: don't call sk_chk_filter twice
  mlx4: mark napi id for gro_skb
  bonding: fix ad_select module param check
  net: pppoe: use correct channel MTU when using Multilink PPP
  neigh: sysctl - simplify address calculation of gc_* variables
  net: sctp: fix information leaks in ulpevent layer
  MAINTAINERS: update r8169 maintainer
  net: bcmgenet: fix RGMII_MODE_EN bit
  tipc: clear 'next'-pointer of message fragments before reassembly
  r8152: fix r8152_csum_workaround function
  be2net: set EQ DB clear-intr bit in be_open()
  GRE: enable offloads for GRE
  farsync: fix invalid memory accesses in fst_add_one() and fst_init_card()
  igb: do a reset on SR-IOV re-init if device is down
  igb: Workaround for i210 Errata 25: Slow System Clock
  usbnet: smsc95xx: add reset_resume function with reset operation
  dp83640: Always decode received status frames
  r8169: disable L23
  ...
2014-07-15 08:42:52 -07:00
Sasha Levin 3cf521f7dc net/l2tp: don't fall back on UDP [get|set]sockopt
The l2tp [get|set]sockopt() code has fallen back to the UDP functions
for socket option levels != SOL_PPPOL2TP since day one, but that has
never actually worked, since the l2tp socket isn't an inet socket.

As David Miller points out:

  "If we wanted this to work, it'd have to look up the tunnel and then
   use tunnel->sk, but I wonder how useful that would be"

Since this can never have worked so nobody could possibly have depended
on that functionality, just remove the broken code and return -EINVAL.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: James Chapman <jchapman@katalix.com>
Acked-by: David Miller <davem@davemloft.net>
Cc: Phil Turnbull <phil.turnbull@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-14 17:02:31 -07:00
Mathias Krause 9ecf07a1d8 neigh: sysctl - simplify address calculation of gc_* variables
The code in neigh_sysctl_register() relies on a specific layout of
struct neigh_table, namely that the 'gc_*' variables are directly
following the 'parms' member in a specific order. The code, though,
expresses this in the most ugly way.

Get rid of the ugly casts and use the 'tbl' pointer to get a handle to
the table. This way we can refer to the 'gc_*' variables directly.

Similarly seen in the grsecurity patch, written by Brad Spengler.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 14:32:51 -07:00
Daniel Borkmann 8f2e5ae40e net: sctp: fix information leaks in ulpevent layer
While working on some other SCTP code, I noticed that some
structures shared with user space are leaking uninitialized
stack or heap buffer. In particular, struct sctp_sndrcvinfo
has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
putting this into cmsg. But also struct sctp_remote_error
contains a 2 bytes hole that we don't fill but place into a skb
through skb_copy_expand() via sctp_ulpevent_make_remote_error().

Both structures are defined by the IETF in RFC6458:

* Section 5.3.2. SCTP Header Information Structure:

  The sctp_sndrcvinfo structure is defined below:

  struct sctp_sndrcvinfo {
    uint16_t sinfo_stream;
    uint16_t sinfo_ssn;
    uint16_t sinfo_flags;
    <-- 2 bytes hole  -->
    uint32_t sinfo_ppid;
    uint32_t sinfo_context;
    uint32_t sinfo_timetolive;
    uint32_t sinfo_tsn;
    uint32_t sinfo_cumtsn;
    sctp_assoc_t sinfo_assoc_id;
  };

* 6.1.3. SCTP_REMOTE_ERROR:

  A remote peer may send an Operation Error message to its peer.
  This message indicates a variety of error conditions on an
  association. The entire ERROR chunk as it appears on the wire
  is included in an SCTP_REMOTE_ERROR event. Please refer to the
  SCTP specification [RFC4960] and any extensions for a list of
  possible error formats. An SCTP error notification has the
  following format:

  struct sctp_remote_error {
    uint16_t sre_type;
    uint16_t sre_flags;
    uint32_t sre_length;
    uint16_t sre_error;
    <-- 2 bytes hole  -->
    sctp_assoc_t sre_assoc_id;
    uint8_t  sre_data[];
  };

Fix this by setting both to 0 before filling them out. We also
have other structures shared between user and kernel space in
SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
copy that buffer over from user space first and thus don't need
to care about it in that cases.

While at it, we can also remove lengthy comments copied from
the draft, instead, we update the comment with the correct RFC
number where one can look it up.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 14:18:56 -07:00
Jon Paul Maloy 999417549c tipc: clear 'next'-pointer of message fragments before reassembly
If the 'next' pointer of the last fragment buffer in a message is not
zeroed before reassembly, we risk ending up with a corrupt message,
since the reassembly function itself isn't doing this.

Currently, when a buffer is retrieved from the deferred queue of the
broadcast link, the next pointer is not cleared, with the result as
described above.

This commit corrects this, and thereby fixes a bug that may occur when
long broadcast messages are transmitted across dual interfaces. The bug
has been present since 40ba3cdf54 ("tipc:
message reassembly using fragment chain")

This commit should be applied to both net and net-next.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-11 15:02:10 -07:00
Amritha Nambiar d0a7ebbc11 GRE: enable offloads for GRE
To get offloads to work with Generic Routing Encapsulation (GRE), the
outer transport header has to be reset after skb_push is done. This
patch has the support for this fix and hence GRE offloading.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Tested-By: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-11 13:53:39 -07:00
Ben Pfaff ac30ef832e netlink: Fix handling of error from netlink_dump().
netlink_dump() returns a negative errno value on error.  Until now,
netlink_recvmsg() directly recorded that negative value in sk->sk_err, but
that's wrong since sk_err takes positive errno values.  (This manifests as
userspace receiving a positive return value from the recv() system call,
falsely indicating success.) This bug was introduced in the commit that
started checking the netlink_dump() return value, commit b44d211 (netlink:
handle errors from netlink_dump()).

Multithreaded Netlink dumps are one way to trigger this behavior in
practice, as described in the commit message for the userspace workaround
posted here:
    http://openvswitch.org/pipermail/dev/2014-June/042339.html

This commit also fixes the same bug in netlink_poll(), introduced in commit
cd1df525d (netlink: add flow control for memory mapped I/O).

Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-09 14:33:47 -07:00
Andrey Utkin 36beddc272 appletalk: Fix socket referencing in skb
Setting just skb->sk without taking its reference and setting a
destructor is invalid. However, in the places where this was done, skb
is used in a way not requiring skb->sk setting. So dropping the setting
of skb->sk.
Thanks to Eric Dumazet <eric.dumazet@gmail.com> for correct solution.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441
Reported-by: Ed Martin <edman007@edman007.com>
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 19:39:43 -07:00
Dmitry Popov e0056593b6 ip_tunnel: fix ip_tunnel_lookup
This patch fixes 3 similar bugs where incoming packets might be routed into
wrong non-wildcard tunnels:

1) Consider the following setup:
    ip address add 1.1.1.1/24 dev eth0
    ip address add 1.1.1.2/24 dev eth0
    ip tunnel add ipip1 remote 2.2.2.2 local 1.1.1.1 mode ipip dev eth0
    ip link set ipip1 up

Incoming ipip packets from 2.2.2.2 were routed into ipip1 even if it has dst =
1.1.1.2. Moreover even if there was wildcard tunnel like
   ip tunnel add ipip0 remote 2.2.2.2 local any mode ipip dev eth0
but it was created before explicit one (with local 1.1.1.1), incoming ipip
packets with src = 2.2.2.2 and dst = 1.1.1.2 were still routed into ipip1.

Same issue existed with all tunnels that use ip_tunnel_lookup (gre, vti)

2)  ip address add 1.1.1.1/24 dev eth0
    ip tunnel add ipip1 remote 2.2.146.85 local 1.1.1.1 mode ipip dev eth0
    ip link set ipip1 up

Incoming ipip packets with dst = 1.1.1.1 were routed into ipip1, no matter what
src address is. Any remote ip address which has ip_tunnel_hash = 0 raised this
issue, 2.2.146.85 is just an example, there are more than 4 million of them.
And again, wildcard tunnel like
   ip tunnel add ipip0 remote any local 1.1.1.1 mode ipip dev eth0
wouldn't be ever matched if it was created before explicit tunnel like above.

Gre & vti tunnels had the same issue.

3)  ip address add 1.1.1.1/24 dev eth0
    ip tunnel add gre1 remote 2.2.146.84 local 1.1.1.1 key 1 mode gre dev eth0
    ip link set gre1 up

Any incoming gre packet with key = 1 were routed into gre1, no matter what
src/dst addresses are. Any remote ip address which has ip_tunnel_hash = 0 raised
the issue, 2.2.146.84 is just an example, there are more than 4 million of them.
Wildcard tunnel like
   ip tunnel add gre2 remote any local any key 1 mode gre dev eth0
wouldn't be ever matched if it was created before explicit tunnel like above.

All this stuff happened because while looking for a wildcard tunnel we didn't
check that matched tunnel is a wildcard one. Fixed.

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 19:35:09 -07:00
Jon Paul Maloy 29322d0db9 tipc: fix bug in multicast/broadcast message reassembly
Since commit 37e22164a8 ("tipc: rename and
move message reassembly function") reassembly of long broadcast messages
has been broken. This is because we test for a non-NULL return value
of the *buf parameter as criteria for succesful reassembly. However, this
parameter is left defined even after reception of the first fragment,
when reassebly is still incomplete. This leads to a kernel crash as soon
as a the first fragment of a long broadcast message is received.

We fix this with this commit, by implementing a stricter behavior of the
function and its return values.

This commit should be applied to both net and net-next.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 15:55:09 -07:00
Yuchung Cheng 6e08d5e3c8 tcp: fix false undo corner cases
The undo code assumes that, upon entering loss recovery, TCP
1) always retransmit something
2) the retransmission never fails locally (e.g., qdisc drop)

so undo_marker is set in tcp_enter_recovery() and undo_retrans is
incremented only when tcp_retransmit_skb() is successful.

When the assumption is broken because TCP's cwnd is too small to
retransmit or the retransmit fails locally. The next (DUP)ACK
would incorrectly revert the cwnd and the congestion state in
tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs
may enter the recovery state. The sender repeatedly enter and
(incorrectly) exit recovery states if the retransmits continue to
fail locally while receiving (DUP)ACKs.

The fix is to initialize undo_retrans to -1 and start counting on
the first retransmission. Always increment undo_retrans even if the
retransmissions fail locally because they couldn't cause DSACKs to
undo the cwnd reduction.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:40:48 -07:00
dingtianhong 52ad353a53 igmp: fix the problem when mc leave group
The problem was triggered by these steps:

1) create socket, bind and then setsockopt for add mc group.
   mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
   mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
   setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

2) drop the mc group for this socket.
   mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
   mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
   setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));

3) and then drop the socket, I found the mc group was still used by the dev:

   netstat -g

   Interface       RefCnt Group
   --------------- ------ ---------------------
   eth2		   1	  255.0.0.37

Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
to be released for the netdev when drop the socket, but this process was broken when
route default is NULL, the reason is that:

The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL,
the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be
released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
when dropping the socket, the mc_list is NULL and the dev still keep this group.

v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
	so I add the checking for the in_dev before polling the mc_list, make sure when
	we remove the mc group, dec the refcnt to the real dev which was using the mc address.
	The problem would never happened again.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:30:55 -07:00
Loic Prylli 5495119465 net: Fix NETDEV_CHANGE notifier usage causing spurious arp flush
A bug was introduced in NETDEV_CHANGE notifier sequence causing the
arp table to be sometimes spuriously cleared (including manual arp
entries marked permanent), upon network link carrier changes.

The changed argument for the notifier was applied only to a single
caller of NETDEV_CHANGE, missing among others netdev_state_change().
So upon net_carrier events induced by the network, which are
triggering a call to netdev_state_change(), arp_netdev_event() would
decide whether to clear or not arp cache based on random/junk stack
values (a kind of read buffer overflow).

Fixes: be9efd3653 ("net: pass changed flags along with NETDEV_CHANGE event")
Fixes: 6c8b4e3ff8 ("arp: flush arp cache on IFF_NOARP change")
Signed-off-by: Loic Prylli <loicp@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:20:01 -07:00
David S. Miller edc1bb0bd7 Merge branch 'net_ovs_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:

====================
Open vSwitch

A set of fixes for net.
First bug is related flow-table management.  Second one is in sample
action. Third is related flow stats and last one add gre-err handler for ovs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 19:39:34 -07:00
Tom Herbert 11ef7a8996 net: Performance fix for process_backlog
In process_backlog the input_pkt_queue is only checked once for new
packets and quota is artificially reduced to reflect precisely the
number of packets on the input_pkt_queue so that the loop exits
appropriately.

This patches changes the behavior to be more straightforward and
less convoluted. Packets are processed until either the quota
is met or there are no more packets to process.

This patch seems to provide a small, but noticeable performance
improvement. The performance improvement is a result of staying
in the process_backlog loop longer which can reduce number of IPI's.

Performance data using super_netperf TCP_RR with 200 flows:

Before fix:

88.06% CPU utilization
125/190/309 90/95/99% latencies
1.46808e+06 tps
1145382 intrs.sec.

With fix:

87.73% CPU utilization
122/183/296 90/95/99% latencies
1.4921e+06 tps
1021674.30 intrs./sec.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 19:24:34 -07:00
Edward Allcutt 68b7107b62 ipv4: icmp: Fix pMTU handling for rare case
Some older router implementations still send Fragmentation Needed
errors with the Next-Hop MTU field set to zero. This is explicitly
described as an eventuality that hosts must deal with by the
standard (RFC 1191) since older standards specified that those
bits must be zero.

Linux had a generic (for all of IPv4) implementation of the algorithm
described in the RFC for searching a list of MTU plateaus for a good
value. Commit 46517008e1 ("ipv4: Kill ip_rt_frag_needed().")
removed this as part of the changes to remove the routing cache.
Subsequently any Fragmentation Needed packet with a zero Next-Hop
MTU has been discarded without being passed to the per-protocol
handlers or notifying userspace for raw sockets.

When there is a router which does not implement RFC 1191 on an
MTU limited path then this results in stalled connections since
large packets are discarded and the local protocols are not
notified so they never attempt to lower the pMTU.

One example I have seen is an OpenBSD router terminating IPSec
tunnels. It's worth pointing out that this case is distinct from
the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU
since the commit in question dismissed that as a valid concern.

All of the per-protocols handlers implement the simple approach from
RFC 1191 of immediately falling back to the minimum value. Although
this is sub-optimal it is vastly preferable to connections hanging
indefinitely.

Remove the Next-Hop MTU != 0 check and allow such packets
to follow the normal path.

Fixes: 46517008e1 ("ipv4: Kill ip_rt_frag_needed().")
Signed-off-by: Edward Allcutt <edward.allcutt@openmarket.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 17:22:57 -07:00
Christoph Paasch 5924f17a8a tcp: Fix divide by zero when pushing during tcp-repair
When in repair-mode and TCP_RECV_QUEUE is set, we end up calling
tcp_push with mss_now being 0. If data is in the send-queue and
tcp_set_skb_tso_segs gets called, we crash because it will divide by
mss_now:

[  347.151939] divide error: 0000 [#1] SMP
[  347.152907] Modules linked in:
[  347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4
[  347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000
[  347.152907] EIP: 0060:[<c1601359>] EFLAGS: 00210246 CPU: 1
[  347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0
[  347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000
[  347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00
[  347.152907]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0
[  347.152907] Stack:
[  347.152907]  c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080
[  347.152907]  f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85
[  347.152907]  c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8
[  347.152907] Call Trace:
[  347.152907]  [<c167f9d9>] ? apic_timer_interrupt+0x2d/0x34
[  347.152907]  [<c16013e6>] tcp_init_tso_segs+0x36/0x50
[  347.152907]  [<c1603b5a>] tcp_write_xmit+0x7a/0xbf0
[  347.152907]  [<c10a0188>] ? up+0x28/0x40
[  347.152907]  [<c10acc85>] ? console_unlock+0x295/0x480
[  347.152907]  [<c10ad24f>] ? vprintk_emit+0x1ef/0x4b0
[  347.152907]  [<c1605716>] __tcp_push_pending_frames+0x36/0xd0
[  347.152907]  [<c15f4860>] tcp_push+0xf0/0x120
[  347.152907]  [<c15f7641>] tcp_sendmsg+0xf1/0xbf0
[  347.152907]  [<c116d920>] ? kmem_cache_free+0xf0/0x120
[  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
[  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
[  347.152907]  [<c114f0f0>] ? do_wp_page+0x3e0/0x850
[  347.152907]  [<c161c36a>] inet_sendmsg+0x4a/0xb0
[  347.152907]  [<c1150269>] ? handle_mm_fault+0x709/0xfb0
[  347.152907]  [<c15a006b>] sock_aio_write+0xbb/0xd0
[  347.152907]  [<c1180b79>] do_sync_write+0x69/0xa0
[  347.152907]  [<c1181023>] vfs_write+0x123/0x160
[  347.152907]  [<c1181d55>] SyS_write+0x55/0xb0
[  347.152907]  [<c167f0d8>] sysenter_do_call+0x12/0x28

This can easily be reproduced with the following packetdrill-script (the
"magic" with netem, sk_pacing and limit_output_bytes is done to prevent
the kernel from pushing all segments, because hitting the limit without
doing this is not so easy with packetdrill):

0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0  bind(3, ..., ...) = 0
+0  listen(3, 1) = 0

+0  < S 0:0(0) win 32792 <mss 1460>
+0  > S. 0:0(0) ack 1 <mss 1460>
+0.1  < . 1:1(0) ack 1 win 65000

+0  accept(3, ..., ...) = 4

// This forces that not all segments of the snd-queue will be pushed
+0 `tc qdisc add dev tun0 root netem delay 10ms`
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2`
+0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0

+0 write(4,...,10000) = 10000
+0 write(4,...,10000) = 10000

// Set tcp-repair stuff, particularly TCP_RECV_QUEUE
+0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0
+0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0

// This now will make the write push the remaining segments
+0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000`

// Now we will crash
+0 write(4,...,1000) = 1000

This happens since ec34232575 (tcp: fix retransmission in repair
mode). Prior to that, the call to tcp_push was prevented by a check for
tp->repair.

The patch fixes it, by adding the new goto-label out_nopush. When exiting
tcp_sendmsg and a push is not required, which is the case for tp->repair,
we go to this label.

When repairing and calling send() with TCP_RECV_QUEUE, the data is
actually put in the receive-queue. So, no push is required because no
data has been added to the send-queue.

Cc: Andrew Vagin <avagin@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Fixes: ec34232575 (tcp: fix retransmission in repair mode)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 18:21:03 -07:00
Eric Dumazet a48e5fafec vlan: free percpu stats in device destructor
Madalin-Cristian reported crashs happening after a recent commit
(5a4ae5f6e7 "vlan: unnecessary to check if vlan_pcpu_stats is NULL")

-----------------------------------------------------------------------
root@p5040ds:~# vconfig add eth8 1
root@p5040ds:~# vconfig rem eth8.1
Unable to handle kernel paging request for data at address 0x2bc88028
Faulting instruction address: 0xc058e950
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=8 CoreNet Generic
Modules linked in:
CPU: 3 PID: 2167 Comm: vconfig Tainted: G        W     3.16.0-rc3-00346-g65e85bf #2
task: e7264d90 ti: e2c2c000 task.ti: e2c2c000
NIP: c058e950 LR: c058ea30 CTR: c058e900
REGS: e2c2db20 TRAP: 0300   Tainted: G        W      (3.16.0-rc3-00346-g65e85bf)
MSR: 00029002 <CE,EE,ME>  CR: 48000428  XER: 20000000
DEAR: 2bc88028 ESR: 00000000
GPR00: c047299c e2c2dbd0 e7264d90 00000000 2bc88000 00000000 ffffffff 00000000
GPR08: 0000000f 00000000 000000ff 00000000 28000422 10121928 10100000 10100000
GPR16: 10100000 00000000 c07c5968 00000000 00000000 00000000 e2c2dc48 e7838000
GPR24: c07c5bac c07c58a8 e77290cc c07b0000 00000000 c05de6c0 e7838000 e2c2dc48
NIP [c058e950] vlan_dev_get_stats64+0x50/0x170
LR [c058ea30] vlan_dev_get_stats64+0x130/0x170
Call Trace:
[e2c2dbd0] [ffffffea] 0xffffffea (unreliable)
[e2c2dc20] [c047299c] dev_get_stats+0x4c/0x140
[e2c2dc40] [c0488ca8] rtnl_fill_ifinfo+0x3d8/0x960
[e2c2dd70] [c0489f4c] rtmsg_ifinfo+0x6c/0x110
[e2c2dd90] [c04731d4] rollback_registered_many+0x344/0x3b0
[e2c2ddd0] [c047332c] rollback_registered+0x2c/0x50
[e2c2ddf0] [c0476058] unregister_netdevice_queue+0x78/0xf0
[e2c2de00] [c058d800] unregister_vlan_dev+0xc0/0x160
[e2c2de20] [c058e360] vlan_ioctl_handler+0x1c0/0x550
[e2c2de90] [c045d11c] sock_ioctl+0x28c/0x2f0
[e2c2deb0] [c010d070] do_vfs_ioctl+0x90/0x7b0
[e2c2df20] [c010d7d0] SyS_ioctl+0x40/0x80
[e2c2df40] [c000f924] ret_from_syscall+0x0/0x3c

Fix this problem by freeing percpu stats from dev->destructor() instead
of ndo_uninit()

Reported-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com>
Fixes: 5a4ae5f6e7 ("vlan: unnecessary to check if vlan_pcpu_stats is NULL")
Cc: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 17:02:05 -07:00
David S. Miller eb608d2b99 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-06-27

Please pull the following batch of fixes for the 3.16 stream...

For the mac80211 bits, Johannes says:

"We have a fix from Eliad for a time calculation, a fix from Max for
head/tailroom when sending authentication packets, a revert that Felix
requested since the patch in question broke regulatory and a fix from
myself for an issue with a new command that we advertised in the wrong
place."

For the bluetooth bits, Gustavo says:

"A few fixes for 3.16. This pull request contains a NULL dereference fix,
and some security/pairing fixes."

For the iwlwifi bits, Emmanuel says:

"I have here a fix from Eliad for scheduled scan: it fixes a firmware
assertion. Arik reverts a patch I made that didn't take into account
that 3160 doesn't have UAPSD and hence, we can't assume that all
newer firmwares support the feature. Here too, the visible effect
is a firmware assertion. Along with that, we have a few fixes and
additions to the device list."

For the ath10k bits, Kalle says:

"Bartosz fixed an issue where we were not able to create 8 vdevs when
using DFS. Michal removed a false warning which was just confusing
people."

On top of that...

Arend van Spriel fixes a 'divide by zero' regression in brcmfmac.

Amitkumar Karwar corrects a transmit timeout in mwifiex.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 23:47:33 -07:00
Eric Dumazet 7f50236153 ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix
We have two different ways to handle changes to sk->sk_dst

First way (used by TCP) assumes socket lock is owned by caller, and use
no extra lock : __sk_dst_set() & __sk_dst_reset()

Another way (used by UDP) uses sk_dst_lock because socket lock is not
always taken. Note that sk_dst_lock is not softirq safe.

These ways are not inter changeable for a given socket type.

ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used
the socket lock as synchronization, but users might be UDP sockets.

Instead of converting sk_dst_lock to a softirq safe version, use xchg()
as we did for sk_rx_dst in commit e47eb5dfb2 ("udp: ipv4: do not use
sk_dst_lock from softirq context")

In a follow up patch, we probably can remove sk_dst_lock, as it is
only used in IPv6.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 9cb3a50c5f ("ipv4: Invalidate the socket cached route on pmtu events if possible")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-30 23:40:58 -07:00
Alex Wang 4a46b24e14 openvswitch: Use exact lookup for flow_get and flow_del.
Due to the race condition in userspace, there is chance that two
overlapping megaflows could be installed in datapath.  And this
causes userspace unable to delete the less inclusive megaflow flow
even after it timeout, since the flow_del logic will stop at the
first match of masked flow.

This commit fixes the bug by making the kernel flow_del and flow_get
logic check all masks in that case.

Introduced by 03f0d916a (openvswitch: Mega flow implementation).

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-30 20:47:15 -07:00
Ben Pfaff ad55200734 openvswitch: Fix tracking of flags seen in TCP flows.
Flow statistics need to take into account the TCP flags from the packet
currently being processed (in 'key'), not the TCP flags matched by the
flow found in the kernel flow table (in 'flow').

This bug made the Open vSwitch userspace fin_timeout action have no effect
in many cases.
This bug is introduced by commit 88d73f6c41 (openvswitch: Use
TCP flags in the flow key for stats.)

Reported-by: Len Gao <leng@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-29 14:10:51 -07:00
Wei Zhang e0bb8c44ed openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
When use gre vport, openvswitch register a gre_cisco_protocol but
does not supply a err_handler with it. The gre_cisco_err() in
net/ipv4/gre_demux.c expect err_handler be provided with the
gre_cisco_protocol implementation, and call ->err_handler() without
existence check, cause the kernel crash.

This patch provide a err_handler to fix this bug.
This bug introduced by commit aa310701e7 (openvswitch: Add gre
tunnel support.)

Signed-off-by: Wei Zhang <asuka.com@163.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-29 14:10:48 -07:00
Andy Zhou fe984c08e2 openvswitch: Fix a double free bug for the sample action
When sample action returns with an error, the skb has already been
freed. This patch fix a bug to make sure we don't free it again.
This bug introduced by commit ccb1352e76 (net: Add Open vSwitch
kernel components.)

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-29 14:10:43 -07:00
Linus Torvalds eb477e03fe Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time around.  The highlights include:

   - iscsi-target CHAP authentication fixes to enforce explicit key
     values (Tejas Vaykole + rahul.rane)
   - fix a long-standing OOPs in target-core when a alua configfs
     attribute is accessed after port symlink has been removed.
     (Sebastian Herbszt)
   - fix a v3.10.y iscsi-target regression causing the login reject
     status class/detail to be ignored (Christoph Vu-Brugier)
   - fix a v3.10.y iscsi-target regression to avoid rejecting an
     existing ITT during Data-Out when data-direction is wrong (Santosh
     Kulkarni + Arshad Hussain)
   - fix a iscsi-target related shutdown deadlock on UP kernels (Mikulas
     Patocka)
   - fix a v3.16-rc1 build issue with vhost-scsi + !CONFIG_NET (MST)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: fix iscsit_del_np deadlock on unload
  iovec: move memcpy_from/toiovecend to lib/iovec.c
  iscsi-target: Avoid rejecting incorrect ITT for Data-Out
  tcm_loop: Fix memory leak in tcm_loop_submission_work error path
  iscsi-target: Explicily clear login response PDU in exception path
  target: Fix left-over se_lun->lun_sep pointer OOPs
  iscsi-target; Enforce 1024 byte maximum for CHAP_C key value
  iscsi-target: Convert chap_server_compute_md5 to use kstrtoul
2014-06-28 09:43:58 -07:00
Michael S. Tsirkin ac5ccdba3a iovec: move memcpy_from/toiovecend to lib/iovec.c
ERROR: "memcpy_fromiovecend" [drivers/vhost/vhost_scsi.ko] undefined!

commit 9f977ef7b6
    vhost-scsi: Include prot_bytes into expected data transfer length
in target-pending makes drivers/vhost/scsi.c call memcpy_fromiovecend().
This function is not available when CONFIG_NET is not enabled.

socket.h already includes uio.h, so no callers need updating.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-27 11:47:58 -07:00
John W. Linville f9fa39e9ac Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2014-06-27 13:35:56 -04:00
Hangbin Liu e940f5d6ba ipv6: Fix MLD Query message check
Based on RFC3810 6.2, we also need to check the hop limit and router alert
option besides source address.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 00:21:50 -07:00
James M Leddy 3e215c8d1b udp: Add MIB counters for rcvbuferrors
Add MIB counters for rcvbuferrors in UDP to help diagnose problems.

Signed-off-by: James M Leddy <james.leddy@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 00:20:55 -07:00
Linus Torvalds f40ede392d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix crash in ipvs tot_stats estimator, from Julian Anastasov.

 2) Fix OOPS in nf_nat on netns removal, from Florian Westphal.

 3) Really really really fix locking issues in slip and slcan tty write
    wakeups, from Tyler Hall.

 4) Fix checksum offloading in fec driver, from Fugang Duan.

 5) Off by one in BPF instruction limit test, from Kees Cook.

 6) Need to clear all TSO capability flags when doing software TSO in
    tg3 driver, from Prashant Sreedharan.

 7) Fix memory leak in vlan_reorder_header() error path, from Li
    RongQing.

 8) Fix various bugs in xen-netfront and xen-netback multiqueue support,
    from David Vrabel and Wei Liu.

 9) Fix deadlock in cxgb4 driver, from Li RongQing.

10) Prevent double free of no-cache DST entries, from Eric Dumazet.

11) Bad csum_start handling in skb_segment() leads to crashes when
    forwarding, from Tom Herbert.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net: fix setting csum_start in skb_segment()
  ipv4: fix dst race in sk_dst_get()
  net: filter: Use kcalloc/kmalloc_array to allocate arrays
  trivial: net: filter: Change kerneldoc parameter order
  trivial: net: filter: Fix typo in comment
  net: allwinner: emac: Add missing free_irq
  cxgb4: use dev_port to identify ports
  xen-netback: bookkeep number of active queues in our own module
  tg3: Change nvram command timeout value to 50ms
  cxgb4: Not need to hold the adap_rcu_lock lock when read adap_rcu_list
  be2net: fix qnq mode detection on VFs
  of: mdio: fixup of_phy_register_fixed_link parsing of new bindings
  at86rf230: fix irq setup
  net: phy: at803x: fix coccinelle warnings
  net/mlx4_core: Fix the error flow when probing with invalid VF configuration
  tulip: Poll link status more frequently for Comet chips
  net: huawei_cdc_ncm: increase command buffer size
  drivers: net: cpsw: fix dual EMAC stall when connected to same switch
  xen-netfront: recreate queues correctly when reconnecting
  xen-netfront: fix oops when disconnected from backend
  ...
2014-06-25 21:08:24 -07:00
Tom Herbert de843723f9 net: fix setting csum_start in skb_segment()
Dave Jones reported that a crash is occurring in

csum_partial
tcp_gso_segment
inet_gso_segment
? update_dl_migration
skb_mac_gso_segment
__skb_gso_segment
dev_hard_start_xmit
sch_direct_xmit
__dev_queue_xmit
? dev_hard_start_xmit
dev_queue_xmit
ip_finish_output
? ip_output
ip_output
ip_forward_finish
ip_forward
ip_rcv_finish
ip_rcv
__netif_receive_skb_core
? __netif_receive_skb_core
? trace_hardirqs_on
__netif_receive_skb
netif_receive_skb_internal
napi_gro_complete
? napi_gro_complete
dev_gro_receive
? dev_gro_receive
napi_gro_receive

It looks like a likely culprit is that SKB_GSO_CB()->csum_start is
not set correctly when doing non-scatter gather. We are using
offset as opposed to doffset.

Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 7e2b10c1e5 ("net: Support for multiple checksums with gso")
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 20:45:54 -07:00
Eric Dumazet f886497212 ipv4: fix dst race in sk_dst_get()
When IP route cache had been removed in linux-3.6, we broke assumption
that dst entries were all freed after rcu grace period. DST_NOCACHE
dst were supposed to be freed from dst_release(). But it appears
we want to keep such dst around, either in UDP sockets or tunnels.

In sk_dst_get() we need to make sure dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.

DST_NOCACHE set on a dst does not mean this dst can not be attached
to a socket or a tunnel.

Then, before actual freeing, we need to observe a rcu grace period
to make sure all other cpus can catch the fact the dst is no longer
usable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dormando <dormando@rydia.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 17:41:44 -07:00
Tobias Klauser 99e72a0fed net: filter: Use kcalloc/kmalloc_array to allocate arrays
Use kcalloc/kmalloc_array to make it clear we're allocating arrays. No
integer overflow can actually happen here, since len/flen is guaranteed
to be less than BPF_MAXINSNS (4096). However, this changed makes sure
we're not going to get one if BPF_MAXINSNS were ever increased.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 16:40:02 -07:00
Tobias Klauser 677a9fd3e6 trivial: net: filter: Change kerneldoc parameter order
Change the order of the parameters to sk_unattached_filter_create() in
the kerneldoc to reflect the order they appear in the actual function.

This fix is only cosmetic, in the generated doc they still appear in the
correct order without the fix.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 16:38:54 -07:00
Tobias Klauser 285276e72c trivial: net: filter: Fix typo in comment
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 16:38:54 -07:00
John W. Linville 8b87efba61 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2014-06-25 14:22:35 -04:00
John W. Linville 12307e4208 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2014-06-25 14:17:50 -04:00
Andy Adamson 66b0686049 NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support
Fix nfs4_negotiate_security to create an rpc_clnt used to test each SECINFO
returned pseudoflavor. Check credential creation  (and gss_context creation)
which is important for RPC_AUTH_GSS pseudoflavors which can fail for multiple
reasons including mis-configuration.

Don't call nfs4_negotiate in nfs4_submount as it was just called by
nfs4_proc_lookup_mountpoint (nfs4_proc_lookup_common)

Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: fix corrupt return value from nfs_find_best_sec()]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 18:46:58 -04:00
Johannes Berg 02df00eb00 nl80211: move set_qos_map command into split state
The non-split wiphy state shouldn't be increased in size
so move the new set_qos_map command into the split if
statement.

Cc: stable@vger.kernel.org (3.14+)
Fixes: fa9ffc7456 ("cfg80211: Add support for QoS mapping")
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-24 16:13:10 +02:00
Eliad Peller 0ce12026d6 cfg80211: fix elapsed_jiffies calculation
MAX_JIFFY_OFFSET has no meaning when calculating the
elapsed jiffies, as jiffies run out until ULONG_MAX.

This miscalculation results in erroneous values
in case of a wrap-around.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:29:25 +02:00
Johannes Berg e33e2241e2 Revert "cfg80211: Use 5MHz bandwidth by default when checking usable channels"
This reverts commit 8eca1fb692.

Felix notes that this broke regulatory, leaving channel 12 open for AP
operation in the US regulatory domain where it isn't permitted.

Link: http://mid.gmane.org/53A6C0FF.9090104@openwrt.org
Reported-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:06:28 +02:00
Max Stepanov 744462a91e mac80211: WEP extra head/tail room in ieee80211_send_auth
After skb allocation and call to ieee80211_wep_encrypt in ieee80211_send_auth
the flow fails with a warning in ieee80211_wep_add_iv on verification of
available head/tailroom needed for WEP_IV and WEP_ICV.

Signed-off-by: Max Stepanov <Max.Stepanov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 10:45:14 +02:00
Li RongQing 916c1689a0 8021q: fix a potential memory leak
skb_cow called in vlan_reorder_header does not free the skb when it failed,
and vlan_reorder_header returns NULL to reset original skb when it is called
in vlan_untag, lead to a memory leak.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-21 15:12:13 -07:00
Lukasz Rymanowski 1d56dc4f5f Bluetooth: Fix for ACL disconnect when pairing fails
When pairing fails hci_conn refcnt drops below zero. This cause that
ACL link is not disconnected when disconnect timeout fires.

Probably this is because l2cap_conn_del calls l2cap_chan_del for each
channel, and inside l2cap_chan_del conn is dropped. After that loop
hci_chan_del is called which also drops conn.

Anyway, as it is desrcibed in hci_core.h, it is known that refcnt
drops below 0 sometimes and it should be fine. If so, let disconnect
link when hci_conn_timeout fires and refcnt is 0 or below. This patch
does it.

This affects PTS test SM_TC_JW_BV_05_C

Logs from scenario:

[69713.706227] [6515] pair_device:
[69713.706230] [6515] hci_conn_add: hci0 dst 00:1b:dc:06:06:22
[69713.706233] [6515] hci_dev_hold: hci0 orig refcnt 8
[69713.706235] [6515] hci_conn_init_sysfs: conn ffff88021f65a000
[69713.706239] [6515] hci_req_add_ev: hci0 opcode 0x200d plen 25
[69713.706242] [6515] hci_prepare_cmd: skb len 28
[69713.706243] [6515] hci_req_run: length 1
[69713.706248] [6515] hci_conn_hold: hcon ffff88021f65a000 orig refcnt 0
[69713.706251] [6515] hci_dev_put: hci0 orig refcnt 9
[69713.706281] [8909] hci_cmd_work: hci0 cmd_cnt 1 cmd queued 1
[69713.706288] [8909] hci_send_frame: hci0 type 1 len 28
[69713.706290] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 28
[69713.706316] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.706382] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.711664] [8909] hci_rx_work: hci0
[69713.711668] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 6
[69713.711680] [8909] hci_rx_work: hci0 Event packet
[69713.711683] [8909] hci_cs_le_create_conn: hci0 status 0x00
[69713.711685] [8909] hci_sent_cmd_data: hci0 opcode 0x200d
[69713.711688] [8909] hci_req_cmd_complete: opcode 0x200d status 0x00
[69713.711690] [8909] hci_sent_cmd_data: hci0 opcode 0x200d
[69713.711695] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.711744] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.818875] [8909] hci_rx_work: hci0
[69713.818889] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 21
[69713.818913] [8909] hci_rx_work: hci0 Event packet
[69713.818917] [8909] hci_le_conn_complete_evt: hci0 status 0x00
[69713.818922] [8909] hci_send_to_control: len 19
[69713.818927] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.818938] [8909] hci_conn_add_sysfs: conn ffff88021f65a000
[69713.818975] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800
[69713.818981] [6515] hci_sock_recvmsg: sock ffff88005e75a080, sk ffff88010323ac00
...
[69713.819021] [8909] hci_dev_hold: hci0 orig refcnt 10
[69713.819025] [8909] l2cap_connect_cfm: hcon ffff88021f65a000 bdaddr 00:1b:dc:06:06:22 status 0
[69713.819028] [8909] hci_chan_create: hci0 hcon ffff88021f65a000
[69713.819031] [8909] l2cap_conn_add: hcon ffff88021f65a000 conn ffff880221005c00 hchan ffff88020d60b1c0
[69713.819034] [8909] l2cap_conn_ready: conn ffff880221005c00
[69713.819036] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.819037] [8909] smp_conn_security: conn ffff880221005c00 hcon ffff88021f65a000 level 0x02
[69713.819039] [8909] smp_chan_create:
[69713.819041] [8909] hci_conn_hold: hcon ffff88021f65a000 orig refcnt 1
[69713.819043] [8909] smp_send_cmd: code 0x01
[69713.819045] [8909] hci_send_acl: hci0 chan ffff88020d60b1c0 flags 0x0000
[69713.819046] [5949] hci_sock_recvmsg: sock ffff8800941a9900, sk ffff88012bf4e800
[69713.819049] [8909] hci_queue_acl: hci0 nonfrag skb ffff88005157c100 len 15
[69713.819055] [5949] hci_sock_recvmsg: sock ffff8800941a9900, sk ffff88012bf4e800
[69713.819057] [8909] l2cap_le_conn_ready:
[69713.819064] [8909] l2cap_chan_create: chan ffff88005ede2c00
[69713.819066] [8909] l2cap_chan_hold: chan ffff88005ede2c00 orig refcnt 1
[69713.819069] [8909] l2cap_sock_init: sk ffff88005ede5800
[69713.819072] [8909] bt_accept_enqueue: parent ffff880160356000, sk ffff88005ede5800
[69713.819074] [8909] __l2cap_chan_add: conn ffff880221005c00, psm 0x00, dcid 0x0004
[69713.819076] [8909] l2cap_chan_hold: chan ffff88005ede2c00 orig refcnt 2
[69713.819078] [8909] hci_conn_hold: hcon ffff88021f65a000 orig refcnt 2
[69713.819080] [8909] smp_conn_security: conn ffff880221005c00 hcon ffff88021f65a000 level 0x01
[69713.819082] [8909] l2cap_sock_ready_cb: sk ffff88005ede5800, parent ffff880160356000
[69713.819086] [8909] le_pairing_complete_cb: status 0
[69713.819091] [8909] hci_tx_work: hci0 acl 10 sco 8 le 0
[69713.819093] [8909] hci_sched_acl: hci0
[69713.819094] [8909] hci_sched_sco: hci0
[69713.819096] [8909] hci_sched_esco: hci0
[69713.819098] [8909] hci_sched_le: hci0
[69713.819099] [8909] hci_chan_sent: hci0
[69713.819101] [8909] hci_chan_sent: chan ffff88020d60b1c0 quote 10
[69713.819104] [8909] hci_sched_le: chan ffff88020d60b1c0 skb ffff88005157c100 len 15 priority 7
[69713.819106] [8909] hci_send_frame: hci0 type 2 len 15
[69713.819108] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 15
[69713.819119] [8909] hci_chan_sent: hci0
[69713.819121] [8909] hci_prio_recalculate: hci0
[69713.819123] [8909] process_pending_rx:
[69713.819226] [6450] hci_sock_recvmsg: sock ffff88005e758780, sk ffff88010323d400
...
[69713.822022] [6450] l2cap_sock_accept: sk ffff880160356000 timeo 0
[69713.822024] [6450] bt_accept_dequeue: parent ffff880160356000
[69713.822026] [6450] bt_accept_unlink: sk ffff88005ede5800 state 1
[69713.822028] [6450] l2cap_sock_accept: new socket ffff88005ede5800
[69713.822368] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800
[69713.822375] [6450] l2cap_sock_getsockopt: sk ffff88005ede5800
[69713.822383] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800
[69713.822414] [6450] bt_sock_poll: sock ffff8800941ab700, sk ffff88005ede5800
...
[69713.823255] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800
[69713.823259] [6450] l2cap_sock_getsockopt: sk ffff88005ede5800
[69713.824322] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800
[69713.824330] [6450] l2cap_sock_getsockopt: sk ffff88005ede5800
[69713.825029] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800
...
[69713.825187] [6450] l2cap_sock_sendmsg: sock ffff8800941ab700, sk ffff88005ede5800
[69713.825189] [6450] bt_sock_wait_ready: sk ffff88005ede5800
[69713.825192] [6450] l2cap_create_basic_pdu: chan ffff88005ede2c00 len 3
[69713.825196] [6450] l2cap_do_send: chan ffff88005ede2c00, skb ffff880160b0b500 len 7 priority 0
[69713.825199] [6450] hci_send_acl: hci0 chan ffff88020d60b1c0 flags 0x0000
[69713.825201] [6450] hci_queue_acl: hci0 nonfrag skb ffff880160b0b500 len 11
[69713.825210] [8909] hci_tx_work: hci0 acl 9 sco 8 le 0
[69713.825213] [8909] hci_sched_acl: hci0
[69713.825214] [8909] hci_sched_sco: hci0
[69713.825216] [8909] hci_sched_esco: hci0
[69713.825217] [8909] hci_sched_le: hci0
[69713.825219] [8909] hci_chan_sent: hci0
[69713.825221] [8909] hci_chan_sent: chan ffff88020d60b1c0 quote 9
[69713.825223] [8909] hci_sched_le: chan ffff88020d60b1c0 skb ffff880160b0b500 len 11 priority 0
[69713.825225] [8909] hci_send_frame: hci0 type 2 len 11
[69713.825227] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 11
[69713.825242] [8909] hci_chan_sent: hci0
[69713.825253] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.825253] [8909] hci_prio_recalculate: hci0
[69713.825292] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.825768] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800
...
[69713.866902] [8909] hci_rx_work: hci0
[69713.866921] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 7
[69713.866928] [8909] hci_rx_work: hci0 Event packet
[69713.866931] [8909] hci_num_comp_pkts_evt: hci0 num_hndl 1
[69713.866937] [8909] hci_tx_work: hci0 acl 9 sco 8 le 0
[69713.866939] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.866940] [8909] hci_sched_acl: hci0
...
[69713.866944] [8909] hci_sched_le: hci0
[69713.866953] [8909] hci_chan_sent: hci0
[69713.866997] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.867840] [28074] hci_rx_work: hci0
[69713.867844] [28074] hci_send_to_monitor: hdev ffff88021f0c7000 len 7
[69713.867850] [28074] hci_rx_work: hci0 Event packet
[69713.867853] [28074] hci_num_comp_pkts_evt: hci0 num_hndl 1
[69713.867857] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69713.867858] [28074] hci_tx_work: hci0 acl 10 sco 8 le 0
[69713.867860] [28074] hci_sched_acl: hci0
[69713.867861] [28074] hci_sched_sco: hci0
[69713.867862] [28074] hci_sched_esco: hci0
[69713.867863] [28074] hci_sched_le: hci0
[69713.867865] [28074] hci_chan_sent: hci0
[69713.867888] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69714.145661] [8909] hci_rx_work: hci0
[69714.145666] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 10
[69714.145676] [8909] hci_rx_work: hci0 ACL data packet
[69714.145679] [8909] hci_acldata_packet: hci0 len 6 handle 0x002d flags 0x0002
[69714.145681] [8909] hci_conn_enter_active_mode: hcon ffff88021f65a000 mode 0
[69714.145683] [8909] l2cap_recv_acldata: conn ffff880221005c00 len 6 flags 0x2
[69714.145693] [8909] l2cap_recv_frame: len 2, cid 0x0006
[69714.145696] [8909] hci_send_to_control: len 14
[69714.145710] [8909] smp_chan_destroy:
[69714.145713] [8909] pairing_complete: status 3
[69714.145714] [8909] cmd_complete: sock ffff88010323ac00
[69714.145717] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 3
[69714.145719] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69714.145720] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800
[69714.145722] [6515] hci_sock_recvmsg: sock ffff88005e75a080, sk ffff88010323ac00
[69714.145724] [6450] bt_sock_poll: sock ffff8801db6b4f00, sk ffff880160351c00
...
[69714.145735] [6515] hci_sock_recvmsg: sock ffff88005e75a080, sk ffff88010323ac00
[69714.145737] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 2
[69714.145739] [8909] l2cap_conn_del: hcon ffff88021f65a000 conn ffff880221005c00, err 13
[69714.145740] [6450] bt_sock_poll: sock ffff8801db6b5400, sk ffff88021e775000
[69714.145743] [6450] bt_sock_poll: sock ffff8801db6b5e00, sk ffff880160356000
[69714.145744] [8909] l2cap_chan_hold: chan ffff88005ede2c00 orig refcnt 3
[69714.145746] [6450] bt_sock_poll: sock ffff8800941ab700, sk ffff88005ede5800
[69714.145748] [8909] l2cap_chan_del: chan ffff88005ede2c00, conn ffff880221005c00, err 13
[69714.145749] [8909] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 4
[69714.145751] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 1
[69714.145754] [6450] bt_sock_poll: sock ffff8800941ab700, sk ffff88005ede5800
[69714.145756] [8909] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 3
[69714.145759] [8909] hci_chan_del: hci0 hcon ffff88021f65a000 chan ffff88020d60b1c0
[69714.145766] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000
[69714.145787] [6515] hci_sock_release: sock ffff88005e75a080 sk ffff88010323ac00
[69714.146002] [6450] hci_sock_recvmsg: sock ffff88005e758780, sk ffff88010323d400
[69714.150795] [6450] l2cap_sock_release: sock ffff8800941ab700, sk ffff88005ede5800
[69714.150799] [6450] l2cap_sock_shutdown: sock ffff8800941ab700, sk ffff88005ede5800
[69714.150802] [6450] l2cap_chan_close: chan ffff88005ede2c00 state BT_CLOSED
[69714.150805] [6450] l2cap_sock_kill: sk ffff88005ede5800 state BT_CLOSED
[69714.150806] [6450] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 2
[69714.150808] [6450] l2cap_sock_destruct: sk ffff88005ede5800
[69714.150809] [6450] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 1
[69714.150811] [6450] l2cap_chan_destroy: chan ffff88005ede2c00
[69714.150970] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800
...
[69714.151991] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 0
[69716.150339] [8909] hci_conn_timeout: hcon ffff88021f65a000 state BT_CONNECTED, refcnt -1

Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-06-20 13:53:54 +02:00
Johan Hedberg 2ed8f65ca2 Bluetooth: Fix rejecting pairing in case of insufficient capabilities
If we need an MITM protected connection but the local and remote IO
capabilities cannot provide it we should reject the pairing attempt in
the appropriate way. This patch adds the missing checks for such a
situation to the smp_cmd_pairing_req() and smp_cmd_pairing_rsp()
functions.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-06-20 13:53:48 +02:00
Johan Hedberg 581370cc74 Bluetooth: Refactor authentication method lookup into its own function
We'll need to do authentication method lookups from more than one place,
so refactor the lookup into its own function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-06-20 13:53:42 +02:00
Johan Hedberg c7262e711a Bluetooth: Fix overriding higher security level in SMP
When we receive a pairing request or an internal request to start
pairing we shouldn't blindly overwrite the existing pending_sec_level
value as that may actually be higher than the new one. This patch fixes
the SMP code to only overwrite the value in case the new one is higher
than the old.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-06-20 13:53:38 +02:00
David S. Miller 1b0608fd9b Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-06-18

Please pull this batch of fixes intended for the 3.16 stream!

For the Bluetooth bits, Gustavo says:

"This is our first batch of fixes for 3.16. Be aware that two patches here
are not exactly bugfixes:

* 71f28af57066 Bluetooth: Add clarifying comment for conn->auth_type
This commit just add some important security comments to the code, we found
it important enough to include it here for 3.16 since it is security related.

* 9f7ec8871132 Bluetooth: Refactor discovery stopping into its own function
This commit is just a refactor in a preparation for a fix in the next
commit (f8680f128b).

All the other patches are fixes for deadlocks and for the Bluetooth protocols,
most of them related to authentication and encryption."

On top of that...

Chin-Ran Lo fixes a problems with overlapping DMA areas in mwifiex.

Michael Braun corrects a couple of issues in order to enable a new
device in rt2800usb.

Rafał Miłecki reverts a b43 patch that caused a regression, fixes a
Kconfig typo, and corrects a frequency reporting error with the G-PHY.

Stanislaw Grsuzka fixes an rfkill regression for rt2500pci, and avoids
a rt2x00 scheduling while atomic BUG.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19 21:32:27 -07:00
Daniel Borkmann 24599e61b7 net: sctp: check proc_dointvec result in proc_sctp_do_auth
When writing to the sysctl field net.sctp.auth_enable, it can well
be that the user buffer we handed over to proc_dointvec() via
proc_sctp_do_auth() handler contains something other than integers.

In that case, we would set an uninitialized 4-byte value from the
stack to net->sctp.auth_enable that can be leaked back when reading
the sysctl variable, and it can unintentionally turn auth_enable
on/off based on the stack content since auth_enable is interpreted
as a boolean.

Fix it up by making sure proc_dointvec() returned sucessfully.

Fixes: b14878ccb7 ("net: sctp: cache auth_enable per endpoint")
Reported-by: Florian Westphal <fwestpha@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19 21:30:19 -07:00
Neal Cardwell 2cd0d743b0 tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb
If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3 ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

 mss: 1000
 skb: seq: 0 end_seq: 4000  len: 4000
 SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

 in_sack = false
 pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
 new_len += mss = 4000

Previously we would find the new_len > skb->len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len >= skb->len check
succeeds, so that we return without trying to fragment.

Fixes: adb92db857 ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19 20:50:49 -07:00