Commit graph

27557 commits

Author SHA1 Message Date
Patrick McHardy aaa795ad25 netfilter: nat: propagate errors from xfrm_me_harder()
Propagate errors from ip_xfrm_me_harder() instead of returning EPERM in
all cases.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-08 12:34:01 +02:00
Patrick McHardy 58e35d1471 netfilter: ipv6: propagate routing errors from ip6_route_me_harder()
Propagate routing errors from ip_route_me_harder() when dropping a packet
using NF_DROP_ERR(). This makes userspace get the proper error instead of
EPERM for everything.

# ip -6 r a unreachable default table 100
# ip -6 ru add fwmark 0x1 lookup 100
# ip6tables -t mangle -A OUTPUT -d 2001:4860:4860::8888 -j MARK --set-mark 0x1

Old behaviour:

PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted

New behaviour:

PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-08 12:34:01 +02:00
Patrick McHardy c9e1673a0a netfilter: ipv4: propagate routing errors from ip_route_me_harder()
Propagate routing errors from ip_route_me_harder() when dropping a packet
using NF_DROP_ERR(). This makes userspace get the proper error instead of
EPERM for everything.

Example:

# ip r a unreachable default table 100
# ip ru add fwmark 0x1 lookup 100
# iptables -t mangle -A OUTPUT -d 8.8.8.8 -j MARK --set-mark 0x1

Current behaviour:

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted

New behaviour:

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-08 12:34:00 +02:00
Eric W. Biederman 6b0ee8c036 scm: Stop passing struct cred
Now that uids and gids are completely encapsulated in kuid_t
and kgid_t we no longer need to pass struct cred which allowed
us to test both the uid and the user namespace for equality.

Passing struct cred potentially allows us to pass the entire group
list as BSD does but I don't believe the cost of cache line misses
justifies retaining code for a future potential application.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 18:58:55 -04:00
David S. Miller d978a6361a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/nfc/microread/mei.c
	net/netfilter/nfnetlink_queue_core.c

Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
some cleanups are going to go on-top.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 18:37:01 -04:00
Wei Yongjun 8303e699f7 decnet: remove duplicated include from dn_table.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:12:01 -04:00
Wei Yongjun 2f13e9f741 net_cls: remove duplicated include from cls_api.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:12:01 -04:00
Alan Ott fc52eea4c5 6lowpan: handle dev_queue_xmit() error code properly
dev_queue_xmit() will return a positive value if the packet could not be
queued, often because the real network device (in our case the mac802154
wpan device) has its queue stopped.  lowpan_xmit() should handle the
positive return code (for the debug statement) and return that value to
the higher layer so the higher layer will retry sending the packet.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:06:44 -04:00
Alan Ott e937f583ec mac802154: Increase tx_buffer_len
Increase the buffer length from 10 to 300 packets. Consider that traffic on
mac802154 devices will often be 6LoWPAN, and a full-length (1280 octet)
IPv6 packet will fragment into 15 6LoWPAN fragments (because the MTU of
IEEE 802.15.4 is 127).  A 300-packet queue is really 20 full-length IPv6
packets.

With a queue length of 10, an entire IPv6 packet was unable to get queued
at one time, causing fragments to be dropped, and making reassembly
impossible.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:06:43 -04:00
Alan Ott b5992fe962 mac802154: Use netif flow control
Use netif_stop_queue() and netif_wake_queue() to control the flow of
packets to mac802154 devices.  Since many IEEE 802.15.4 devices have no
output buffer, and since the mac802154 xmit() function is designed to
block, netif_stop_queue() is called after each packet.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:06:43 -04:00
Alan Ott 7dd43d356e mac802154: Do not try to resend failed packets
When ops->xmit() fails, drop the packet. Devices which support hardware
ack and retry (which include all devices currently supported by mainline),
will automatically retry sending the packet (in the hardware) up to 3
times, per the 802.15.4 spec.  There is no need, and it is incorrect to
try to do it in mac802154.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:06:43 -04:00
Wei Yongjun 524fba6c14 sctp: fix error return code in __sctp_connect()
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:04:17 -04:00
Cong Wang cfbe800b8b 802: fix a possible race condition
(Resend with a better changelog)

garp_pdu_queue() should ways be called with this spin lock.
garp_uninit_applicant() only holds rtnl lock which is not
enough here.  A possible race can happen as garp_pdu_rcv()
is called in BH context:

	garp_pdu_rcv()
	  |->garp_pdu_parse_msg()
	    |->garp_pdu_parse_attr()
	      |-> garp_gid_event()

Found by code inspection.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ward <david.ward@ll.mit.edu>
Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:04:17 -04:00
Mathias Krause d5e0d0f607 VSOCK: Fix missing msg_namelen update in vsock_stream_recvmsg()
The code misses to update the msg_namelen member to 0 and therefore
makes net/socket.c leak the local, uninitialized sockaddr_storage
variable to userland -- 128 bytes of kernel stack memory.

Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause 680d04e0ba VSOCK: vmci - fix possible info leak in vmci_transport_dgram_dequeue()
In case we received no data on the call to skb_recv_datagram(), i.e.
skb->data is NULL, vmci_transport_dgram_dequeue() will return with 0
without updating msg_namelen leading to net/socket.c leaking the local,
uninitialized sockaddr_storage variable to userland -- 128 bytes of
kernel stack memory.

Fix this by moving the already existing msg_namelen assignment a few
lines above.

Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause 60085c3d00 tipc: fix info leaks via msg_name in recv_msg/recv_stream
The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.

Additionally to that both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.

Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause 4a184233f2 rose: fix info leak via msg_name in rose_recvmsg()
The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.

Fix the issue by initializing the memory used for sockaddr info with
memset(0).

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause d26d6504f2 NFC: llcp: fix info leaks via msg_name in llcp_sock_recvmsg()
The code in llcp_sock_recvmsg() does not initialize all the members of
struct sockaddr_nfc_llcp when filling the sockaddr info. Nor does it
initialize the padding bytes of the structure inserted by the compiler
for alignment.

Also, if the socket is in state LLCP_CLOSED or is shutting down during
receive the msg_namelen member is not updated to 0 while otherwise
returning with 0, i.e. "success". The msg_namelen update is also
missing for stream and seqpacket sockets which don't fill the sockaddr
info.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix the first issue by initializing the memory used for sockaddr info
with memset(0). Fix the second one by setting msg_namelen to 0 early.
It will be updated later if we're going to fill the msg_name member.

Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause 3ce5efad47 netrom: fix info leak via msg_name in nr_recvmsg()
In case msg_name is set the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of
struct sockaddr_ax25 inserted by the compiler for alignment. Also
the sax25_ndigis member does not get assigned, leaking four more
bytes.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause c77a4b9cff llc: Fix missing msg_namelen update in llc_ui_recvmsg()
For stream sockets the code misses to update the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.

Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause b860d3cc62 l2tp: fix info leak in l2tp_ip6_recvmsg()
The L2TP code for IPv6 fails to initialize the l2tp_conn_id member of
struct sockaddr_l2tpip6 and therefore leaks four bytes kernel stack
in l2tp_ip6_recvmsg() in case msg_name is set.

Initialize l2tp_conn_id with 0 to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause a5598bd9c0 iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.

Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause 5ae94c0d2f irda: Fix missing msg_namelen update in irda_recvmsg_dgram()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.

Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause 2d6fbfe733 caif: Fix missing msg_namelen update in caif_seqpkt_recvmsg()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about caif_seqpkt_recvmsg() not filling the msg_name in case it was
set.

Cc: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause c8c499175f Bluetooth: SCO - Fix missing msg_namelen update in sco_sock_recvmsg()
If the socket is in state BT_CONNECT2 and BT_SK_DEFER_SETUP is set in
the flags, sco_sock_recvmsg() returns early with 0 without updating the
possibly set msg_namelen member. This, in turn, leads to a 128 byte
kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_recvmsg().

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause e11e0455c0 Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()
If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:00 -04:00
Mathias Krause 4683f42fde Bluetooth: fix possible info leak in bt_sock_recvmsg()
In case the socket is already shutting down, bt_sock_recvmsg() returns
with 0 without updating msg_namelen leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.

Fix this by moving the msg_namelen assignment in front of the shutdown
test.

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:00 -04:00
Mathias Krause ef3313e84a ax25: fix info leak via msg_name in ax25_recvmsg()
When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:00 -04:00
Mathias Krause 9b3e617f3d atm: update msg_namelen in vcc_recvmsg()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:00 -04:00
Christoph Paasch 50a75a8914 ipv6/tcp: Stop processing ICMPv6 redirect messages
Tetja Rediske found that if the host receives an ICMPv6 redirect message
after sending a SYN+ACK, the connection will be reset.

He bisected it down to 093d04d (ipv6: Change skb->data before using
icmpv6_notify() to propagate redirect), but the origin of the bug comes
from ec18d9a26 (ipv6: Add redirect support to all protocol icmp error
handlers.). The bug simply did not trigger prior to 093d04d, because
skb->data did not point to the inner IP header and thus icmpv6_notify
did not call the correct err_handler.

This patch adds the missing "goto out;" in tcp_v6_err. After receiving
an ICMPv6 Redirect, we should not continue processing the ICMP in
tcp_v6_err, as this may trigger the removal of request-socks or setting
sk_err(_soft).

Reported-by: Tetja Rediske <tetja@tetja.de>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 12:36:08 -04:00
David S. Miller d16658206a Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter and IPVS updates for
your net-next tree, most relevantly they are:

* Add net namespace support to NFLOG, ULOG and ebt_ulog and NFQUEUE.
  The LOG and ebt_log target has been also adapted, but they still
  depend on the syslog netnamespace that seems to be missing, from
  Gao Feng.

* Don't lose indications of congestion in IPv6 fragmentation handling,
  from Hannes Frederic Sowa.i

* IPVS conversion to use RCU, including some code consolidation patches
  and optimizations, also some from Julian Anastasov.

* cpu fanout support for NFQUEUE, from Holger Eitzenberger.

* Better error reporting to userspace when dropping packets from
  all our _*_[xfrm|route]_me_harder functions, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 12:22:06 -04:00
Hannes Frederic Sowa b8dd6a223e netfilter: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling
This change brings netfilter reassembly logic on par with
reassembly.c. The corresponding change in net-next is
(eec2e61 ipv6: implement RFC3168 5.3 (ecn protection) for
ipv6 fragmentation handling)

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-06 13:06:37 +02:00
Patrick McHardy 124dff01af netfilter: don't reset nf_trace in nf_reset()
Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up the the socket layer, at which point we want to
  release all netfilter references that might keep modules pinned while
  the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
  tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices on
  that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
  used anymore. Its not entirely clear whether those packets should
  be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
  packet cross network namespace boundaries. This is the only cases
  where we clearly want to reset nf_trace and is also what the
  original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 15:38:10 -04:00
Pablo Neira Ayuso 12202fa757 netfilter: remove unneeded variable proc_net_netfilter
Now that this supports net namespace for nflog and nfqueue,
we can remove the global proc_net_netfilter which has no
clients anymore.

Based on patch from Gao feng.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:11 +02:00
Gao feng e817961048 netfilter: nfnetlink_queue: add net namespace support for nfnetlink_queue
This patch makes /proc/net/netfilter/nfnetlink_queue pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:11 +02:00
Gao feng 5b023fc8d8 netfilter: enable per netns support for nf_loggers
After this patch, all nf_loggers support net namespace. Still
xt_LOG and ebt_log require syslog netns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:10 +02:00
Gao feng 9368a53c47 netfilter: nfnetlink_log: add net namespace support for nfnetlink_log
This patch makes /proc/net/netfilter/nfnetlink_log pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:01 +02:00
Gao feng 355430671a netfilter: ipt_ULOG: add net namespace support for ipt_ULOG
Add pernet support to ipt_ULOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
nflognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:03:43 +02:00
Gao feng 5b175b7ebd netfilter: ebt_ulog: add net namespace support for ebt_ulog
Add pernet support to ebt_ulog by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
ebtulognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:59:50 +02:00
Gao feng 69b34fb996 netfilter: xt_LOG: add net namespace support for xt_LOG
Add pernet support to xt_LOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:58:45 +02:00
Gao feng 7d2789246c netfilter: ebt_log: add net namespace support for ebt_log
Add pernet support to ebt_log by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:57:27 +02:00
Gao feng 30e0c6a6be netfilter: nf_log: prepare net namespace support for loggers
This patch adds netns support to nf_log and it prepares netns
support for existing loggers. It is composed of four major
changes.

1) nf_log_register has been split to two functions: nf_log_register
   and nf_log_set. The new nf_log_register is used to globally
   register the nf_logger and nf_log_set is used for enabling
   pernet support from nf_loggers.

   Per netns is not yet complete after this patch, it comes in
   separate follow up patches.

2) Add net as a parameter of nf_log_bind_pf. Per netns is not
   yet complete after this patch, it only allows to bind the
   nf_logger to the protocol family from init_net and it skips
   other cases.

3) Adapt all nf_log_packet callers to pass netns as parameter.
   After this patch, this function only works for init_net.

4) Make the sysctl net/netfilter/nf_log pernet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:12:54 +02:00
Gao feng f3c1a44a22 netfilter: make /proc/net/netfilter pernet
This patch makes this proc dentry pernet. So far only init_net
had a /proc/net/netfilter directory.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 19:35:02 +02:00
Jiri Pirko 34e2ed34a0 net: ipv4: notify when address lifetime changes
if userspace changes lifetime of address, send netlink notification and
call notifier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:51:12 -04:00
Eric W. Biederman 0e82e7f6df af_unix: If we don't care about credentials coallesce all messages
It was reported that the following LSB test case failed
https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
were not coallescing unix stream messages when the application was
expecting us to.

The problem was that the first send was before the socket was accepted
and thus sock->sk_socket was NULL in maybe_add_creds, and the second
send after the socket was accepted had a non-NULL value for sk->socket
and thus we could tell the credentials were not needed so we did not
bother.

The unnecessary credentials on the first message cause
unix_stream_recvmsg to start verifying that all messages had the same
credentials before coallescing and then the coallescing failed because
the second message had no credentials.

Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
long standing pessimization which would fail to coallesce messages when
reading from a unix stream socket if the senders were different even if
we did not care about their credentials.

I have tested this and verified that the in the LSB test case mentioned
above that the messages do coallesce now, while the were failing to
coallesce without this change.

Reported-by: Karel Srot <ksrot@redhat.com>
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:49:13 -04:00
Eric W. Biederman 25da0e3e9d Revert "af_unix: dont send SCM_CREDENTIAL when dest socket is NULL"
This reverts commit 14134f6584.

The problem that the above patch was meant to address is that af_unix
messages are not being coallesced because we are sending unnecesarry
credentials.  Not sending credentials in maybe_add_creds totally
breaks unconnected unix domain sockets that wish to send credentails
to other sockets.

In practice this break some versions of udev because they receive a
message and the sending uid is bogus so they drop the message.

Reported-by: Sven Joachim <svenjoac@gmx.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:49:03 -04:00
Vlad Yasevich 4543fbefe6 net: count hw_addr syncs so that unsync works properly.
A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices.  In
some cases (bond/team) a single address list is synched down
to multiple devices.  At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:18:46 -04:00
David S. Miller 4f4ecd5f2a Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains netfilter updates for your net tree,
they are:

* Fix missing the skb->trace reset in nf_reset, noticed by Gao Feng
  while using the TRACE target with several net namespaces.

* Fix prefix translation in IPv6 NPT if non-multiple of 32 prefixes
  are used, from Matthias Schiffer.

* Fix invalid nfacct objects with empty name, they are now rejected
  with -EINVAL, spotted by Michael Zintakis, patch from myself.

* A couple of fixes for wrong return values in the error path of
  nfnetlink_queue and nf_conntrack, from Wei Yongjun.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:41:53 -04:00
David S. Miller 518314ffe4 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into wireless
John W. Linville says:

====================
Here are some more fixes intended for the 3.9 stream...

Regarding the mac80211 bits, Johannes says:

"I had changed the idle handling to simplify it, but broken the
sequencing of commands, at least for ath9k-htc, one patch restores the
sequence. The other patch fixes a crash Jouni found while stress-testing
the remain-on-channel code, when an item is deleted the work struct can
run twice and crash the second time."

As for the iwlwifi bits, Johannes says:

"The only fix here is to the passive-no-RX firmware regulatory
enforcement driver support code to not drop auth frames in quick
succession, leading to not being able to connect to APs on passive
channels in certain circumstances."

Don't forget the NFC bits, about which Samuel says:

"This time we have:

- A crash fix for when a DGRAM LLCP socket is listening while the NFC adapter
  is physically removed.
- A potential double skb free when the LLCP socket receive queue is full.
- A fix for properly handling multiple and consecutive LLCP connections, and
  not trash the socket ack log.
- A build failure for the MEI microread physical layer, now that the MEI bus
  APIs have been merged into char-misc-next."

On top of that, Stone Piao provides an mwifiex fix to avoid accessing
beyond the end of a buffer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:39:06 -04:00
Jesper Dangaard Brouer 19952cc4f8 net: frag queue per hash bucket locking
This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduce the
readers-writer lock to a rebuild lock.

This patch is part of "net: frag performance followup"
 http://thread.gmane.org/gmane.linux.network/263644
of which two patches have already been accepted:

Same test setup as previous:
 (http://thread.gmane.org/gmane.linux.network/257155)
 Two 10G interfaces, on seperate NUMA nodes, are under-test, and uses
 Ethernet flow-control.  A third interface is used for generating the
 DoS attack (with trafgen).

Notice, I have changed the frag DoS generator script to be more
efficient/deadly.  Before it would only hit one RX queue, now its
sending packets causing multi-queue RX, due to "better" RX hashing.

Test types summary (netperf UDP_STREAM):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS
 Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
 Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS

When I rebased this-patch(03) (on top of net-next commit a210576c) and
removed the _bh spinlock, I saw a performance regression.  BUT this
was caused by some unrelated change in-between.  See tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) verifying-retest of commit 1b5ab0de corrospond to patch-02.
Test (C) is what I reported before for this-patch

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)).
Test (D) function as a new base-test.

Performance table summary (in Mbit/s):

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Test (E) and (F) is this-patch(03), with(V1) and without(V2) the _bh spinlocks.

I cannot explain the slow down for 20G64K (but its an artificial
"lab-test" so I'm not worried).  But the other results does show
improvements.  And test (E) "with _bh" version is slightly better.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>

----
V2:
- By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
  need the spinlock _bh versions, as Netfilter currently does a
  local_bh_disable() before entering inet_fragment.
- Fold-in desc from cover-mail
V3:
- Drop the chain_len counter per hash bucket.
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:37:05 -04:00