Commit graph

782561 commits

Author SHA1 Message Date
Pablo Neira Ayuso 99e25d071f netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to type
Compiler did not catch incorrect typing in the rcu hook assignment.

 % nfct add timeout test-tcp inet tcp established 100 close 10 close_wait 10
 % iptables -I OUTPUT -t raw -p tcp -j CT --timeout test-tcp
 dmesg - xt_CT: Timeout policy `test-tcp' can only be used by L3 protocol number 25000

The CT target bails out with incorrect layer 3 protocol number.

Fixes: 6c1fd7dc48 ("netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout object")
Reported-by: Harsha Sharma <harshasharmaiitr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:31:10 +02:00
Pablo Neira Ayuso a874752a10 netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT
Now that cttimeout support for nft_ct is in place, these should depend
on CONFIG_NF_CONNTRACK_TIMEOUT otherwise we can crash when dumping the
policy if this option is not enabled.

[   71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[...]
[   71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246
[...]
[   71.600188] Call Trace:
[   71.600201]  ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:30:25 +02:00
Florian Westphal f94e63801a netfilter: conntrack: reset tcp maxwin on re-register
Doug Smythies says:
  Sometimes it is desirable to temporarily disable, or clear,
  the iptables rule set on a computer being controlled via a
  secure shell session (SSH). While unwise on an internet facing
  computer, I also do it often on non-internet accessible computers
  while testing. Recently, this has become problematic, with the
  SSH session being dropped upon re-load of the rule set.

The problem is that when all rules are deleted, conntrack hooks get
unregistered.

In case the rules are re-added later, its possible that tcp window
has moved far enough so that all packets are considered invalid (out of
window) until entry expires (which can take forever, default
established timeout is 5 days).

Fix this by clearing maxwin of existing tcp connections on register.

v2: don't touch entries on hook removal.
v3: remove obsolete expiry check.

Reported-by: Doug Smythies <dsmythies@telus.net>
Fixes: 4d3a57f23d ("netfilter: conntrack: do not enable connection tracking unless needed")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:29:24 +02:00
Joe Thornber 3ab9182816 dm thin metadata: try to avoid ever aborting transactions
Committing a transaction can consume some metadata of it's own, we now
reserve a small amount of metadata to cover this.  Free metadata
reported by the kernel will not include this reserve.

If any of the reserve has been used after a commit we enter a new
internal state PM_OUT_OF_METADATA_SPACE.  This is reported as
PM_READ_ONLY, so no userland changes are needed.  If the metadata
device is resized the pool will move back to PM_WRITE.

These changes mean we never need to abort and rollback a transaction due
to running out of metadata space.  This is particularly important
because there have been a handful of reports of data corruption against
DM thin-provisioning that can all be attributed to the thin-pool having
ran out of metadata space.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-10 17:03:18 -04:00
Rodrigo Vivi 50cbc03e50 Merge tag 'gvt-fixes-2018-09-10' of https://github.com/intel/gvt-linux into drm-intel-fixes
gvt-fixes-2018-09-10

- KVM mm access reference fix (Zhenyu)
- Fix child device config length for virtual opregion (Weinan)

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180910092212.GZ20737@zhen-hp.sh.intel.com
2018-09-10 12:37:35 -07:00
Oliver Neukum df3aa13c7b Revert "cdc-acm: implement put_char() and flush_chars()"
This reverts commit a81cf9799a.

The patch causes a regression, which I cannot find the reason for.
So let's revert for now, as a revert hurts only performance.

Original report:
I was trying to resolve the problem with Oliver but we don't get any conclusion
for 5 months, so I am now sending this to mail list and cdc_acm authors.

I am using simple request-response protocol to obtain the boiller parameters
in constant intervals.

A simple one transaction is:
1. opening the /dev/ttyACM0
2. sending the following 10-bytes request to the device:
   unsigned char req[] = {0x02, 0xfe, 0x01, 0x05, 0x08, 0x02, 0x01, 0x69, 0xab, 0x03};
3. reading response (frame of 74 bytes length).
4. closing the descriptor
I am doing this transaction with 5 seconds intervals.

Before the bad commit everything was working correctly: I've got a requests and
a responses in a timely manner.

After the bad commit more time I am using the kernel module, more problems I have.
The graph [2] is showing the problem.

As you can see after module load all seems fine but after about 30 minutes I've got
a plenty of EAGAINs when doing read()'s and trying to read back the data.

When I rmmod and insmod the cdc_acm module again, then the situation is starting
over again: running ok shortly after load, and more time it is running, more EAGAINs
I have when calling read().

As a bonus I can see the problem on the device itself:
The device is configured as you can see here on this screen [3].
It has two transmision LEDs: TX and RX. Blink duration is set for 100ms.
This is a recording before the bad commit when all is working fine: [4]
And this is with the bad commit: [5]
As you can see the TX led is blinking wrongly long (indicating transmission?)
and I have problems doing read() calls (EAGAIN).

Reported-by: Mariusz Bialonczyk <manio@skyboo.net>
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Fixes: a81cf9799a ("cdc-acm: implement put_char() and flush_chars()")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10 20:40:29 +02:00
Yoshihiro Shimoda fa82796609 usb: Change usb_of_get_companion_dev() place to usb/common
Since renesas_usb3 udc driver calls usb_of_get_companion_dev()
which is on usb/core/of.c, build error like below happens if we
disable CONFIG_USB because the usb/core/ needs CONFIG_USB:

ERROR: "usb_of_get_companion_dev" [drivers/usb/gadget/udc/renesas_usb3.ko] undefined!

According to the usb/gadget/Kconfig, "NOTE:  Gadget support
** DOES NOT ** depend on host-side CONFIG_USB !!".
So, to fix the issue, this patch changes the usb_of_get_companion_dev()
place from usb/core/of.c to usb/common/common.c to be called by both
host and gadget.

Reported-by: John Garry <john.garry@huawei.com>
Fixes: 39facfa01c ("usb: gadget: udc: renesas_usb3: Add register of usb role switch")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10 20:40:29 +02:00
Chunfeng Yun 0a3b53305c usb: xhci: fix interrupt transfer error happened on MTK platforms
The MTK xHCI controller use some reserved bytes in endpoint context for
bandwidth scheduling, so need keep them in xhci_endpoint_copy();

The issue is introduced by:
commit f5249461b5 ("xhci: Clear the host side toggle manually when
endpoint is soft reset")
It resets endpoints and will drop bandwidth scheduling parameters used
by interrupt or isochronous endpoints on MTK xHCI controller.
Fixes: f5249461b5 ("xhci: Clear the host side toggle manually when
endpoint is soft reset")

Cc: stable@vger.kernel.org
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Tested-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10 20:40:29 +02:00
Kristian Evensen 7c5cca3588 qmi_wwan: Support dynamic config on Quectel EP06
Quectel EP06 (and EM06/EG06) supports dynamic configuration of USB
interfaces, without the device changing VID/PID or configuration number.
When the configuration is updated and interfaces are added/removed, the
interface numbers change. This means that the current code for matching
EP06 does not work.

This patch removes the current EP06 interface number match, and replaces
it with a match on class, subclass and protocol. Unfortunately, matching
on those three alone is not enough, as the diag interface exports the
same values as QMI. The other serial interfaces + adb export different
values and do not match.

The diag interface only has two endpoints, while the QMI interface has
three. I have therefore added a check for number of interfaces, and we
ignore the interface if the number of endpoints equals two.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:48:54 -07:00
Imre Deak 92a6803149 drm/i915/bdw: Increase IPS disable timeout to 100ms
During IPS disabling the current 42ms timeout value leads to occasional
timeouts, increase it to 100ms which seems to get rid of the problem.

References: https://bugs.freedesktop.org/show_bug.cgi?id=107494
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107562
Reported-by: Diego Viola <diego.viola@gmail.com>
Tested-by: Diego Viola <diego.viola@gmail.com>
Cc: Diego Viola <diego.viola@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180905100005.7663-1-imre.deak@intel.com
(cherry picked from commit acb3ef0ee4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-09-10 10:18:42 -07:00
Kuninori Morimoto 3ebb17446b ethernet: renesas: convert to SPDX identifiers
This patch updates license to use SPDX-License-Identifier
instead of verbose license text.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10 10:11:53 -07:00
Ahmed S. Darwish 3835841577 staging: gasket: TODO: re-implement using UIO
The gasket in-kernel framework, recently introduced under staging,
re-implements what is already long-time provided by the UIO
subsystem, with extra PCI BAR remapping and MSI conveniences.

Before moving it out of staging, make sure we add the new bits to
the UIO framework instead, then transform its signle client, the
Apex driver, to a proper UIO driver (uio_driver.h).

Link: https://lkml.kernel.org/r/20180828103817.GB1397@do-kernel

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10 18:08:06 +02:00
Nicholas Piggin 7f2bf7840b tty: hvc: hvc_write() fix break condition
Commit 550ddadcc7 ("tty: hvc: hvc_write() may sleep") broke the
termination condition in case the driver stops accepting characters.
This can result in unnecessary polling of the busy driver.

Restore it by testing the hvc_push return code.

Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Jason Gunthorpe <jgg@mellanox.com>
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10 18:04:31 +02:00
Nicholas Piggin 68b2fc714f tty: hvc: hvc_poll() fix read loop batching
Commit ec97eaad13 ("tty: hvc: hvc_poll() break hv read loop")
removes get_chars batching entirely, which slows down large console
operations like paste -- virtio console "feels worse than a 9600 baud
serial line," reports Matteo.

This adds back batching in a more latency friendly way. If the caller
can sleep then we try to fill the entire flip buffer, releasing the
lock and scheduling between each iteration. If it can not sleep, then
batches are limited to 128 bytes. Matteo confirms this fixes the
performance problem.

Latency testing the powerpc OPAL console with OpenBMC UART with a
large paste shows about 0.25ms latency, which seems reasonable. 10ms
latencies were typical for this case before the latency breaking work,
so we still see most of the benefit.

  kopald-1204    0d.h.    5us : hvc_poll <-hvc_handle_interrupt
  kopald-1204    0d.h.    5us : __hvc_poll <-hvc_handle_interrupt
  kopald-1204    0d.h.    5us : _raw_spin_lock_irqsave <-__hvc_poll
  kopald-1204    0d.h.    5us : tty_port_tty_get <-__hvc_poll
  kopald-1204    0d.h.    6us : _raw_spin_lock_irqsave <-tty_port_tty_get
  kopald-1204    0d.h.    6us : _raw_spin_unlock_irqrestore <-tty_port_tty_get
  kopald-1204    0d.h.    6us : tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.    7us : __tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.    7us+: opal_get_chars <-__hvc_poll
  kopald-1204    0d.h.   36us : tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.   36us : __tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.   36us+: opal_get_chars <-__hvc_poll
  kopald-1204    0d.h.   65us : tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.   65us : __tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.   66us+: opal_get_chars <-__hvc_poll
  kopald-1204    0d.h.   94us : tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.   95us : __tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.   95us+: opal_get_chars <-__hvc_poll
  kopald-1204    0d.h.  124us : tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.  124us : __tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.  125us+: opal_get_chars <-__hvc_poll
  kopald-1204    0d.h.  154us : tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.  154us : __tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.  154us+: opal_get_chars <-__hvc_poll
  kopald-1204    0d.h.  183us : tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.  184us : __tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.  184us+: opal_get_chars <-__hvc_poll
  kopald-1204    0d.h.  213us : tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.  213us : __tty_buffer_request_room <-__hvc_poll
  kopald-1204    0d.h.  213us+: opal_get_chars <-__hvc_poll
  kopald-1204    0d.h.  242us : _raw_spin_unlock_irqrestore <-__hvc_poll
  kopald-1204    0d.h.  242us : tty_flip_buffer_push <-__hvc_poll
  kopald-1204    0d.h.  243us : queue_work_on <-tty_flip_buffer_push
  kopald-1204    0d.h.  243us : tty_kref_put <-__hvc_poll
  kopald-1204    0d.h.  243us : hvc_kick <-hvc_handle_interrupt
  kopald-1204    0d.h.  243us : wake_up_process <-hvc_kick
  kopald-1204    0d.h.  244us : try_to_wake_up <-hvc_kick
  kopald-1204    0d.h.  244us : _raw_spin_lock_irqsave <-try_to_wake_up
  kopald-1204    0d.h.  244us : _raw_spin_unlock_irqrestore <-try_to_wake_up

Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Jason Gunthorpe <jgg@mellanox.com>
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10 18:04:31 +02:00
Nicholas Piggin 6e7f6b82c6 tty: hvc: hvc_poll() fix read loop hang
Commit ec97eaad13 ("tty: hvc: hvc_poll() break hv read loop") causes
the virtio console to hang at times (e.g., if you paste a bunch of
characters to it.

The reason is that get_chars must return 0 before we can be sure the
driver will kick or poll input again, but this change only scheduled a
poll if get_chars had returned a full count. Change this to poll on
any > 0 count.

Reported-by: Matteo Croce <mcroce@redhat.com>
Reported-by: Jason Gunthorpe <jgg@mellanox.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Jason Gunthorpe <jgg@mellanox.com>
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10 18:04:31 +02:00
Jens Axboe bf93585ee1 Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus
Pull single NVMe fix from Christoph.

* 'nvme-4.19' of git://git.infradead.org/nvme:
  nvmet-rdma: fix possible bogus dereference under heavy load
2018-09-10 08:16:56 -06:00
Randy Dunlap 07e846bace x86/doc: Fix Documentation/x86/earlyprintk.txt
Fix a few issues in Documentation/x86/earlyprintk.txt:

- correct typos, punctuation, missing word, wrong word
- change product name from Netchip to NetChip
- expand where to add "earlyprintk=dbg"

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Link: http://lkml.kernel.org/r/d0c40ac3-7659-6374-dbda-23d3d2577f30@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 15:09:30 +02:00
Yabin Cui 02e184476e perf/core: Force USER_DS when recording user stack data
Perf can record user stack data in response to a synchronous request, such
as a tracepoint firing. If this happens under set_fs(KERNEL_DS), then we
end up reading user stack data using __copy_from_user_inatomic() under
set_fs(KERNEL_DS). I think this conflicts with the intention of using
set_fs(KERNEL_DS). And it is explicitly forbidden by hardware on ARM64
when both CONFIG_ARM64_UAO and CONFIG_ARM64_PAN are used.

So fix this by forcing USER_DS when recording user stack data.

Signed-off-by: Yabin Cui <yabinc@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 88b0193d94 ("perf/callchain: Force USER_DS when invoking perf_callchain_user()")
Link: http://lkml.kernel.org/r/20180823225935.27035-1-yabinc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 14:01:46 +02:00
Colin Ian King 0b405c65ad locking/ww_mutex: Fix spelling mistake "cylic" -> "cyclic"
Trivial fix to spelling mistake in pr_err() error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20180824112235.8842-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 14:00:01 +02:00
Ben Hutchings dc5591a03f locking/lockdep: Delete unnecessary #include
Commit:

  c3bc8fd637 ("tracing: Centralize preemptirq tracepoints and unify their usage")

added the inclusion of <trace/events/preemptirq.h>.

liblockdep doesn't have a stub version of that header so now fails to build.

However, commit:

  bff1b208a5 ("tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"")

removed the use of functions declared in that header. So delete the #include.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <alexander.levin@verizon.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: bff1b208a5 ("tracing: Partial revert of "tracing: Centralize ...")
Fixes: c3bc8fd637 ("tracing: Centralize preemptirq tracepoints ...")
Link: http://lkml.kernel.org/r/20180828203315.GD18030@decadent.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 13:48:25 +02:00
Sasha Levin 16214312df tools/lib/lockdep: Add dummy task_struct state member
Commit:

  8cc05c71ba ("locking/lockdep: Move sanity check to inside lockdep_print_held_locks()")

added accesses to the task_struct's state member. Add dummy userspace declaration.

Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <Alexander.Levin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180813190527.16853-4-alexander.levin@microsoft.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 13:48:24 +02:00
Sasha Levin 1064ea494b tools/lib/lockdep: Add empty nmi.h
Required since:

  88f1c87de1 ("locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()")

Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <Alexander.Levin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180813190527.16853-3-alexander.levin@microsoft.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 13:48:23 +02:00
Sasha Levin 83e01228cb tools/lib/lockdep: Update Sasha Levin email to MSFT
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <Alexander.Levin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180813190527.16853-2-alexander.levin@microsoft.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 13:48:22 +02:00
Miklos Szeredi 8c25741aaa ovl: fix oopses in ovl_fill_super() failure paths
ovl_free_fs() dereferences ofs->workbasedir and ofs->upper_mnt in cases when
those might not have been initialized yet.

Fix the initialization order for these fields.

Reported-by: syzbot+c75f181dc8429d2eb887@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc:  <stable@vger.kernel.org> # v4.15
Fixes: 95e6d4177c ("ovl: grab reference to workbasedir early")
Fixes: a9075cdb46 ("ovl: factor out ovl_free_fs() helper")
2018-09-10 12:55:49 +02:00
Daniel Vetter f8ff6b2d4a staging/fbtft: Update TODO and mailing lists
Motivated by the ksummit-discuss discussion.

Cc: Shuah Khan <shuahkhan@gmail.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10 10:39:19 +02:00
Randy Dunlap 882a78a9f3 sched/fair: Fix kernel-doc notation warning
Fix kernel-doc warning for missing 'flags' parameter description:

../kernel/sched/fair.c:3371: warning: Function parameter or member 'flags' not described in 'attach_entity_load_avg'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ea14b57e8a ("sched/cpufreq: Provide migration hint")
Link: http://lkml.kernel.org/r/cdda0d42-880d-4229-a9f7-5899c977a063@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:31:37 +02:00
Borislav Petkov da260fe123 jump_label: Fix typo in warning message
There's no 'allocatote' - use the next best thing: 'allocate' :-)

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180907103521.31344-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:15:48 +02:00
Vincent Guittot bb3485c8ac sched/fair: Fix load_balance redo for !imbalance
It can happen that load_balance() finds a busiest group and then a
busiest rq but the calculated imbalance is in fact 0.

In such situation, detach_tasks() returns immediately and lets the
flag LBF_ALL_PINNED set. The busiest CPU is then wrongly assumed to
have pinned tasks and removed from the load balance mask. then, we
redo a load balance without the busiest CPU. This creates wrong load
balance situation and generates wrong task migration.

If the calculated imbalance is 0, it's useless to try to find a
busiest rq as no task will be migrated and we can return immediately.

This situation can happen with heterogeneous system or smp system when
RT tasks are decreasing the capacity of some CPUs.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: jhugo@codeaurora.org
Link: http://lkml.kernel.org/r/1536306664-29827-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:13:49 +02:00
Vincent Guittot 287cdaac57 sched/fair: Fix scale_rt_capacity() for SMT
Since commit:

  523e979d31 ("sched/core: Use PELT for scale_rt_capacity()")

scale_rt_capacity() returns the remaining capacity and not a scale factor
to apply on cpu_capacity_orig. arch_scale_cpu() is directly called by
scale_rt_capacity() so we must take the sched_domain argument.

Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 523e979d31 ("sched/core: Use PELT for scale_rt_capacity()")
Link: http://lkml.kernel.org/r/20180904093626.GA23936@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:13:47 +02:00
Steve Muckle d0cdb3ce88 sched/fair: Fix vruntime_normalized() for remote non-migration wakeup
When a task which previously ran on a given CPU is remotely queued to
wake up on that same CPU, there is a period where the task's state is
TASK_WAKING and its vruntime is not normalized. This is not accounted
for in vruntime_normalized() which will cause an error in the task's
vruntime if it is switched from the fair class during this time.

For example if it is boosted to RT priority via rt_mutex_setprio(),
rq->min_vruntime will not be subtracted from the task's vruntime but
it will be added again when the task returns to the fair class. The
task's vruntime will have been erroneously doubled and the effective
priority of the task will be reduced.

Note this will also lead to inflation of all vruntimes since the doubled
vruntime value will become the rq's min_vruntime when other tasks leave
the rq. This leads to repeated doubling of the vruntime and priority
penalty.

Fix this by recognizing a WAKING task's vruntime as normalized only if
sched_remote_wakeup is true. This indicates a migration, in which case
the vruntime would have been normalized in migrate_task_rq_fair().

Based on a similar patch from John Dias <joaodias@google.com>.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Steve Muckle <smuckle@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Redpath <Chris.Redpath@arm.com>
Cc: John Dias <joaodias@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel de Dios <migueldedios@google.com>
Cc: Morten Rasmussen <Morten.Rasmussen@arm.com>
Cc: Patrick Bellasi <Patrick.Bellasi@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: kernel-team@android.com
Fixes: b5179ac70d ("sched/fair: Prepare to fix fairness problems on migration")
Link: http://lkml.kernel.org/r/20180831224217.169476-1-smuckle@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:13:47 +02:00
Vincent Guittot 12b04875d6 sched/pelt: Fix update_blocked_averages() for RT and DL classes
update_blocked_averages() is called to periodiccally decay the stalled load
of idle CPUs and to sync all loads before running load balance.

When cfs rq is idle, it trigs a load balance during pick_next_task_fair()
in order to potentially pull tasks and to use this newly idle CPU. This
load balance happens whereas prev task from another class has not been put
and its utilization updated yet. This may lead to wrongly account running
time as idle time for RT or DL classes.

Test that no RT or DL task is running when updating their utilization in
update_blocked_averages().

We still update RT and DL utilization instead of simply skipping them to
make sure that all metrics are synced when used during load balance.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 371bf42732 ("sched/rt: Add rt_rq utilization tracking")
Fixes: 3727e0e163 ("sched/dl: Add dl_rq utilization tracking")
Link: http://lkml.kernel.org/r/1535728975-22799-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:13:46 +02:00
Srikar Dronamraju e5e96fafd9 sched/topology: Set correct NUMA topology type
With the following commit:

  051f3ca02e ("sched/topology: Introduce NUMA identity node sched domain")

the scheduler introduced a new NUMA level. However this leads to the NUMA topology
on 2 node systems to not be marked as NUMA_DIRECT anymore.

After this commit, it gets reported as NUMA_BACKPLANE, because
sched_domains_numa_level is now 2 on 2 node systems.

Fix this by allowing setting systems that have up to 2 NUMA levels as
NUMA_DIRECT.

While here remove code that assumes that level can be 0.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andre Wild <wild@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Fixes: 051f3ca02e "Introduce NUMA identity node sched domain"
Link: http://lkml.kernel.org/r/1533920419-17410-1-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:13:45 +02:00
Jiada Wang e73e81975f sched/debug: Fix potential deadlock when writing to sched_features
The following lockdep report can be triggered by writing to /sys/kernel/debug/sched_features:

  ======================================================
  WARNING: possible circular locking dependency detected
  4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18 Not tainted
  ------------------------------------------------------
  sh/3358 is trying to acquire lock:
  000000004ad3989d (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0x14/0x30
  but task is already holding lock:
  00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
  which lock already depends on the new lock.
  the existing dependency chain (in reverse order) is:
  -> #3 (&sb->s_type->i_mutex_key#3){+.+.}:
         lock_acquire+0xb8/0x148
         down_write+0xac/0x140
         start_creating+0x5c/0x168
         debugfs_create_dir+0x18/0x220
         opp_debug_register+0x8c/0x120
         _add_opp_dev+0x104/0x1f8
         dev_pm_opp_get_opp_table+0x174/0x340
         _of_add_opp_table_v2+0x110/0x760
         dev_pm_opp_of_add_table+0x5c/0x240
         dev_pm_opp_of_cpumask_add_table+0x5c/0x100
         cpufreq_init+0x160/0x430
         cpufreq_online+0x1cc/0xe30
         cpufreq_add_dev+0x78/0x198
         subsys_interface_register+0x168/0x270
         cpufreq_register_driver+0x1c8/0x278
         dt_cpufreq_probe+0xdc/0x1b8
         platform_drv_probe+0xb4/0x168
         driver_probe_device+0x318/0x4b0
         __device_attach_driver+0xfc/0x1f0
         bus_for_each_drv+0xf8/0x180
         __device_attach+0x164/0x200
         device_initial_probe+0x10/0x18
         bus_probe_device+0x110/0x178
         device_add+0x6d8/0x908
         platform_device_add+0x138/0x3d8
         platform_device_register_full+0x1cc/0x1f8
         cpufreq_dt_platdev_init+0x174/0x1bc
         do_one_initcall+0xb8/0x310
         kernel_init_freeable+0x4b8/0x56c
         kernel_init+0x10/0x138
         ret_from_fork+0x10/0x18
  -> #2 (opp_table_lock){+.+.}:
         lock_acquire+0xb8/0x148
         __mutex_lock+0x104/0xf50
         mutex_lock_nested+0x1c/0x28
         _of_add_opp_table_v2+0xb4/0x760
         dev_pm_opp_of_add_table+0x5c/0x240
         dev_pm_opp_of_cpumask_add_table+0x5c/0x100
         cpufreq_init+0x160/0x430
         cpufreq_online+0x1cc/0xe30
         cpufreq_add_dev+0x78/0x198
         subsys_interface_register+0x168/0x270
         cpufreq_register_driver+0x1c8/0x278
         dt_cpufreq_probe+0xdc/0x1b8
         platform_drv_probe+0xb4/0x168
         driver_probe_device+0x318/0x4b0
         __device_attach_driver+0xfc/0x1f0
         bus_for_each_drv+0xf8/0x180
         __device_attach+0x164/0x200
         device_initial_probe+0x10/0x18
         bus_probe_device+0x110/0x178
         device_add+0x6d8/0x908
         platform_device_add+0x138/0x3d8
         platform_device_register_full+0x1cc/0x1f8
         cpufreq_dt_platdev_init+0x174/0x1bc
         do_one_initcall+0xb8/0x310
         kernel_init_freeable+0x4b8/0x56c
         kernel_init+0x10/0x138
         ret_from_fork+0x10/0x18
  -> #1 (subsys mutex#6){+.+.}:
         lock_acquire+0xb8/0x148
         __mutex_lock+0x104/0xf50
         mutex_lock_nested+0x1c/0x28
         subsys_interface_register+0xd8/0x270
         cpufreq_register_driver+0x1c8/0x278
         dt_cpufreq_probe+0xdc/0x1b8
         platform_drv_probe+0xb4/0x168
         driver_probe_device+0x318/0x4b0
         __device_attach_driver+0xfc/0x1f0
         bus_for_each_drv+0xf8/0x180
         __device_attach+0x164/0x200
         device_initial_probe+0x10/0x18
         bus_probe_device+0x110/0x178
         device_add+0x6d8/0x908
         platform_device_add+0x138/0x3d8
         platform_device_register_full+0x1cc/0x1f8
         cpufreq_dt_platdev_init+0x174/0x1bc
         do_one_initcall+0xb8/0x310
         kernel_init_freeable+0x4b8/0x56c
         kernel_init+0x10/0x138
         ret_from_fork+0x10/0x18
  -> #0 (cpu_hotplug_lock.rw_sem){++++}:
         __lock_acquire+0x203c/0x21d0
         lock_acquire+0xb8/0x148
         cpus_read_lock+0x58/0x1c8
         static_key_enable+0x14/0x30
         sched_feat_write+0x314/0x428
         full_proxy_write+0xa0/0x138
         __vfs_write+0xd8/0x388
         vfs_write+0xdc/0x318
         ksys_write+0xb4/0x138
         sys_write+0xc/0x18
         __sys_trace_return+0x0/0x4
  other info that might help us debug this:
  Chain exists of:
    cpu_hotplug_lock.rw_sem --> opp_table_lock --> &sb->s_type->i_mutex_key#3
   Possible unsafe locking scenario:
         CPU0                    CPU1
         ----                    ----
    lock(&sb->s_type->i_mutex_key#3);
                                 lock(opp_table_lock);
                                 lock(&sb->s_type->i_mutex_key#3);
    lock(cpu_hotplug_lock.rw_sem);
   *** DEADLOCK ***
  2 locks held by sh/3358:
   #0: 00000000a8c4b363 (sb_writers#10){.+.+}, at: vfs_write+0x238/0x318
   #1: 00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
  stack backtrace:
  CPU: 5 PID: 3358 Comm: sh Not tainted 4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18
  Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
  Call trace:
   dump_backtrace+0x0/0x288
   show_stack+0x14/0x20
   dump_stack+0x13c/0x1ac
   print_circular_bug.isra.10+0x270/0x438
   check_prev_add.constprop.16+0x4dc/0xb98
   __lock_acquire+0x203c/0x21d0
   lock_acquire+0xb8/0x148
   cpus_read_lock+0x58/0x1c8
   static_key_enable+0x14/0x30
   sched_feat_write+0x314/0x428
   full_proxy_write+0xa0/0x138
   __vfs_write+0xd8/0x388
   vfs_write+0xdc/0x318
   ksys_write+0xb4/0x138
   sys_write+0xc/0x18
   __sys_trace_return+0x0/0x4

This is because when loading the cpufreq_dt module we first acquire
cpu_hotplug_lock.rw_sem lock, then in cpufreq_init(), we are taking
the &sb->s_type->i_mutex_key lock.

But when writing to /sys/kernel/debug/sched_features, the
cpu_hotplug_lock.rw_sem lock depends on the &sb->s_type->i_mutex_key lock.

To fix this bug, reverse the lock acquisition order when writing to
sched_features, this way cpu_hotplug_lock.rw_sem no longer depends on
&sb->s_type->i_mutex_key.

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: George G. Davis <george_davis@mentor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180731121222.26195-1-jiada_wang@mentor.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:13:45 +02:00
Gao Xiang 5f0abea6ab staging: erofs: rename superblock flags (MS_xyz -> SB_xyz)
This patch follows commit 1751e8a6cb ("Rename superblock
flags (MS_xyz -> SB_xyz)") and after commit ("vfs: Suppress
MS_* flag defs within the kernel unless explicitly enabled"),
there is no MS_RDONLY and MS_NOATIME at all.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10 10:12:21 +02:00
Thomas Hellstrom e13e2366d8 locking/mutex: Fix mutex debug call and ww_mutex documentation
The following commit:

  08295b3b5b ("Implement an algorithm choice for Wound-Wait mutexes")

introduced a reference in the documentation to a function that was
removed in an earlier commit.

It also forgot to remove a call to debug_mutex_add_waiter() which is now
unconditionally called by __mutex_add_waiter().

Fix those bugs.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dri-devel@lists.freedesktop.org
Fixes: 08295b3b5b ("Implement an algorithm choice for Wound-Wait mutexes")
Link: http://lkml.kernel.org/r/20180903140708.2401-1-thellstrom@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:05:10 +02:00
Peter Zijlstra 09121255c7 perf/UAPI: Clearly mark __PERF_SAMPLE_CALLCHAIN_EARLY as internal use
Vince noted that commit:

  6cbc304f2f ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")

'leaked' __PERF_SAMPLE_CALLCHAIN_EARLY into the UAPI namespace. And
while sys_perf_event_open() will error out if you try to use it, it is
exposed.

Clearly mark it for internal use only to avoid any confusion.

Requested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:03:02 +02:00
Jacek Tomaka 16160c1946 perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs
Problem: perf did not show branch predicted/mispredicted bit in brstack.

Output of perf -F brstack for profile collected

Before:

 0x4fdbcd/0x4fdc03/-/-/-/0
 0x45f4c1/0x4fdba0/-/-/-/0
 0x45f544/0x45f4bb/-/-/-/0
 0x45f555/0x45f53c/-/-/-/0
 0x7f66901cc24b/0x45f555/-/-/-/0
 0x7f66901cc22e/0x7f66901cc23d/-/-/-/0
 0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0
 0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0

After:

 0x4fdbcd/0x4fdc03/P/-/-/0
 0x45f4c1/0x4fdba0/P/-/-/0
 0x45f544/0x45f4bb/P/-/-/0
 0x45f555/0x45f53c/P/-/-/0
 0x7f66901cc24b/0x45f555/P/-/-/0
 0x7f66901cc22e/0x7f66901cc23d/P/-/-/0
 0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0
 0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0

Cause:

As mentioned in Software Development Manual vol 3, 17.4.8.1,
IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is
stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as
its format. Despite that, registers containing FROM address of the branch,
do have MISPREDICT bit but because of the format indicated in
IA32_PERF_CAPABILITIES[5:0], LBR did not read MISPREDICT bit.

Solution:

Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit.

Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:03:01 +02:00
Linus Torvalds 11da3a7f84 Linux 4.19-rc3 2018-09-09 17:26:43 -07:00
Taehee Yoo 5d407b071d ip: frags: fix crash in ip_do_fragment()
A kernel crash occurrs when defragmented packet is fragmented
in ip_do_fragment().
In defragment routine, skb_orphan() is called and
skb->ip_defrag_offset is set. but skb->sk and
skb->ip_defrag_offset are same union member. so that
frag->sk is not NULL.
Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
defragmented packet is fragmented.

test commands:
   %iptables -t nat -I POSTROUTING -j MASQUERADE
   %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000

splat looks like:
[  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
[  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
[  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
[  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
[  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
[  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
[  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
[  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
[  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
[  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
[  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
[  261.198158] Call Trace:
[  261.199018]  ? dst_output+0x180/0x180
[  261.205011]  ? save_trace+0x300/0x300
[  261.209018]  ? ip_copy_metadata+0xb00/0xb00
[  261.213034]  ? sched_clock_local+0xd4/0x140
[  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
[  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
[  261.227014]  ? find_held_lock+0x39/0x1c0
[  261.233008]  ip_finish_output+0x51d/0xb50
[  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
[  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
[  261.250152]  ? rcu_is_watching+0x77/0x120
[  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
[  261.261033]  ? nf_hook_slow+0xb1/0x160
[  261.265007]  ip_output+0x1c7/0x710
[  261.269005]  ? ip_mc_output+0x13f0/0x13f0
[  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
[  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
[  261.282996]  ? nf_hook_slow+0xb1/0x160
[  261.287007]  raw_sendmsg+0x21f9/0x4420
[  261.291008]  ? dst_output+0x180/0x180
[  261.297003]  ? sched_clock_cpu+0x126/0x170
[  261.301003]  ? find_held_lock+0x39/0x1c0
[  261.306155]  ? stop_critical_timings+0x420/0x420
[  261.311004]  ? check_flags.part.36+0x450/0x450
[  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.326142]  ? cyc2ns_read_end+0x10/0x10
[  261.330139]  ? raw_bind+0x280/0x280
[  261.334138]  ? sched_clock_cpu+0x126/0x170
[  261.338995]  ? check_flags.part.36+0x450/0x450
[  261.342991]  ? __lock_acquire+0x4500/0x4500
[  261.348994]  ? inet_sendmsg+0x11c/0x500
[  261.352989]  ? dst_output+0x180/0x180
[  261.357012]  inet_sendmsg+0x11c/0x500
[ ... ]

v2:
 - clear skb->sk at reassembly routine.(Eric Dumarzet)

Fixes: fa0f527358 ("ip: use rb trees for IP frag queue.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 14:50:56 -07:00
Ingo Molnar fa94351b56 perf/urgent fixes:
Kernel:
 
 - Modify breakpoint fixes (Jiri Olsa)
 
 perf annotate:
 
 - Fix parsing aarch64 branch instructions after objdump update (Kim Phillips)
 
 - Fix parsing indirect calls in 'perf annotate' (Martin Liška)
 
 perf probe:
 
 - Ignore SyS symbols irrespective of endianness on PowerPC (Sandipan Das)
 
 perf trace:
 
 - Fix include path for asm-generic/unistd.h on arm64 (Kim Phillips)
 
 Core libraries:
 
 - Fix potential null pointer dereference in perf_evsel__new_idx() (Hisao Tanabe)
 
 - Use fixed size string for comms instead of scanf("%m"), that is
   not present in the bionic libc and leads to a crash (Chris Phlipot)
 
 - Fix bad memory access in trace info on 32-bit systems, we were reading
   8 bytes from a 4-byte long variable when saving the command line in the
   perf.data file.  (Chris Phlipot)
 
 Build system:
 
 - Streamline bpf examples and headers installation, clarifying
   some install messages. (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCW41FnAAKCRCyPKLppCJ+
 J8MCAP4/RC5GwNrO5KYJ+G1iYb7QiNq9X/wsM7jCBlqWnTH+zgD9GYPIT3WQWKBN
 Rv94N4PNsYP4cpP7hTzWG0ar7p70owo=
 =+ODq
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-4.19-20180903' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

Kernel:

- Modify breakpoint fixes (Jiri Olsa)

perf annotate:

- Fix parsing aarch64 branch instructions after objdump update (Kim Phillips)

- Fix parsing indirect calls in 'perf annotate' (Martin Liška)

perf probe:

- Ignore SyS symbols irrespective of endianness on PowerPC (Sandipan Das)

perf trace:

- Fix include path for asm-generic/unistd.h on arm64 (Kim Phillips)

Core libraries:

- Fix potential null pointer dereference in perf_evsel__new_idx() (Hisao Tanabe)

- Use fixed size string for comms instead of scanf("%m"), that is
  not present in the bionic libc and leads to a crash (Chris Phlipot)

- Fix bad memory access in trace info on 32-bit systems, we were reading
  8 bytes from a 4-byte long variable when saving the command line in the
  perf.data file.  (Chris Phlipot)

Build system:

- Streamline bpf examples and headers installation, clarifying
  some install messages. (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-09 21:36:31 +02:00
Vakul Garg 52ea992cfa net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC
tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
function sk_alloc_sg(). In case the number of SG entries hit
MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
current SG index to '0'. This leads to calling of function
tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
kernel crash. To fix this, set the number of SG elements to the number
of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
returns -ENOSPC.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 08:10:01 -07:00
David S. Miller 0e1f4c76be Merge branch 'ena-fixes'
Netanel Belgazal says:

====================
bug fixes for ENA Ethernet driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:56 -07:00
Netanel Belgazal 37dff155dc net: ena: fix incorrect usage of memory barriers
Added memory barriers where they were missing to support multiple
architectures, and removed redundant ones.

As part of removing the redundant memory barriers and improving
performance, we moved to more relaxed versions of memory barriers,
as well as to the more relaxed version of writel - writel_relaxed,
while maintaining correctness.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:56 -07:00
Netanel Belgazal 28abf4e9c9 net: ena: fix missing calls to READ_ONCE
Add READ_ONCE calls where necessary (for example when iterating
over a memory field that gets updated by the hardware).

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:56 -07:00
Netanel Belgazal 944b28aa29 net: ena: fix missing lock during device destruction
acquire the rtnl_lock during device destruction to avoid
using partially destroyed device.

ena_remove() shares almost the same logic as ena_destroy_device(),
so use ena_destroy_device() and avoid duplications.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:56 -07:00
Netanel Belgazal fe870c77ef net: ena: fix potential double ena_destroy_device()
ena_destroy_device() can potentially be called twice.
To avoid this, check that the device is running and
only then proceed destroying it.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:55 -07:00
Netanel Belgazal cfa324a514 net: ena: fix device destruction to gracefully free resources
When ena_destroy_device() is called from ena_suspend(), the device is
still reachable from the driver. Therefore, the driver can send a command
to the device to free all resources.
However, in all other cases of calling ena_destroy_device(), the device is
potentially in an error state and unreachable from the driver. In these
cases the driver must not send commands to the device.

The current implementation does not request resource freeing from the
device even when possible. We add the graceful parameter to
ena_destroy_device() to enable resource freeing when possible, and
use it in ena_suspend().

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:55 -07:00
Netanel Belgazal ef5b0771d2 net: ena: fix driver when PAGE_SIZE == 64kB
The buffer length field in the ena rx descriptor is 16 bit, and the
current driver passes a full page in each ena rx descriptor.
When PAGE_SIZE equals 64kB or more, the buffer length field becomes
zero.
To solve this issue, limit the ena Rx descriptor to use 16kB even
when allocating 64kB kernel pages. This change would not impact ena
device functionality, as 16kB is still larger than maximum MTU.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:55 -07:00
Netanel Belgazal 772ed869f5 net: ena: fix surprise unplug NULL dereference kernel crash
Starting with driver version 1.5.0, in case of a surprise device
unplug, there is a race caused by invoking ena_destroy_device()
from two different places. As a result, the readless register might
be accessed after it was destroyed.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09 07:59:55 -07:00
Linus Torvalds 9a5682765a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Prevent multiplication result truncation on 32bit. Introduced with
     the early timestamp reworrk.

   - Ensure microcode revision storage to be consistent under all
     circumstances

   - Prevent write tearing of PTEs

   - Prevent confusion of user and kernel reegisters when dumping fatal
     signals verbosely

   - Make an error return value in a failure path of the vector
     allocation negative. Returning EINVAL might the caller assume
     success and causes further wreckage.

   - A trivial kernel doc warning fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Use WRITE_ONCE() when setting PTEs
  x86/apic/vector: Make error return value negative
  x86/process: Don't mix user/kernel regs in 64bit __show_regs()
  x86/tsc: Prevent result truncation on 32bit
  x86: Fix kernel-doc atomic.h warnings
  x86/microcode: Update the new microcode revision unconditionally
  x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
2018-09-09 07:05:15 -07:00