From 07af7810e0a5bc4e51682c90f9fa19fc4cb93f18 Mon Sep 17 00:00:00 2001 From: "H. Nikolaus Schaller" Date: Sat, 12 Dec 2020 10:55:25 +0100 Subject: [PATCH 001/284] DTS: ARM: gta04: remove legacy spi-cs-high to make display work again This reverts commit f1f028ff89cb ("DTS: ARM: gta04: introduce legacy spi-cs-high to make display work again") which had to be intruduced after commit 6953c57ab172 ("gpio: of: Handle SPI chipselect legacy bindings") broke the GTA04 display. This contradicted the data sheet but was the only way to get it as an spi client operational again. The panel data sheet defines the chip-select to be active low. Now, with the arrival of commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors") the logic of interaction between spi-cs-high and the gpio descriptor flags has been changed a second time, making the display broken again. So we have to remove the original fix which in retrospect was a workaround of a bug in the spi subsystem and not a feature of the panel or bug in the device tree. With this fix the device tree is back in sync with the data sheet and spi subsystem code. Fixes: 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors") CC: stable@vger.kernel.org Signed-off-by: H. Nikolaus Schaller Signed-off-by: Tony Lindgren --- arch/arm/boot/dts/omap3-gta04.dtsi | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm/boot/dts/omap3-gta04.dtsi b/arch/arm/boot/dts/omap3-gta04.dtsi index c8745bc800f7..003202d12990 100644 --- a/arch/arm/boot/dts/omap3-gta04.dtsi +++ b/arch/arm/boot/dts/omap3-gta04.dtsi @@ -124,7 +124,6 @@ spi-max-frequency = <100000>; spi-cpol; spi-cpha; - spi-cs-high; backlight= <&backlight>; label = "lcd"; From 6efac0173cd15460b48c91e1b0a000379f341f00 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Mon, 14 Dec 2020 23:01:21 +0200 Subject: [PATCH 002/284] ARM: OMAP1: OSK: fix ohci-omap breakage Commit 45c5775460f3 ("usb: ohci-omap: Fix descriptor conversion") tried to fix all issues related to ohci-omap descriptor conversion, but a wrong patch was applied, and one needed change to the OSK board file is still missing. Fix that. Fixes: 45c5775460f3 ("usb: ohci-omap: Fix descriptor conversion") Signed-off-by: Linus Walleij [aaro.koskinen@iki.fi: rebased and updated the changelog] Signed-off-by: Aaro Koskinen Signed-off-by: Tony Lindgren --- arch/arm/mach-omap1/board-osk.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/mach-omap1/board-osk.c b/arch/arm/mach-omap1/board-osk.c index a720259099ed..0a4c9b0b13b0 100644 --- a/arch/arm/mach-omap1/board-osk.c +++ b/arch/arm/mach-omap1/board-osk.c @@ -203,6 +203,8 @@ static int osk_tps_setup(struct i2c_client *client, void *context) */ gpio_request(OSK_TPS_GPIO_USB_PWR_EN, "n_vbus_en"); gpio_direction_output(OSK_TPS_GPIO_USB_PWR_EN, 1); + /* Free the GPIO again as the driver will request it */ + gpio_free(OSK_TPS_GPIO_USB_PWR_EN); /* Set GPIO 2 high so LED D3 is off by default */ tps65010_set_gpio_out_value(GPIO2, HIGH); From 7078a5ba7a58e5db07583b176f8a03e0b8714731 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Tue, 8 Dec 2020 16:08:02 +0200 Subject: [PATCH 003/284] soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1 We have rst_map_012 used for various accelerators like dsp, ipu and iva. For these use cases, we have rstctrl bit 2 control the subsystem module reset, and have and bits 0 and 1 control the accelerator specific features. If the bootloader, or kexec boot, has left any accelerator specific reset bits deasserted, deasserting bit 2 reset will potentially enable an accelerator with unconfigured MMU and no firmware. And we may get spammed with a lot by warnings on boot with "Data Access in User mode during Functional access", or depending on the accelerator, the system can also just hang. This issue can be quite easily reproduced by setting a rst_map_012 type rstctrl register to 0 or 4 in the bootloader, and booting the system. Let's just assert all reset bits for rst_map_012 type resets. So far it looks like the other rstctrl types don't need this. If it turns out that the other type rstctrl bits also need reset on init, we need to add an instance specific reset mask for the bits to avoid resetting unwanted bits. Reported-by: Carl Philipp Klemm Cc: Philipp Zabel Cc: Santosh Shilimkar Cc: Suman Anna Cc: Tero Kristo Tested-by: Carl Philipp Klemm Signed-off-by: Tony Lindgren --- drivers/soc/ti/omap_prm.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c index 980b04c38fd9..3dc3f34afbab 100644 --- a/drivers/soc/ti/omap_prm.c +++ b/drivers/soc/ti/omap_prm.c @@ -548,6 +548,7 @@ static int omap_prm_reset_init(struct platform_device *pdev, const struct omap_rst_map *map; struct ti_prm_platform_data *pdata = dev_get_platdata(&pdev->dev); char buf[32]; + u32 v; /* * Check if we have controllable resets. If either rstctrl is non-zero @@ -595,6 +596,16 @@ static int omap_prm_reset_init(struct platform_device *pdev, map++; } + /* Quirk handling to assert rst_map_012 bits on reset and avoid errors */ + if (prm->data->rstmap == rst_map_012) { + v = readl_relaxed(reset->prm->base + reset->prm->data->rstctrl); + if ((v & reset->mask) != reset->mask) { + dev_dbg(&pdev->dev, "Asserting all resets: %08x\n", v); + writel_relaxed(reset->mask, reset->prm->base + + reset->prm->data->rstctrl); + } + } + return devm_reset_controller_register(&pdev->dev, &reset->rcdev); } From 181739822cf6f8f4e12b173913af2967a28906c0 Mon Sep 17 00:00:00 2001 From: "H. Nikolaus Schaller" Date: Wed, 23 Dec 2020 11:30:21 +0100 Subject: [PATCH 004/284] ARM: dts; gta04: SPI panel chip select is active low With the arrival of commit 2fee9583198eb9 ("spi: dt-bindings: clarify CS behavior for spi-cs-high and gpio descriptors") it was clarified what the proper state for cs-gpios should be, even if the flag is ignored. The driver code is doing the right thing since 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors") The chip-select of the td028ttec1 panel is active-low, so we must omit spi-cs-high; attribute (already removed by separate patch) and should now use GPIO_ACTIVE_LOW for the client device description to be fully consistent. Fixes: 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors") CC: stable@vger.kernel.org Signed-off-by: H. Nikolaus Schaller Signed-off-by: Tony Lindgren --- arch/arm/boot/dts/omap3-gta04.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/omap3-gta04.dtsi b/arch/arm/boot/dts/omap3-gta04.dtsi index 003202d12990..7b8c18e6605e 100644 --- a/arch/arm/boot/dts/omap3-gta04.dtsi +++ b/arch/arm/boot/dts/omap3-gta04.dtsi @@ -114,7 +114,7 @@ gpio-sck = <&gpio1 12 GPIO_ACTIVE_HIGH>; gpio-miso = <&gpio1 18 GPIO_ACTIVE_HIGH>; gpio-mosi = <&gpio1 20 GPIO_ACTIVE_HIGH>; - cs-gpios = <&gpio1 19 GPIO_ACTIVE_HIGH>; + cs-gpios = <&gpio1 19 GPIO_ACTIVE_LOW>; num-chipselects = <1>; /* lcd panel */ From 5b5465dd947cb655550332d3fa509f91a768482b Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Mon, 21 Dec 2020 20:37:45 -0800 Subject: [PATCH 005/284] arm64: defconfig: Make INTERCONNECT_QCOM_SDM845 builtin As of v5.11-rc1 the QUP nodes of SDM845 has got their interconnect properties specified, this means that the relevant interconnect provider needs to be builtin for the UART device to probe and the console to be registered before userspace needs to access it. Reviewed-by: Georgi Djakov Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20201222043745.3420447-1-bjorn.andersson@linaro.org Signed-off-by: Bjorn Andersson --- arch/arm64/configs/defconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index 838301650a79..3848ae99501c 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -1078,7 +1078,7 @@ CONFIG_INTERCONNECT=y CONFIG_INTERCONNECT_QCOM=y CONFIG_INTERCONNECT_QCOM_MSM8916=m CONFIG_INTERCONNECT_QCOM_OSM_L3=m -CONFIG_INTERCONNECT_QCOM_SDM845=m +CONFIG_INTERCONNECT_QCOM_SDM845=y CONFIG_INTERCONNECT_QCOM_SM8150=m CONFIG_INTERCONNECT_QCOM_SM8250=m CONFIG_EXT2_FS=y From 291b5c9870fc546376d69cf792b7885cd0c9c1b3 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Mon, 21 Dec 2020 19:59:31 -0700 Subject: [PATCH 006/284] i3c/master/mipi-i3c-hci: Fix position of __maybe_unused in i3c_hci_of_match Clang warns: ../drivers/i3c/master/mipi-i3c-hci/core.c:780:21: warning: attribute declaration must precede definition [-Wignored-attributes] static const struct __maybe_unused of_device_id i3c_hci_of_match[] = { ^ ../include/linux/compiler_attributes.h:267:56: note: expanded from macro '__maybe_unused' #define __maybe_unused __attribute__((__unused__)) ^ ../include/linux/mod_devicetable.h:262:8: note: previous definition is here struct of_device_id { ^ 1 warning generated. 'struct of_device_id' should not be split, as it is a type. Move the __maybe_unused attribute after the static and const qualifiers so that there are no warnings about this variable, period. Fixes: 95393f3e07ab ("i3c/master/mipi-i3c-hci: quiet maybe-unused variable warning") Link: https://github.com/ClangBuiltLinux/linux/issues/1221 Signed-off-by: Nathan Chancellor Acked-by: Nicolas Pitre Signed-off-by: Alexandre Belloni Link: https://lore.kernel.org/r/20201222025931.3043480-1-natechancellor@gmail.com --- drivers/i3c/master/mipi-i3c-hci/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i3c/master/mipi-i3c-hci/core.c b/drivers/i3c/master/mipi-i3c-hci/core.c index 500abd27fb22..1b73647cc3b1 100644 --- a/drivers/i3c/master/mipi-i3c-hci/core.c +++ b/drivers/i3c/master/mipi-i3c-hci/core.c @@ -777,7 +777,7 @@ static int i3c_hci_remove(struct platform_device *pdev) return 0; } -static const struct __maybe_unused of_device_id i3c_hci_of_match[] = { +static const __maybe_unused struct of_device_id i3c_hci_of_match[] = { { .compatible = "mipi-i3c-hci", }, {}, }; From 928eedf013b25fcaeb6aef2ad721ed92c2e8bc66 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 1 Jan 2021 22:12:27 -0800 Subject: [PATCH 007/284] Input: st1232 - fix off-by-one error in resolution handling Before, the maximum coordinates were fixed to (799, 479) or (319, 479), depending on touchscreen controller type. The driver was changed to read the actual values from the touchscreen controller, but did not take into account the returned values are not the maximum coordinates, but the touchscreen resolution (e.g. 800 and 480). Fix this by subtracting 1. Fixes: 3a54a215410b1650 ("Input: st1232 - add support resolution reading") Signed-off-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20201229162601.2154566-2-geert+renesas@glider.be Signed-off-by: Dmitry Torokhov --- drivers/input/touchscreen/st1232.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/input/touchscreen/st1232.c b/drivers/input/touchscreen/st1232.c index bda96762744e..f18d4c7e03da 100644 --- a/drivers/input/touchscreen/st1232.c +++ b/drivers/input/touchscreen/st1232.c @@ -85,8 +85,8 @@ static int st1232_ts_read_resolution(struct st1232_ts_data *ts, u16 *max_x, buf = ts->read_buf; - *max_x = ((buf[0] & 0x0070) << 4) | buf[1]; - *max_y = ((buf[0] & 0x0007) << 8) | buf[2]; + *max_x = (((buf[0] & 0x0070) << 4) | buf[1]) - 1; + *max_y = (((buf[0] & 0x0007) << 8) | buf[2]) - 1; return 0; } From b999dbea06b9874c7724a410f47a6bac1e219e37 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 1 Jan 2021 22:13:13 -0800 Subject: [PATCH 008/284] Input: st1232 - do not read more bytes than needed st1232_ts_read_data() already reads ts->read_buf_len bytes (8 or 20 bytes) from the touchscreen controller. This was fine when it was used to read touch point coordinates only, but is overkill for reading the touchscreen resolution, which just needs 3 bytes. Optimize transfers by passing the wanted number of bytes. Fixes: 3a54a215410b1650 ("Input: st1232 - add support resolution reading") Signed-off-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20201229162601.2154566-3-geert+renesas@glider.be Signed-off-by: Dmitry Torokhov --- drivers/input/touchscreen/st1232.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/input/touchscreen/st1232.c b/drivers/input/touchscreen/st1232.c index f18d4c7e03da..459701056f2b 100644 --- a/drivers/input/touchscreen/st1232.c +++ b/drivers/input/touchscreen/st1232.c @@ -47,7 +47,8 @@ struct st1232_ts_data { u8 *read_buf; }; -static int st1232_ts_read_data(struct st1232_ts_data *ts, u8 reg) +static int st1232_ts_read_data(struct st1232_ts_data *ts, u8 reg, + unsigned int n) { struct i2c_client *client = ts->client; struct i2c_msg msg[] = { @@ -59,7 +60,7 @@ static int st1232_ts_read_data(struct st1232_ts_data *ts, u8 reg) { .addr = client->addr, .flags = I2C_M_RD | I2C_M_DMA_SAFE, - .len = ts->read_buf_len, + .len = n, .buf = ts->read_buf, } }; @@ -79,7 +80,7 @@ static int st1232_ts_read_resolution(struct st1232_ts_data *ts, u16 *max_x, int error; /* select resolution register */ - error = st1232_ts_read_data(ts, REG_XY_RESOLUTION); + error = st1232_ts_read_data(ts, REG_XY_RESOLUTION, 3); if (error) return error; @@ -140,7 +141,7 @@ static irqreturn_t st1232_ts_irq_handler(int irq, void *dev_id) int count; int error; - error = st1232_ts_read_data(ts, REG_XY_COORDINATES); + error = st1232_ts_read_data(ts, REG_XY_COORDINATES, ts->read_buf_len); if (error) goto out; From f605be6a57b439df7568a865c187b81863018c95 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 1 Jan 2021 22:15:15 -0800 Subject: [PATCH 009/284] Input: st1232 - wait until device is ready before reading resolution According to the st1232 datasheet, the host has to wait for the device to change into Normal state before accessing registers other than the Status Register. If the reset GPIO is wired, the device is powered on during driver probe, just before reading the resolution. However, the latter may happen before the device is ready, leading to a probe failure: st1232-ts 1-0055: Failed to read resolution: -6 Fix this by waiting until the device is ready, by trying to read the Status Register until it indicates so, or until timeout. On Armadillo 800 EVA, typically the first read fails with an I2C transfer error, while the second read indicates the device is ready. Fixes: 3a54a215410b1650 ("Input: st1232 - add support resolution reading") Signed-off-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20201229162601.2154566-4-geert+renesas@glider.be Signed-off-by: Dmitry Torokhov --- drivers/input/touchscreen/st1232.c | 35 ++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/drivers/input/touchscreen/st1232.c b/drivers/input/touchscreen/st1232.c index 459701056f2b..b4e7bcbe9b91 100644 --- a/drivers/input/touchscreen/st1232.c +++ b/drivers/input/touchscreen/st1232.c @@ -26,6 +26,20 @@ #define ST1232_TS_NAME "st1232-ts" #define ST1633_TS_NAME "st1633-ts" +#define REG_STATUS 0x01 /* Device Status | Error Code */ + +#define STATUS_NORMAL 0x00 +#define STATUS_INIT 0x01 +#define STATUS_ERROR 0x02 +#define STATUS_AUTO_TUNING 0x03 +#define STATUS_IDLE 0x04 +#define STATUS_POWER_DOWN 0x05 + +#define ERROR_NONE 0x00 +#define ERROR_INVALID_ADDRESS 0x10 +#define ERROR_INVALID_VALUE 0x20 +#define ERROR_INVALID_PLATFORM 0x30 + #define REG_XY_RESOLUTION 0x04 #define REG_XY_COORDINATES 0x12 #define ST_TS_MAX_FINGERS 10 @@ -73,6 +87,22 @@ static int st1232_ts_read_data(struct st1232_ts_data *ts, u8 reg, return 0; } +static int st1232_ts_wait_ready(struct st1232_ts_data *ts) +{ + unsigned int retries; + int error; + + for (retries = 10; retries; retries--) { + error = st1232_ts_read_data(ts, REG_STATUS, 1); + if (!error && ts->read_buf[0] == (STATUS_NORMAL | ERROR_NONE)) + return 0; + + usleep_range(1000, 2000); + } + + return -ENXIO; +} + static int st1232_ts_read_resolution(struct st1232_ts_data *ts, u16 *max_x, u16 *max_y) { @@ -252,6 +282,11 @@ static int st1232_ts_probe(struct i2c_client *client, input_dev->name = "st1232-touchscreen"; input_dev->id.bustype = BUS_I2C; + /* Wait until device is ready */ + error = st1232_ts_wait_ready(ts); + if (error) + return error; + /* Read resolution from the chip */ error = st1232_ts_read_resolution(ts, &max_x, &max_y); if (error) { From a9164910c5ceed63551280a4a0b85d37ac2b19a5 Mon Sep 17 00:00:00 2001 From: Shawn Guo Date: Sat, 2 Jan 2021 12:59:40 +0800 Subject: [PATCH 010/284] arm64: dts: qcom: c630: keep both touchpad devices enabled Indicated by AML code in ACPI table, the touchpad in-use could be found on two possible slave addresses on &i2c3, i.e. hid@15 and hid@2c. And which one is in-use can be determined by reading another address on the I2C bus. Unfortunately, for DT boot, there is currently no support in firmware to make this check and patch DT accordingly. This results in a non-functional touchpad on those C630 devices with hid@2c. As i2c-hid driver will stop probing the device if there is nothing on the slave address, we can actually keep both devices enabled in DT, and i2c-hid driver will only probe the existing one. The only problem is that we cannot set up pinctrl in both device nodes, as two devices with the same pinctrl will cause pin conflict that makes the second device fail to probe. Let's move the pinctrl state up to parent node to solve this problem. As the pinctrl state of parent node is already defined in sdm845.dtsi, it ends up with overwriting pinctrl-0 with i2c3_hid_active state added in there. Fixes: 11d0e4f28156 ("arm64: dts: qcom: c630: Polish i2c-hid devices") Signed-off-by: Shawn Guo Link: https://lore.kernel.org/r/20210102045940.26874-1-shawn.guo@linaro.org Signed-off-by: Bjorn Andersson --- arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts b/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts index 13fdd02cffe6..3be85161a54e 100644 --- a/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts +++ b/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts @@ -320,6 +320,8 @@ &i2c3 { status = "okay"; clock-frequency = <400000>; + /* Overwrite pinctrl-0 from sdm845.dtsi */ + pinctrl-0 = <&qup_i2c3_default &i2c3_hid_active>; tsel: hid@15 { compatible = "hid-over-i2c"; @@ -327,9 +329,6 @@ hid-descr-addr = <0x1>; interrupts-extended = <&tlmm 37 IRQ_TYPE_LEVEL_HIGH>; - - pinctrl-names = "default"; - pinctrl-0 = <&i2c3_hid_active>; }; tsc2: hid@2c { @@ -338,11 +337,6 @@ hid-descr-addr = <0x20>; interrupts-extended = <&tlmm 37 IRQ_TYPE_LEVEL_HIGH>; - - pinctrl-names = "default"; - pinctrl-0 = <&i2c3_hid_active>; - - status = "disabled"; }; }; From a3a9060ecad030e2c7903b2b258383d2c716b56c Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Sun, 3 Jan 2021 17:59:51 -0800 Subject: [PATCH 011/284] Input: i8042 - unbreak Pegatron C15B MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit g++ reports drivers/input/serio/i8042-x86ia64io.h:225:3: error: ‘.matches’ designator used multiple times in the same initializer list C99 semantics is that last duplicated initialiser wins, so DMI entry gets overwritten. Fixes: a48491c65b51 ("Input: i8042 - add ByteSpeed touchpad to noloop table") Signed-off-by: Alexey Dobriyan Acked-by: Po-Hsu Lin Link: https://lore.kernel.org/r/20201228072335.GA27766@localhost.localdomain Signed-off-by: Dmitry Torokhov --- drivers/input/serio/i8042-x86ia64io.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index 3a2dcf0805f1..c74b020796a9 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -219,6 +219,8 @@ static const struct dmi_system_id __initconst i8042_dmi_noloop_table[] = { DMI_MATCH(DMI_SYS_VENDOR, "PEGATRON CORPORATION"), DMI_MATCH(DMI_PRODUCT_NAME, "C15B"), }, + }, + { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "ByteSpeed LLC"), DMI_MATCH(DMI_PRODUCT_NAME, "ByteSpeed Laptop C15B"), From 60159e9e7bc7e528c103b6b6d47dfd83af29669c Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Sun, 3 Jan 2021 17:43:04 -0800 Subject: [PATCH 012/284] Input: ili210x - implement pressure reporting for ILI251x The ILI251x seems to report pressure information in the 5th byte of each per-finger touch data element. On the available hardware, this information has the values ranging from 0x0 to 0xa, which is also matching the downstream example code. Report pressure information on the ILI251x. Signed-off-by: Marek Vasut Link: https://lore.kernel.org/r/20201224071238.160098-1-marex@denx.de Signed-off-by: Dmitry Torokhov --- drivers/input/touchscreen/ili210x.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/drivers/input/touchscreen/ili210x.c b/drivers/input/touchscreen/ili210x.c index 199cf3daec10..d8fccf048bf4 100644 --- a/drivers/input/touchscreen/ili210x.c +++ b/drivers/input/touchscreen/ili210x.c @@ -29,11 +29,13 @@ struct ili2xxx_chip { void *buf, size_t len); int (*get_touch_data)(struct i2c_client *client, u8 *data); bool (*parse_touch_data)(const u8 *data, unsigned int finger, - unsigned int *x, unsigned int *y); + unsigned int *x, unsigned int *y, + unsigned int *z); bool (*continue_polling)(const u8 *data, bool touch); unsigned int max_touches; unsigned int resolution; bool has_calibrate_reg; + bool has_pressure_reg; }; struct ili210x { @@ -82,7 +84,8 @@ static int ili210x_read_touch_data(struct i2c_client *client, u8 *data) static bool ili210x_touchdata_to_coords(const u8 *touchdata, unsigned int finger, - unsigned int *x, unsigned int *y) + unsigned int *x, unsigned int *y, + unsigned int *z) { if (touchdata[0] & BIT(finger)) return false; @@ -137,7 +140,8 @@ static int ili211x_read_touch_data(struct i2c_client *client, u8 *data) static bool ili211x_touchdata_to_coords(const u8 *touchdata, unsigned int finger, - unsigned int *x, unsigned int *y) + unsigned int *x, unsigned int *y, + unsigned int *z) { u32 data; @@ -169,7 +173,8 @@ static const struct ili2xxx_chip ili211x_chip = { static bool ili212x_touchdata_to_coords(const u8 *touchdata, unsigned int finger, - unsigned int *x, unsigned int *y) + unsigned int *x, unsigned int *y, + unsigned int *z) { u16 val; @@ -235,7 +240,8 @@ static int ili251x_read_touch_data(struct i2c_client *client, u8 *data) static bool ili251x_touchdata_to_coords(const u8 *touchdata, unsigned int finger, - unsigned int *x, unsigned int *y) + unsigned int *x, unsigned int *y, + unsigned int *z) { u16 val; @@ -245,6 +251,7 @@ static bool ili251x_touchdata_to_coords(const u8 *touchdata, *x = val & 0x3fff; *y = get_unaligned_be16(touchdata + 1 + (finger * 5) + 2); + *z = touchdata[1 + (finger * 5) + 4]; return true; } @@ -261,6 +268,7 @@ static const struct ili2xxx_chip ili251x_chip = { .continue_polling = ili251x_check_continue_polling, .max_touches = 10, .has_calibrate_reg = true, + .has_pressure_reg = true, }; static bool ili210x_report_events(struct ili210x *priv, u8 *touchdata) @@ -268,14 +276,16 @@ static bool ili210x_report_events(struct ili210x *priv, u8 *touchdata) struct input_dev *input = priv->input; int i; bool contact = false, touch; - unsigned int x = 0, y = 0; + unsigned int x = 0, y = 0, z = 0; for (i = 0; i < priv->chip->max_touches; i++) { - touch = priv->chip->parse_touch_data(touchdata, i, &x, &y); + touch = priv->chip->parse_touch_data(touchdata, i, &x, &y, &z); input_mt_slot(input, i); if (input_mt_report_slot_state(input, MT_TOOL_FINGER, touch)) { touchscreen_report_pos(input, &priv->prop, x, y, true); + if (priv->chip->has_pressure_reg) + input_report_abs(input, ABS_MT_PRESSURE, z); contact = true; } } @@ -437,6 +447,8 @@ static int ili210x_i2c_probe(struct i2c_client *client, max_xy = (chip->resolution ?: SZ_64K) - 1; input_set_abs_params(input, ABS_MT_POSITION_X, 0, max_xy, 0, 0); input_set_abs_params(input, ABS_MT_POSITION_Y, 0, max_xy, 0, 0); + if (priv->chip->has_pressure_reg) + input_set_abs_params(input, ABS_MT_PRESSURE, 0, 0xa, 0, 0); touchscreen_parse_properties(input, true, &priv->prop); error = input_mt_init_slots(input, priv->chip->max_touches, From 7386a559caa6414e74578172c2bc4e636d6bd0a0 Mon Sep 17 00:00:00 2001 From: Serge Semin Date: Thu, 10 Dec 2020 12:17:47 +0300 Subject: [PATCH 013/284] arm64: dts: amlogic: meson-g12: Set FL-adj property value In accordance with the DWC USB3 bindings the property is supposed to have uint32 type. It's erroneous from the DT schema and driver points of view to declare it as boolean. As Neil suggested set it to 0x20 so not break the platform and to make the dtbs checker happy. Link: https://lore.kernel.org/linux-usb/20201010224121.12672-16-Sergey.Semin@baikalelectronics.ru/ Signed-off-by: Serge Semin Reviewed-by: Martin Blumenstingl Reviewed-by: Neil Armstrong Reviewed-by: Krzysztof Kozlowski Fixes: 9baf7d6be730 ("arm64: dts: meson: g12a: Add G12A USB nodes") Signed-off-by: Kevin Hilman Link: https://lore.kernel.org/r/20201210091756.18057-3-Sergey.Semin@baikalelectronics.ru --- arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi index 9c90d562ada1..221fcca3b0b9 100644 --- a/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi @@ -2390,7 +2390,7 @@ interrupts = ; dr_mode = "host"; snps,dis_u2_susphy_quirk; - snps,quirk-frame-length-adjustment; + snps,quirk-frame-length-adjustment = <0x20>; snps,parkmode-disable-ss-quirk; }; }; From 698dc0cf944772a79a9aa417e647c0f7587e51df Mon Sep 17 00:00:00 2001 From: Heinrich Schuchardt Date: Tue, 5 Jan 2021 14:20:07 -0800 Subject: [PATCH 014/284] dt-bindings: input: adc-keys: clarify description The current description of ADC keys is not precise enough. "when this key is pressed" leaves it open if a key is considered pressed below or above the threshold. This has led to confusion: drivers/input/keyboard/adc-keys.c ignores the meaning of thresholds and sets the key that is closest to press-threshold-microvolt. This patch nails down the definitions and provides an interpretation of the supplied example. Signed-off-by: Heinrich Schuchardt Acked-by: Rob Herring Link: https://lore.kernel.org/r/20201222110815.24121-1-xypron.glpk@gmx.de Signed-off-by: Dmitry Torokhov --- .../devicetree/bindings/input/adc-keys.txt | 22 +++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/input/adc-keys.txt b/Documentation/devicetree/bindings/input/adc-keys.txt index e551814629b4..6c8be6a9ace2 100644 --- a/Documentation/devicetree/bindings/input/adc-keys.txt +++ b/Documentation/devicetree/bindings/input/adc-keys.txt @@ -5,7 +5,8 @@ Required properties: - compatible: "adc-keys" - io-channels: Phandle to an ADC channel - io-channel-names = "buttons"; - - keyup-threshold-microvolt: Voltage at which all the keys are considered up. + - keyup-threshold-microvolt: Voltage above or equal to which all the keys are + considered up. Optional properties: - poll-interval: Poll interval time in milliseconds @@ -17,7 +18,12 @@ Each button (key) is represented as a sub-node of "adc-keys": Required subnode-properties: - label: Descriptive name of the key. - linux,code: Keycode to emit. - - press-threshold-microvolt: Voltage ADC input when this key is pressed. + - press-threshold-microvolt: voltage above or equal to which this key is + considered pressed. + +No two values of press-threshold-microvolt may be the same. +All values of press-threshold-microvolt must be less than +keyup-threshold-microvolt. Example: @@ -47,3 +53,15 @@ Example: press-threshold-microvolt = <500000>; }; }; + ++--------------------------------+------------------------+ +| 2.000.000 <= value | no key pressed | ++--------------------------------+------------------------+ +| 1.500.000 <= value < 2.000.000 | KEY_VOLUMEUP pressed | ++--------------------------------+------------------------+ +| 1.000.000 <= value < 1.500.000 | KEY_VOLUMEDOWN pressed | ++--------------------------------+------------------------+ +| 500.000 <= value < 1.000.000 | KEY_ENTER pressed | ++--------------------------------+------------------------+ +| value < 500.000 | no key pressed | ++--------------------------------+------------------------+ From 656c648354e1561fa4f445b0b3252ec1d24e3951 Mon Sep 17 00:00:00 2001 From: Sandy Huang Date: Fri, 8 Jan 2021 12:06:27 +0100 Subject: [PATCH 015/284] arm64: dts: rockchip: fix vopl iommu irq on px30 The vop-mmu shares the irq with its matched vop but not the vpu. Fixes: 7053e06b1422 ("arm64: dts: rockchip: add core dtsi file for PX30 SoCs") Signed-off-by: Sandy Huang Signed-off-by: Heiko Stuebner Reviewed-by: Ezequiel Garcia Reviewed-by: Paul Kocialkowski Tested-by: Paul Kocialkowski Link: https://lore.kernel.org/r/20210108110627.3231226-1-heiko@sntech.de --- arch/arm64/boot/dts/rockchip/px30.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/rockchip/px30.dtsi b/arch/arm64/boot/dts/rockchip/px30.dtsi index 2695ea8cda14..64193292d26c 100644 --- a/arch/arm64/boot/dts/rockchip/px30.dtsi +++ b/arch/arm64/boot/dts/rockchip/px30.dtsi @@ -1097,7 +1097,7 @@ vopl_mmu: iommu@ff470f00 { compatible = "rockchip,iommu"; reg = <0x0 0xff470f00 0x0 0x100>; - interrupts = ; + interrupts = ; interrupt-names = "vopl_mmu"; clocks = <&cru ACLK_VOPL>, <&cru HCLK_VOPL>; clock-names = "aclk", "iface"; From 642fb2795290c4abe629ca34fb8ff6d78baa9fd3 Mon Sep 17 00:00:00 2001 From: Simon South Date: Wed, 30 Sep 2020 14:56:27 -0400 Subject: [PATCH 016/284] arm64: dts: rockchip: Use only supported PCIe link speed on Pinebook Pro On Pinebook Pro laptops with an NVMe SSD installed, prevent random crashes in the NVMe driver by not attempting to use a PCIe link speed higher than that supported by the RK3399 SoC. See commit 712fa1777207 ("arm64: dts: rockchip: add max-link-speed for rk3399"). Fixes: 5a65505a6988 ("arm64: dts: rockchip: Add initial support for Pinebook Pro") Signed-off-by: Simon South Link: https://lore.kernel.org/r/20200930185627.5918-1-simon@simonsouth.net Signed-off-by: Heiko Stuebner --- arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts index 06d48338c836..219b7507a10f 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts +++ b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts @@ -790,7 +790,6 @@ &pcie0 { bus-scan-delay-ms = <1000>; ep-gpios = <&gpio2 RK_PD4 GPIO_ACTIVE_HIGH>; - max-link-speed = <2>; num-lanes = <4>; pinctrl-names = "default"; pinctrl-0 = <&pcie_clkreqn_cpm>; From 43f20b1c6140896916f4e91aacc166830a7ba849 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Sat, 15 Aug 2020 13:51:12 +0100 Subject: [PATCH 017/284] arm64: dts: rockchip: Fix PCIe DT properties on rk3399 It recently became apparent that the lack of a 'device_type = "pci"' in the PCIe root complex node for rk3399 is a violation of the PCI binding, as documented in IEEE Std 1275-1994. Changes to the kernel's parsing of the DT made such violation fatal, as drivers cannot probe the controller anymore. Add the missing property makes the PCIe node compliant. While we are at it, drop the pointless linux,pci-domain property, which only makes sense when there are multiple host bridges. Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20200815125112.462652-3-maz@kernel.org Signed-off-by: Heiko Stuebner --- arch/arm64/boot/dts/rockchip/rk3399.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index f5dee5f447bb..52bce81cfe77 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi @@ -234,6 +234,7 @@ reg = <0x0 0xf8000000 0x0 0x2000000>, <0x0 0xfd000000 0x0 0x1000000>; reg-names = "axi-base", "apb-base"; + device_type = "pci"; #address-cells = <3>; #size-cells = <2>; #interrupt-cells = <1>; @@ -252,7 +253,6 @@ <0 0 0 2 &pcie0_intc 1>, <0 0 0 3 &pcie0_intc 2>, <0 0 0 4 &pcie0_intc 3>; - linux,pci-domain = <0>; max-link-speed = <1>; msi-map = <0x0 &its 0x0 0x1000>; phys = <&pcie_phy 0>, <&pcie_phy 1>, From 25669e943e06c56750fb2347cce4f3343379e4b2 Mon Sep 17 00:00:00 2001 From: AngeloGioacchino Del Regno Date: Sat, 9 Jan 2021 22:13:56 -0800 Subject: [PATCH 018/284] dt-bindings: input: touchscreen: goodix: Add binding for GT9286 IC Support for this chip is being added to the goodix driver: add the DT binding for it. Signed-off-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20210109135512.149032-3-angelogioacchino.delregno@somainline.org Reviewed-by: Bastien Nocera Signed-off-by: Dmitry Torokhov --- Documentation/devicetree/bindings/input/touchscreen/goodix.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/input/touchscreen/goodix.yaml b/Documentation/devicetree/bindings/input/touchscreen/goodix.yaml index da5b0d87e16d..93f2ce3130ae 100644 --- a/Documentation/devicetree/bindings/input/touchscreen/goodix.yaml +++ b/Documentation/devicetree/bindings/input/touchscreen/goodix.yaml @@ -26,6 +26,7 @@ properties: - goodix,gt927 - goodix,gt9271 - goodix,gt928 + - goodix,gt9286 - goodix,gt967 reg: From 2dce6db70c77bbe639f5cd9cc796fb8f2694a7d0 Mon Sep 17 00:00:00 2001 From: AngeloGioacchino Del Regno Date: Sat, 9 Jan 2021 22:14:39 -0800 Subject: [PATCH 019/284] Input: goodix - add support for Goodix GT9286 chip The Goodix GT9286 is a capacitive touch sensor IC based on GT1x. This chip can be found on a number of smartphones, including the F(x)tec Pro 1 and the Elephone U. This has been tested on F(x)Tec Pro1 (MSM8998). Signed-off-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20210109135512.149032-2-angelogioacchino.delregno@somainline.org Reviewed-by: Bastien Nocera Signed-off-by: Dmitry Torokhov --- drivers/input/touchscreen/goodix.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/input/touchscreen/goodix.c b/drivers/input/touchscreen/goodix.c index 19765f1c04f7..c682b028f0a2 100644 --- a/drivers/input/touchscreen/goodix.c +++ b/drivers/input/touchscreen/goodix.c @@ -157,6 +157,7 @@ static const struct goodix_chip_id goodix_chip_ids[] = { { .id = "5663", .data = >1x_chip_data }, { .id = "5688", .data = >1x_chip_data }, { .id = "917S", .data = >1x_chip_data }, + { .id = "9286", .data = >1x_chip_data }, { .id = "911", .data = >911_chip_data }, { .id = "9271", .data = >911_chip_data }, @@ -1448,6 +1449,7 @@ static const struct of_device_id goodix_of_match[] = { { .compatible = "goodix,gt927" }, { .compatible = "goodix,gt9271" }, { .compatible = "goodix,gt928" }, + { .compatible = "goodix,gt9286" }, { .compatible = "goodix,gt967" }, { } }; From 637464c59e0bb13a1da6abf1d7c4b9f9c01646d2 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 4 Jan 2021 17:21:25 -0800 Subject: [PATCH 020/284] ACPI: NFIT: Fix flexible_array.cocci warnings Julia and 0day report: Zero-length and one-element arrays are deprecated, see Documentation/process/deprecated.rst Flexible-array members should be used instead. However, a straight conversion to flexible arrays yields: drivers/acpi/nfit/core.c:2276:4: error: flexible array member in a struct with no named members drivers/acpi/nfit/core.c:2287:4: error: flexible array member in a struct with no named members Instead, just use plain arrays not embedded flexible arrays. Cc: Denis Efremov Reported-by: kernel test robot Reported-by: Julia Lawall Signed-off-by: Dan Williams --- drivers/acpi/nfit/core.c | 75 +++++++++++++++------------------------- 1 file changed, 28 insertions(+), 47 deletions(-) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index b11b08a60684..8c5dde628405 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -2269,40 +2269,24 @@ static const struct attribute_group *acpi_nfit_region_attribute_groups[] = { /* enough info to uniquely specify an interleave set */ struct nfit_set_info { - struct nfit_set_info_map { - u64 region_offset; - u32 serial_number; - u32 pad; - } mapping[0]; + u64 region_offset; + u32 serial_number; + u32 pad; }; struct nfit_set_info2 { - struct nfit_set_info_map2 { - u64 region_offset; - u32 serial_number; - u16 vendor_id; - u16 manufacturing_date; - u8 manufacturing_location; - u8 reserved[31]; - } mapping[0]; + u64 region_offset; + u32 serial_number; + u16 vendor_id; + u16 manufacturing_date; + u8 manufacturing_location; + u8 reserved[31]; }; -static size_t sizeof_nfit_set_info(int num_mappings) -{ - return sizeof(struct nfit_set_info) - + num_mappings * sizeof(struct nfit_set_info_map); -} - -static size_t sizeof_nfit_set_info2(int num_mappings) -{ - return sizeof(struct nfit_set_info2) - + num_mappings * sizeof(struct nfit_set_info_map2); -} - static int cmp_map_compat(const void *m0, const void *m1) { - const struct nfit_set_info_map *map0 = m0; - const struct nfit_set_info_map *map1 = m1; + const struct nfit_set_info *map0 = m0; + const struct nfit_set_info *map1 = m1; return memcmp(&map0->region_offset, &map1->region_offset, sizeof(u64)); @@ -2310,8 +2294,8 @@ static int cmp_map_compat(const void *m0, const void *m1) static int cmp_map(const void *m0, const void *m1) { - const struct nfit_set_info_map *map0 = m0; - const struct nfit_set_info_map *map1 = m1; + const struct nfit_set_info *map0 = m0; + const struct nfit_set_info *map1 = m1; if (map0->region_offset < map1->region_offset) return -1; @@ -2322,8 +2306,8 @@ static int cmp_map(const void *m0, const void *m1) static int cmp_map2(const void *m0, const void *m1) { - const struct nfit_set_info_map2 *map0 = m0; - const struct nfit_set_info_map2 *map1 = m1; + const struct nfit_set_info2 *map0 = m0; + const struct nfit_set_info2 *map1 = m1; if (map0->region_offset < map1->region_offset) return -1; @@ -2361,22 +2345,22 @@ static int acpi_nfit_init_interleave_set(struct acpi_nfit_desc *acpi_desc, return -ENOMEM; import_guid(&nd_set->type_guid, spa->range_guid); - info = devm_kzalloc(dev, sizeof_nfit_set_info(nr), GFP_KERNEL); + info = devm_kcalloc(dev, nr, sizeof(*info), GFP_KERNEL); if (!info) return -ENOMEM; - info2 = devm_kzalloc(dev, sizeof_nfit_set_info2(nr), GFP_KERNEL); + info2 = devm_kcalloc(dev, nr, sizeof(*info2), GFP_KERNEL); if (!info2) return -ENOMEM; for (i = 0; i < nr; i++) { struct nd_mapping_desc *mapping = &ndr_desc->mapping[i]; - struct nfit_set_info_map *map = &info->mapping[i]; - struct nfit_set_info_map2 *map2 = &info2->mapping[i]; struct nvdimm *nvdimm = mapping->nvdimm; struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm); - struct acpi_nfit_memory_map *memdev = memdev_from_spa(acpi_desc, - spa->range_index, i); + struct nfit_set_info *map = &info[i]; + struct nfit_set_info2 *map2 = &info2[i]; + struct acpi_nfit_memory_map *memdev = + memdev_from_spa(acpi_desc, spa->range_index, i); struct acpi_nfit_control_region *dcr = nfit_mem->dcr; if (!memdev || !nfit_mem->dcr) { @@ -2395,23 +2379,20 @@ static int acpi_nfit_init_interleave_set(struct acpi_nfit_desc *acpi_desc, } /* v1.1 namespaces */ - sort(&info->mapping[0], nr, sizeof(struct nfit_set_info_map), - cmp_map, NULL); - nd_set->cookie1 = nd_fletcher64(info, sizeof_nfit_set_info(nr), 0); + sort(info, nr, sizeof(*info), cmp_map, NULL); + nd_set->cookie1 = nd_fletcher64(info, sizeof(*info) * nr, 0); /* v1.2 namespaces */ - sort(&info2->mapping[0], nr, sizeof(struct nfit_set_info_map2), - cmp_map2, NULL); - nd_set->cookie2 = nd_fletcher64(info2, sizeof_nfit_set_info2(nr), 0); + sort(info2, nr, sizeof(*info2), cmp_map2, NULL); + nd_set->cookie2 = nd_fletcher64(info2, sizeof(*info2) * nr, 0); /* support v1.1 namespaces created with the wrong sort order */ - sort(&info->mapping[0], nr, sizeof(struct nfit_set_info_map), - cmp_map_compat, NULL); - nd_set->altcookie = nd_fletcher64(info, sizeof_nfit_set_info(nr), 0); + sort(info, nr, sizeof(*info), cmp_map_compat, NULL); + nd_set->altcookie = nd_fletcher64(info, sizeof(*info) * nr, 0); /* record the result of the sort for the mapping position */ for (i = 0; i < nr; i++) { - struct nfit_set_info_map2 *map2 = &info2->mapping[i]; + struct nfit_set_info2 *map2 = &info2[i]; int j; for (j = 0; j < nr; j++) { From 5b04cb8224ef9bf0d9af8a4c0e6e23806bb2d720 Mon Sep 17 00:00:00 2001 From: Jianpeng Ma Date: Tue, 29 Dec 2020 08:26:35 +0800 Subject: [PATCH 021/284] libnvdimm/pmem: Remove unused header 'commit a8b456d01cd6 ("bdi: remove BDI_CAP_SYNCHRONOUS_IO")' forgot remove the related header file. Signed-off-by: Jianpeng Ma Reviewed-by: Ira Weiny Link: https://lore.kernel.org/r/20201229002635.42555-1-jianpeng.ma@intel.com Signed-off-by: Dan Williams --- drivers/nvdimm/pmem.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 875076b0ea6c..f33bdae626ba 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include "pmem.h" From 5d06f72dc29c16a4868dd7ea0a6122454267809b Mon Sep 17 00:00:00 2001 From: Souptick Joarder Date: Mon, 11 Jan 2021 19:41:01 -0800 Subject: [PATCH 022/284] Input: ariel-pwrbutton - remove unused variable ariel_pwrbutton_id_table Kernel test robot throws below warning -> >> drivers/input/misc/ariel-pwrbutton.c:152:35: warning: unused variable >> 'ariel_pwrbutton_id_table' [-Wunused-const-variable] static const struct spi_device_id ariel_pwrbutton_id_table[] = { ^ 1 warning generated. Remove unused variable ariel_pwrbutton_id_table[] if no plan to use it further. Reported-by: kernel test robot Signed-off-by: Souptick Joarder Reviewed-by: Lubomir Rintel Link: https://lore.kernel.org/r/1608581041-4354-1-git-send-email-jrdr.linux@gmail.com Signed-off-by: Dmitry Torokhov --- drivers/input/misc/ariel-pwrbutton.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/input/misc/ariel-pwrbutton.c b/drivers/input/misc/ariel-pwrbutton.c index eda86ab552b9..17bbaac8b80c 100644 --- a/drivers/input/misc/ariel-pwrbutton.c +++ b/drivers/input/misc/ariel-pwrbutton.c @@ -149,12 +149,6 @@ static const struct of_device_id ariel_pwrbutton_of_match[] = { }; MODULE_DEVICE_TABLE(of, ariel_pwrbutton_of_match); -static const struct spi_device_id ariel_pwrbutton_id_table[] = { - { "wyse-ariel-ec-input", 0 }, - {} -}; -MODULE_DEVICE_TABLE(spi, ariel_pwrbutton_id_table); - static struct spi_driver ariel_pwrbutton_driver = { .driver = { .name = "dell-wyse-ariel-ec-input", From 2672b94d730c4b69a17ce297dc3fa60b980e72dc Mon Sep 17 00:00:00 2001 From: Tero Kristo Date: Thu, 17 Dec 2020 15:07:21 +0200 Subject: [PATCH 023/284] MAINTAINERS: Update my email address and maintainer level status My employment with TI is ending tomorrow, so update the email address entry in the maintainers file. Also, I don't expect to spend that much time with maintaining TI code anymore, so downgrade the status level to odd fixes only on areas where I remain as the main contact point for now, and move myself as secondary contact point where someone else has taken over the maintainership. Signed-off-by: Tero Kristo Signed-off-by: Tero Kristo Signed-off-by: Nishanth Menon Acked-by: Nishanth Menon Cc: Stephen Boyd Cc: Michael Turquette Cc: Nishanth Menon Cc: Santosh Shilimkar Cc: Borislav Petkov Cc: Tony Luck Link: https://lore.kernel.org/r/20201217130721.23555-1-t-kristo@ti.com --- MAINTAINERS | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 546aa66428c9..ba3e71fde323 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2617,8 +2617,8 @@ S: Maintained F: drivers/power/reset/keystone-reset.c ARM/TEXAS INSTRUMENTS K3 ARCHITECTURE -M: Tero Kristo M: Nishanth Menon +M: Tero Kristo L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Supported F: Documentation/devicetree/bindings/arm/ti/k3.yaml @@ -6474,9 +6474,9 @@ S: Maintained F: drivers/edac/skx_*.[ch] EDAC-TI -M: Tero Kristo +M: Tero Kristo L: linux-edac@vger.kernel.org -S: Maintained +S: Odd Fixes F: drivers/edac/ti_edac.c EDIROL UA-101/UA-1000 DRIVER @@ -17555,7 +17555,7 @@ F: drivers/iio/dac/ti-dac7612.c TEXAS INSTRUMENTS' SYSTEM CONTROL INTERFACE (TISCI) PROTOCOL DRIVER M: Nishanth Menon -M: Tero Kristo +M: Tero Kristo M: Santosh Shilimkar L: linux-arm-kernel@lists.infradead.org S: Maintained @@ -17699,9 +17699,9 @@ S: Maintained F: drivers/clk/clk-cdce706.c TI CLOCK DRIVER -M: Tero Kristo +M: Tero Kristo L: linux-omap@vger.kernel.org -S: Maintained +S: Odd Fixes F: drivers/clk/ti/ F: include/linux/clk/ti.h From 93f2a11580a9732c1d90f9e01a7e9facc825658f Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Mon, 21 Dec 2020 16:11:03 -0800 Subject: [PATCH 024/284] arm64: dts: qcom: sdm845: Reserve LPASS clocks in gcc The GCC_LPASS_Q6_AXI_CLK and GCC_LPASS_SWAY_CLK clocks may not be touched on a typical UEFI based SDM845 device, but when the kernel is built with CONFIG_SDM_LPASSCC_845 this happens, unless they are marked as protected-clocks in the DT. This was done for the MTP and the Pocophone, but not for DB845c and the Lenovo Yoga C630 - causing these to fail to boot if the LPASS clock controller is enabled (which it typically isn't). Tested-by: Vinod Koul #on db845c Reviewed-by: Vinod Koul Link: https://lore.kernel.org/r/20201222001103.3112306-1-bjorn.andersson@linaro.org Signed-off-by: Bjorn Andersson --- arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 4 +++- arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts index 7cc236575ee2..c0b93813ea9a 100644 --- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts @@ -415,7 +415,9 @@ &gcc { protected-clocks = , , - ; + , + , + ; }; &gpu { diff --git a/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts b/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts index 3be85161a54e..8b40f96e9780 100644 --- a/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts +++ b/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts @@ -302,7 +302,9 @@ &gcc { protected-clocks = , , - ; + , + , + ; }; &gpu { From 43377df70480f82919032eb09832e9646a8a5efb Mon Sep 17 00:00:00 2001 From: Chenxin Jin Date: Wed, 13 Jan 2021 16:59:05 +0800 Subject: [PATCH 025/284] USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000 Teraoka AD2000 uses the CP210x driver, but the chip VID/PID is customized with 0988/0578. We need the driver to support the new VID/PID. Signed-off-by: Chenxin Jin Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold --- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index fbb10dfc56e3..06f3cfc9f19a 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -61,6 +61,7 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x08e6, 0x5501) }, /* Gemalto Prox-PU/CU contactless smartcard reader */ { USB_DEVICE(0x08FD, 0x000A) }, /* Digianswer A/S , ZigBee/802.15.4 MAC Device */ { USB_DEVICE(0x0908, 0x01FF) }, /* Siemens RUGGEDCOM USB Serial Console */ + { USB_DEVICE(0x0988, 0x0578) }, /* Teraoka AD2000 */ { USB_DEVICE(0x0B00, 0x3070) }, /* Ingenico 3070 */ { USB_DEVICE(0x0BED, 0x1100) }, /* MEI (TM) Cashflow-SC Bill/Voucher Acceptor */ { USB_DEVICE(0x0BED, 0x1101) }, /* MEI series 2000 Combo Acceptor */ From a0572c0734e4926ac51a31f97c12f752e1cdc7c8 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Tue, 1 Dec 2020 12:13:29 +0100 Subject: [PATCH 026/284] ARM: dts: stm32: Fix polarity of the DH DRC02 uSD card detect The uSD card detect signal on the DH DRC02 is active-high, with a default pull down resistor on the board. Invert the polarity. Fixes: fde180f06d7b ("ARM: dts: stm32: Add DHSOM based DRC02 board") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org -- Note that this could not be tested on prototype SoMs, now that it is tested, this issue surfaced, so it needs to be fixed. Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi index 62ab23824a3e..3299a42d8063 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi @@ -104,7 +104,7 @@ * are used for on-board microSD slot instead. */ /delete-property/broken-cd; - cd-gpios = <&gpioi 10 (GPIO_ACTIVE_LOW | GPIO_PULL_UP)>; + cd-gpios = <&gpioi 10 GPIO_ACTIVE_HIGH>; disable-wp; }; From 1a9b001237f85d3cf11a408c2daca6a2245b2add Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Tue, 1 Dec 2020 12:13:30 +0100 Subject: [PATCH 027/284] ARM: dts: stm32: Connect card-detect signal on DHCOM The DHCOM SoM uSD slot card detect signal is connected to GPIO PG1, describe it in the DT. Fixes: 34e0c7847dcf ("ARM: dts: stm32: Add DH Electronics DHCOM STM32MP1 SoM and PDK2 board") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi index ac46ab363e1b..c77ab1bfdd3e 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi @@ -390,7 +390,7 @@ pinctrl-0 = <&sdmmc1_b4_pins_a &sdmmc1_dir_pins_a>; pinctrl-1 = <&sdmmc1_b4_od_pins_a &sdmmc1_dir_pins_a>; pinctrl-2 = <&sdmmc1_b4_sleep_pins_a &sdmmc1_dir_sleep_pins_a>; - broken-cd; + cd-gpios = <&gpiog 1 (GPIO_ACTIVE_LOW | GPIO_PULL_UP)>; st,sig-dir; st,neg-edge; st,use-ckin; From 063a60634d48ee89f697371c9850c9370e494f22 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Tue, 1 Dec 2020 12:13:31 +0100 Subject: [PATCH 028/284] ARM: dts: stm32: Disable WP on DHCOM uSD slot The uSD slot has no WP detection, disable it. Fixes: 34e0c7847dcf ("ARM: dts: stm32: Add DH Electronics DHCOM STM32MP1 SoM and PDK2 board") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi index c77ab1bfdd3e..daff5318f301 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi @@ -391,6 +391,7 @@ pinctrl-1 = <&sdmmc1_b4_od_pins_a &sdmmc1_dir_pins_a>; pinctrl-2 = <&sdmmc1_b4_sleep_pins_a &sdmmc1_dir_sleep_pins_a>; cd-gpios = <&gpiog 1 (GPIO_ACTIVE_LOW | GPIO_PULL_UP)>; + disable-wp; st,sig-dir; st,neg-edge; st,use-ckin; From 087698939f30d489e785d7df3e6aa5dce2487b39 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Thu, 7 Jan 2021 16:07:42 +0100 Subject: [PATCH 029/284] ARM: dts: stm32: Disable optional TSC2004 on DRC02 board The DRC02 has no use for the on-SoM touchscreen controller, and the on-SoM touchscreen controller may not even be populated, which then results in error messages in kernel log. Disable the touchscreen controller in DT. Fixes: fde180f06d7b ("ARM: dts: stm32: Add DHSOM based DRC02 board") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi index 3299a42d8063..4cabdade6432 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi @@ -87,6 +87,12 @@ }; }; +&i2c4 { + touchscreen@49 { + status = "disabled"; + }; +}; + &i2c5 { /* TP7/TP8 */ pinctrl-names = "default"; pinctrl-0 = <&i2c5_pins_a>; From bcbacfb82c7010431182a8aecb860c752e3aed8c Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Thu, 24 Dec 2020 07:24:38 +0100 Subject: [PATCH 030/284] ARM: dts: stm32: Fix GPIO hog names on DHCOM The GPIO hog node name should match regex '^(hog-[0-9]+|.+-hog(-[0-9]+)?)$', make it so and fix the following two make dtbs_check warnings: arch/arm/boot/dts/stm32mp157c-dhcom-picoitx.dt.yaml: hog-usb-port-power: $nodename:0: 'hog-usb-port-power' does not match '^(hog-[0-9]+|.+-hog(-[0-9]+)?)$' arch/arm/boot/dts/stm32mp153c-dhcom-drc02.dt.yaml: hog-usb-hub: $nodename:0: 'hog-usb-hub' does not match '^(hog-[0-9]+|.+-hog(-[0-9]+)?)$' Fixes: ac68793f49de ("ARM: dts: stm32: Add DHCOM based PicoITX board") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi | 4 ++-- arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi index 4cabdade6432..28c01cd9ac70 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi @@ -33,7 +33,7 @@ * during TX anyway and that it only controls drive enable DE * line. Hence, the RX is always enabled here. */ - rs485-rx-en { + rs485-rx-en-hog { gpio-hog; gpios = <8 GPIO_ACTIVE_HIGH>; output-low; @@ -61,7 +61,7 @@ * order to reset the Hub when USB bus is powered down, but * so far there is no such functionality. */ - usb-hub { + usb-hub-hog { gpio-hog; gpios = <2 GPIO_ACTIVE_HIGH>; output-high; diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi index 356150d28c42..b99f2b891629 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi @@ -43,7 +43,7 @@ * in order to turn on port power when USB bus is powered up, but so * far there is no such functionality. */ - usb-port-power { + usb-port-power-hog { gpio-hog; gpios = <13 GPIO_ACTIVE_LOW>; output-low; From 10793e557acece49fe1c55e8f4563f6b89543c18 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Tue, 29 Dec 2020 18:55:20 +0100 Subject: [PATCH 031/284] ARM: dts: stm32: Fix GPIO hog flags on DHCOM PicoITX The GPIO hog flags are ignored by gpiolib-of.c now, set the flags to 0. Due to a change in gpiolib-of.c, setting flags to GPIO_ACTIVE_LOW and using output-low DT property leads to the GPIO being set high instead. Fixes: ac68793f49de ("ARM: dts: stm32: Add DHCOM based PicoITX board") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi index b99f2b891629..32700cca24c8 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-picoitx.dtsi @@ -45,7 +45,7 @@ */ usb-port-power-hog { gpio-hog; - gpios = <13 GPIO_ACTIVE_LOW>; + gpios = <13 0>; output-low; line-name = "usb-port-power"; }; From 83d411224025ac1baab981e3d2f5d29e7761541d Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Tue, 29 Dec 2020 18:55:21 +0100 Subject: [PATCH 032/284] ARM: dts: stm32: Fix GPIO hog flags on DHCOM DRC02 The GPIO hog flags are ignored by gpiolib-of.c now, set the flags to 0. Since GPIO_ACTIVE_HIGH is defined as 0, this change only increases the correctness of the DT. Fixes: fde180f06d7b ("ARM: dts: stm32: Add DHSOM based DRC02 board") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi index 28c01cd9ac70..5088dd3a301b 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-drc02.dtsi @@ -35,7 +35,7 @@ */ rs485-rx-en-hog { gpio-hog; - gpios = <8 GPIO_ACTIVE_HIGH>; + gpios = <8 0>; output-low; line-name = "rs485-rx-en"; }; @@ -63,7 +63,7 @@ */ usb-hub-hog { gpio-hog; - gpios = <2 GPIO_ACTIVE_HIGH>; + gpios = <2 0>; output-high; line-name = "usb-hub-reset"; }; From 06862d789ddde8a99c1e579e934ca17c15a84755 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Sun, 10 Jan 2021 21:09:11 +0200 Subject: [PATCH 033/284] ARM: OMAP2+: Fix suspcious RCU usage splats for omap_enter_idle_coupled We get suspcious RCU usage splats with cpuidle in several places in omap_enter_idle_coupled() with the kernel debug options enabled: RCU used illegally from extended quiescent state! ... (_raw_spin_lock_irqsave) (omap_enter_idle_coupled+0x17c/0x2d8) (omap_enter_idle_coupled) (cpuidle_enter_state) (cpuidle_enter_state_coupled) (cpuidle_enter) Let's use RCU_NONIDLE to suppress these splats. Things got changed around with commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path") that started triggering these warnings. For the tick_broadcast related calls, ideally we'd just switch over to using CPUIDLE_FLAG_TIMER_STOP for omap_enter_idle_coupled() to have the generic cpuidle code handle the tick_broadcast related calls for us and then just drop the tick_broadcast calls here. But we're currently missing the call in the common cpuidle code for tick_broadcast_enable() that CPU1 hotplug needs as described in earlier commit 50d6b3cf9403 ("ARM: OMAP2+: fix lack of timer interrupts on CPU1 after hotplug"). Cc: Daniel Lezcano Cc: Paul E. McKenney Cc: Russell King Acked-by: Paul E. McKenney Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/cpuidle44xx.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c index c8d317fafe2e..de37027ad758 100644 --- a/arch/arm/mach-omap2/cpuidle44xx.c +++ b/arch/arm/mach-omap2/cpuidle44xx.c @@ -151,10 +151,10 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, (cx->mpu_logic_state == PWRDM_POWER_OFF); /* Enter broadcast mode for periodic timers */ - tick_broadcast_enable(); + RCU_NONIDLE(tick_broadcast_enable()); /* Enter broadcast mode for one-shot timers */ - tick_broadcast_enter(); + RCU_NONIDLE(tick_broadcast_enter()); /* * Call idle CPU PM enter notifier chain so that @@ -166,7 +166,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, if (dev->cpu == 0) { pwrdm_set_logic_retst(mpu_pd, cx->mpu_logic_state); - omap_set_pwrdm_state(mpu_pd, cx->mpu_state); + RCU_NONIDLE(omap_set_pwrdm_state(mpu_pd, cx->mpu_state)); /* * Call idle CPU cluster PM enter notifier chain @@ -178,7 +178,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, index = 0; cx = state_ptr + index; pwrdm_set_logic_retst(mpu_pd, cx->mpu_logic_state); - omap_set_pwrdm_state(mpu_pd, cx->mpu_state); + RCU_NONIDLE(omap_set_pwrdm_state(mpu_pd, cx->mpu_state)); mpuss_can_lose_context = 0; } } @@ -194,9 +194,9 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, mpuss_can_lose_context) gic_dist_disable(); - clkdm_deny_idle(cpu_clkdm[1]); - omap_set_pwrdm_state(cpu_pd[1], PWRDM_POWER_ON); - clkdm_allow_idle(cpu_clkdm[1]); + RCU_NONIDLE(clkdm_deny_idle(cpu_clkdm[1])); + RCU_NONIDLE(omap_set_pwrdm_state(cpu_pd[1], PWRDM_POWER_ON)); + RCU_NONIDLE(clkdm_allow_idle(cpu_clkdm[1])); if (IS_PM44XX_ERRATUM(PM_OMAP4_ROM_SMP_BOOT_ERRATUM_GICD) && mpuss_can_lose_context) { @@ -222,7 +222,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, cpu_pm_exit(); cpu_pm_out: - tick_broadcast_exit(); + RCU_NONIDLE(tick_broadcast_exit()); fail: cpuidle_coupled_parallel_barrier(dev, &abort_barrier); From 2a39af3870e99304df81d2a4058408d68efb02e0 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Fri, 15 Jan 2021 10:06:12 +0200 Subject: [PATCH 034/284] ARM: OMAP2+: Fix booting for am335x after moving to simple-pm-bus We now depend on SIMPLE_PM_BUS for probing devices. While we have it selected in omap2plus_defconfig, custom configs can fail if it's missing. As SIMPLE_PM_BUS depends on OF and PM, we must now select PM in Kconfig. We have already OF selected by ARCH_MULTIPLATFORM. Let's also drop the earlier PM dependency entries as suggested by Geert Uytterhoeven . Reported-by: Geert Uytterhoeven Reported-by: Matti Vaittinen Fixes: 5a230524f879 ("ARM: dts: Use simple-pm-bus for genpd for am3 l4_wkup") Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/Kconfig | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/arch/arm/mach-omap2/Kconfig b/arch/arm/mach-omap2/Kconfig index 4a59c169a113..4178c0ee46eb 100644 --- a/arch/arm/mach-omap2/Kconfig +++ b/arch/arm/mach-omap2/Kconfig @@ -17,11 +17,10 @@ config ARCH_OMAP3 bool "TI OMAP3" depends on ARCH_MULTI_V7 select ARCH_OMAP2PLUS - select ARM_CPU_SUSPEND if PM + select ARM_CPU_SUSPEND select OMAP_HWMOD select OMAP_INTERCONNECT - select PM_OPP if PM - select PM if CPU_IDLE + select PM_OPP select SOC_HAS_OMAP2_SDRC select ARM_ERRATA_430973 @@ -30,7 +29,7 @@ config ARCH_OMAP4 depends on ARCH_MULTI_V7 select ARCH_OMAP2PLUS select ARCH_NEEDS_CPU_IDLE_COUPLED if SMP - select ARM_CPU_SUSPEND if PM + select ARM_CPU_SUSPEND select ARM_ERRATA_720789 select ARM_GIC select HAVE_ARM_SCU if SMP @@ -40,7 +39,7 @@ config ARCH_OMAP4 select OMAP_INTERCONNECT_BARRIER select PL310_ERRATA_588369 if CACHE_L2X0 select PL310_ERRATA_727915 if CACHE_L2X0 - select PM_OPP if PM + select PM_OPP select PM if CPU_IDLE select ARM_ERRATA_754322 select ARM_ERRATA_775420 @@ -50,7 +49,7 @@ config SOC_OMAP5 bool "TI OMAP5" depends on ARCH_MULTI_V7 select ARCH_OMAP2PLUS - select ARM_CPU_SUSPEND if PM + select ARM_CPU_SUSPEND select ARM_GIC select HAVE_ARM_SCU if SMP select HAVE_ARM_ARCH_TIMER @@ -58,14 +57,14 @@ config SOC_OMAP5 select OMAP_HWMOD select OMAP_INTERCONNECT select OMAP_INTERCONNECT_BARRIER - select PM_OPP if PM + select PM_OPP select ZONE_DMA if ARM_LPAE config SOC_AM33XX bool "TI AM33XX" depends on ARCH_MULTI_V7 select ARCH_OMAP2PLUS - select ARM_CPU_SUSPEND if PM + select ARM_CPU_SUSPEND config SOC_AM43XX bool "TI AM43x" @@ -79,13 +78,13 @@ config SOC_AM43XX select ARM_ERRATA_754322 select ARM_ERRATA_775420 select OMAP_INTERCONNECT - select ARM_CPU_SUSPEND if PM + select ARM_CPU_SUSPEND config SOC_DRA7XX bool "TI DRA7XX" depends on ARCH_MULTI_V7 select ARCH_OMAP2PLUS - select ARM_CPU_SUSPEND if PM + select ARM_CPU_SUSPEND select ARM_GIC select HAVE_ARM_SCU if SMP select HAVE_ARM_ARCH_TIMER @@ -94,7 +93,7 @@ config SOC_DRA7XX select OMAP_HWMOD select OMAP_INTERCONNECT select OMAP_INTERCONNECT_BARRIER - select PM_OPP if PM + select PM_OPP select ZONE_DMA if ARM_LPAE select PINCTRL_TI_IODELAY if OF && PINCTRL @@ -112,9 +111,11 @@ config ARCH_OMAP2PLUS select OMAP_DM_TIMER select OMAP_GPMC select PINCTRL - select PM_GENERIC_DOMAINS if PM - select PM_GENERIC_DOMAINS_OF if PM + select PM + select PM_GENERIC_DOMAINS + select PM_GENERIC_DOMAINS_OF select RESET_CONTROLLER + select SIMPLE_PM_BUS select SOC_BUS select TI_SYSC select OMAP_IRQCHIP @@ -140,7 +141,6 @@ config ARCH_OMAP2PLUS_TYPICAL select I2C_OMAP select MENELAUS if ARCH_OMAP2 select NEON if CPU_V7 - select PM select REGULATOR select REGULATOR_FIXED_VOLTAGE select TWL4030_CORE if ARCH_OMAP3 || ARCH_OMAP4 From eda080eabf5b9555e4d574ba035b0cb8aa42f052 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Fri, 15 Jan 2021 10:47:17 +0200 Subject: [PATCH 035/284] drivers: bus: simple-pm-bus: Fix compatibility with simple-bus for auxdata After converting am335x to probe devices with simple-pm-bus I noticed that we are not passing auxdata for of_platform_populate() like we do with simple-bus. While device tree using SoCs should no longer need platform data, there are still quite a few drivers that still need it as can be seen with git grep OF_DEV_AUXDATA. We want to have simple-pm-bus be usable as a replacement for simple-bus also for cases where OF_DEV_AUXDATA is still needed. Let's fix the issue by passing auxdata as platform data to simple-pm-bus. That way the SoCs needing this can pass the auxdata with OF_DEV_AUXDATA. And let's pass the auxdata for omaps to fix the issue for am335x. As an alternative solution, adding simple-pm-bus handling directly to drivers/of/platform.c was considered, but we would still need simple-pm-bus device driver. So passing auxdata as platform data seems like the simplest solution. Fixes: 5a230524f879 ("ARM: dts: Use simple-pm-bus for genpd for am3 l4_wkup") Acked-by: Arnd Bergmann Reviewed-by: Geert Uytterhoeven Acked-by: Geert Uytterhoeven Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/pdata-quirks.c | 1 + drivers/bus/simple-pm-bus.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm/mach-omap2/pdata-quirks.c b/arch/arm/mach-omap2/pdata-quirks.c index cd38bf07c094..2e3a10914c40 100644 --- a/arch/arm/mach-omap2/pdata-quirks.c +++ b/arch/arm/mach-omap2/pdata-quirks.c @@ -522,6 +522,7 @@ static struct of_dev_auxdata omap_auxdata_lookup[] = { &dra7_ipu1_dsp_iommu_pdata), #endif /* Common auxdata */ + OF_DEV_AUXDATA("simple-pm-bus", 0, NULL, omap_auxdata_lookup), OF_DEV_AUXDATA("ti,sysc", 0, NULL, &ti_sysc_pdata), OF_DEV_AUXDATA("pinctrl-single", 0, NULL, &pcs_pdata), OF_DEV_AUXDATA("ti,omap-prm-inst", 0, NULL, &ti_prm_pdata), diff --git a/drivers/bus/simple-pm-bus.c b/drivers/bus/simple-pm-bus.c index c5eb46cbf388..01a3d0cd08ed 100644 --- a/drivers/bus/simple-pm-bus.c +++ b/drivers/bus/simple-pm-bus.c @@ -16,6 +16,7 @@ static int simple_pm_bus_probe(struct platform_device *pdev) { + const struct of_dev_auxdata *lookup = dev_get_platdata(&pdev->dev); struct device_node *np = pdev->dev.of_node; dev_dbg(&pdev->dev, "%s\n", __func__); @@ -23,7 +24,7 @@ static int simple_pm_bus_probe(struct platform_device *pdev) pm_runtime_enable(&pdev->dev); if (np) - of_platform_populate(np, NULL, NULL, &pdev->dev); + of_platform_populate(np, NULL, lookup, &pdev->dev); return 0; } From 16e19e11228ba660d9e322035635e7dcf160d5c2 Mon Sep 17 00:00:00 2001 From: Dave Jiang Date: Fri, 15 Jan 2021 14:52:52 -0700 Subject: [PATCH 036/284] dmaengine: idxd: Fix list corruption in description completion Sanjay reported the following kernel splat after running dmatest for stress testing. The current code is giving up the spinlock in the middle of a completion list walk, and that opens up opportunity for list corruption if another thread touches the list at the same time. In order to make sure the list is always protected, the hardware completed descriptors will be put on a local list to be completed with callbacks from the outside of the list lock. kernel: general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI kernel: CPU: 62 PID: 1814 Comm: irq/89-idxd-por Tainted: G W 5.10.0-intel-next_10_16+ #1 kernel: Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SBT.4915.D02.2012070418 12/07/2020 kernel: RIP: 0010:irq_process_work_list+0xcd/0x170 [idxd] kernel: Code: e8 18 65 5c d3 8b 45 d4 85 c0 75 b3 4c 89 f7 e8 b9 fe ff ff 84 c0 74 bf 4c 89 e7 e8 dd 6b 5c d3 49 8b 3f 49 8b 4f 08 48 89 c6 <48> 89 4f 08 48 89 39 4c 89 e7 48 b9 00 01 00 00 00 00 ad de 49 89 kernel: RSP: 0018:ff256768c4353df8 EFLAGS: 00010046 kernel: RAX: 0000000000000202 RBX: dead000000000100 RCX: dead000000000122 kernel: RDX: 0000000000000001 RSI: 0000000000000202 RDI: dead000000000100 kernel: RBP: ff256768c4353e40 R08: 0000000000000001 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 00000000ffffffff R12: ff1fdf9fd06b3e48 kernel: R13: 0000000000000005 R14: ff1fdf9fc4275980 R15: ff1fdf9fc4275a00 kernel: FS: 0000000000000000(0000) GS:ff1fdfa32f880000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007f87f24012a0 CR3: 000000010f630004 CR4: 0000000003771ee0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 kernel: PKRU: 55555554 kernel: Call Trace: kernel: ? irq_thread+0xa9/0x1b0 kernel: idxd_wq_thread+0x34/0x90 [idxd] kernel: irq_thread_fn+0x24/0x60 kernel: irq_thread+0x10f/0x1b0 kernel: ? irq_forced_thread_fn+0x80/0x80 kernel: ? wake_threads_waitq+0x30/0x30 kernel: ? irq_thread_check_affinity+0xe0/0xe0 kernel: kthread+0x142/0x160 kernel: ? kthread_park+0x90/0x90 kernel: ret_from_fork+0x1f/0x30 kernel: Modules linked in: idxd_ktest dmatest intel_rapl_msr idxd_mdev iTCO_wdt vfio_pci vfio_virqfd iTCO_vendor_support intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl msr pcspkr snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg ofpart snd_hda_codec snd_hda_core snd_hwdep cmdlinepart snd_seq snd_seq_device idxd snd_pcm intel_spi_pci intel_spi snd_timer spi_nor input_leds joydev snd i2c_i801 mtd soundcore i2c_smbus mei_me mei i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sunrpc nls_iso8859_1 sch_fq_codel ip_tables x_tables xfs libcrc32c ast drm_vram_helper drm_ttm_helper ttm igc wmi pinctrl_sunrisepoint hid_generic usbmouse usbkbd usbhid hid kernel: ---[ end trace cd5ca950ef0db25f ]--- Reported-by: Sanjay Kumar Signed-off-by: Dave Jiang Tested-by: Sanjay Kumar Fixes: e4f4d8cdeb9a ("dmaengine: idxd: Clean up descriptors with fault error") Link: https://lore.kernel.org/r/161074757267.2183951.17912830858060607266.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul --- drivers/dma/idxd/irq.c | 86 +++++++++++++++++++++--------------------- 1 file changed, 44 insertions(+), 42 deletions(-) diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index 593a2f6ed16c..fe996cec7c74 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -219,29 +219,27 @@ irqreturn_t idxd_misc_thread(int vec, void *data) return IRQ_HANDLED; } -static bool process_fault(struct idxd_desc *desc, u64 fault_addr) +static inline bool match_fault(struct idxd_desc *desc, u64 fault_addr) { /* * Completion address can be bad as well. Check fault address match for descriptor * and completion address. */ - if ((u64)desc->hw == fault_addr || - (u64)desc->completion == fault_addr) { - idxd_dma_complete_txd(desc, IDXD_COMPLETE_DEV_FAIL); + if ((u64)desc->hw == fault_addr || (u64)desc->completion == fault_addr) { + struct idxd_device *idxd = desc->wq->idxd; + struct device *dev = &idxd->pdev->dev; + + dev_warn(dev, "desc with fault address: %#llx\n", fault_addr); return true; } return false; } -static bool complete_desc(struct idxd_desc *desc) +static inline void complete_desc(struct idxd_desc *desc, enum idxd_complete_type reason) { - if (desc->completion->status) { - idxd_dma_complete_txd(desc, IDXD_COMPLETE_NORMAL); - return true; - } - - return false; + idxd_dma_complete_txd(desc, reason); + idxd_free_desc(desc->wq, desc); } static int irq_process_pending_llist(struct idxd_irq_entry *irq_entry, @@ -251,25 +249,25 @@ static int irq_process_pending_llist(struct idxd_irq_entry *irq_entry, struct idxd_desc *desc, *t; struct llist_node *head; int queued = 0; - bool completed = false; unsigned long flags; + enum idxd_complete_type reason; *processed = 0; head = llist_del_all(&irq_entry->pending_llist); if (!head) goto out; - llist_for_each_entry_safe(desc, t, head, llnode) { - if (wtype == IRQ_WORK_NORMAL) - completed = complete_desc(desc); - else if (wtype == IRQ_WORK_PROCESS_FAULT) - completed = process_fault(desc, data); + if (wtype == IRQ_WORK_NORMAL) + reason = IDXD_COMPLETE_NORMAL; + else + reason = IDXD_COMPLETE_DEV_FAIL; - if (completed) { - idxd_free_desc(desc->wq, desc); + llist_for_each_entry_safe(desc, t, head, llnode) { + if (desc->completion->status) { + if ((desc->completion->status & DSA_COMP_STATUS_MASK) != DSA_COMP_SUCCESS) + match_fault(desc, data); + complete_desc(desc, reason); (*processed)++; - if (wtype == IRQ_WORK_PROCESS_FAULT) - break; } else { spin_lock_irqsave(&irq_entry->list_lock, flags); list_add_tail(&desc->list, @@ -287,42 +285,46 @@ static int irq_process_work_list(struct idxd_irq_entry *irq_entry, enum irq_work_type wtype, int *processed, u64 data) { - struct list_head *node, *next; int queued = 0; - bool completed = false; unsigned long flags; + LIST_HEAD(flist); + struct idxd_desc *desc, *n; + enum idxd_complete_type reason; *processed = 0; + if (wtype == IRQ_WORK_NORMAL) + reason = IDXD_COMPLETE_NORMAL; + else + reason = IDXD_COMPLETE_DEV_FAIL; + + /* + * This lock protects list corruption from access of list outside of the irq handler + * thread. + */ spin_lock_irqsave(&irq_entry->list_lock, flags); - if (list_empty(&irq_entry->work_list)) - goto out; - - list_for_each_safe(node, next, &irq_entry->work_list) { - struct idxd_desc *desc = - container_of(node, struct idxd_desc, list); - + if (list_empty(&irq_entry->work_list)) { spin_unlock_irqrestore(&irq_entry->list_lock, flags); - if (wtype == IRQ_WORK_NORMAL) - completed = complete_desc(desc); - else if (wtype == IRQ_WORK_PROCESS_FAULT) - completed = process_fault(desc, data); + return 0; + } - if (completed) { - spin_lock_irqsave(&irq_entry->list_lock, flags); + list_for_each_entry_safe(desc, n, &irq_entry->work_list, list) { + if (desc->completion->status) { list_del(&desc->list); - spin_unlock_irqrestore(&irq_entry->list_lock, flags); - idxd_free_desc(desc->wq, desc); (*processed)++; - if (wtype == IRQ_WORK_PROCESS_FAULT) - return queued; + list_add_tail(&desc->list, &flist); } else { queued++; } - spin_lock_irqsave(&irq_entry->list_lock, flags); } - out: spin_unlock_irqrestore(&irq_entry->list_lock, flags); + + list_for_each_entry(desc, &flist, list) { + if ((desc->completion->status & DSA_COMP_STATUS_MASK) != DSA_COMP_SUCCESS) + match_fault(desc, data); + complete_desc(desc, reason); + } + return queued; } From f5cc9ace24fbdf41b4814effbb2f9bad7046e988 Mon Sep 17 00:00:00 2001 From: Dave Jiang Date: Fri, 15 Jan 2021 14:52:33 -0700 Subject: [PATCH 037/284] dmaengine: idxd: fix misc interrupt completion Nikhil reported the misc interrupt handler can sometimes miss handling the command interrupt when an error interrupt happens near the same time. Have the irq handling thread continue to process the misc interrupts until all interrupts are processed. This is a low usage interrupt and is not expected to handle high volume traffic. Therefore there is no concern of this thread running for a long time. Fixes: 0d5c10b4c84d ("dmaengine: idxd: add work queue drain support") Reported-by: Nikhil Rao Signed-off-by: Dave Jiang Link: https://lore.kernel.org/r/161074755329.2183844.13295528344116907983.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul --- drivers/dma/idxd/irq.c | 36 +++++++++++++++++++++++++++--------- 1 file changed, 27 insertions(+), 9 deletions(-) diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index fe996cec7c74..a60ca11a5784 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -111,19 +111,14 @@ irqreturn_t idxd_irq_handler(int vec, void *data) return IRQ_WAKE_THREAD; } -irqreturn_t idxd_misc_thread(int vec, void *data) +static int process_misc_interrupts(struct idxd_device *idxd, u32 cause) { - struct idxd_irq_entry *irq_entry = data; - struct idxd_device *idxd = irq_entry->idxd; struct device *dev = &idxd->pdev->dev; union gensts_reg gensts; - u32 cause, val = 0; + u32 val = 0; int i; bool err = false; - cause = ioread32(idxd->reg_base + IDXD_INTCAUSE_OFFSET); - iowrite32(cause, idxd->reg_base + IDXD_INTCAUSE_OFFSET); - if (cause & IDXD_INTC_ERR) { spin_lock_bh(&idxd->dev_lock); for (i = 0; i < 4; i++) @@ -181,7 +176,7 @@ irqreturn_t idxd_misc_thread(int vec, void *data) val); if (!err) - goto out; + return 0; /* * This case should rarely happen and typically is due to software @@ -211,10 +206,33 @@ irqreturn_t idxd_misc_thread(int vec, void *data) gensts.reset_type == IDXD_DEVICE_RESET_FLR ? "FLR" : "system reset"); spin_unlock_bh(&idxd->dev_lock); + return -ENXIO; } } - out: + return 0; +} + +irqreturn_t idxd_misc_thread(int vec, void *data) +{ + struct idxd_irq_entry *irq_entry = data; + struct idxd_device *idxd = irq_entry->idxd; + int rc; + u32 cause; + + cause = ioread32(idxd->reg_base + IDXD_INTCAUSE_OFFSET); + if (cause) + iowrite32(cause, idxd->reg_base + IDXD_INTCAUSE_OFFSET); + + while (cause) { + rc = process_misc_interrupts(idxd, cause); + if (rc < 0) + break; + cause = ioread32(idxd->reg_base + IDXD_INTCAUSE_OFFSET); + if (cause) + iowrite32(cause, idxd->reg_base + IDXD_INTCAUSE_OFFSET); + } + idxd_unmask_msix_vector(idxd, irq_entry->id); return IRQ_HANDLED; } From 94a5400f8b966c91c49991bae41c2ef911b935ac Mon Sep 17 00:00:00 2001 From: Johan Jonker Date: Sun, 17 Jan 2021 19:16:53 +0100 Subject: [PATCH 038/284] arm64: dts: rockchip: remove interrupt-names property from rk3399 vdec node A test with the command below gives this error: /arch/arm64/boot/dts/rockchip/rk3399-evb.dt.yaml: video-codec@ff660000: 'interrupt-names' does not match any of the regexes: 'pinctrl-[0-9]+' The rkvdec driver gets it irq with help of the platform_get_irq() function, so remove the interrupt-names property from the rk3399 vdec node. make ARCH=arm64 dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/ media/rockchip,vdec.yaml Signed-off-by: Johan Jonker Link: https://lore.kernel.org/r/20210117181653.24886-1-jbx6244@gmail.com Signed-off-by: Heiko Stuebner --- arch/arm64/boot/dts/rockchip/rk3399.dtsi | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index 52bce81cfe77..2551b238b97c 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi @@ -1278,7 +1278,6 @@ compatible = "rockchip,rk3399-vdec"; reg = <0x0 0xff660000 0x0 0x400>; interrupts = ; - interrupt-names = "vdpu"; clocks = <&cru ACLK_VDU>, <&cru HCLK_VDU>, <&cru SCLK_VDU_CA>, <&cru SCLK_VDU_CORE>; clock-names = "axi", "ahb", "cabac", "core"; From e594443196d6e0ef3d3b30320c49b3a4d4f9a547 Mon Sep 17 00:00:00 2001 From: Dave Jiang Date: Mon, 18 Jan 2021 10:28:44 -0700 Subject: [PATCH 039/284] dmaengine: move channel device_node deletion to driver Channel device_node deletion is managed by the device driver rather than the dmaengine core. The deletion was accidentally introduced when making channel unregister dynamic. It causes xilinx_dma module to crash on unload as reported by Radhey. Remove chan->device_node delete in dmaengine and also fix up idxd driver. [ 42.142705] Internal error: Oops: 96000044 [#1] SMP [ 42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard uio_pdrv_genirq [ 42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted 5.10.1-00026-g3a2e6dd7a05-dirty #192 [ 42.163302] Hardware name: Enclustra XU5 SOM (DT) [ 42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--) [ 42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma] [ 42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma] [ 42.185636] sp : ffffffc01112bca0 [ 42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640 xilinx_dma_chan_remove+0x74/0xa0: __list_del at ./include/linux/list.h:112 (inlined by) __list_del_entry at./include/linux/list.h:135 (inlined by) list_del at ./include/linux/list.h:146 (inlined by) xilinx_dma_chan_remove at drivers/dma/xilinx/xilinx_dma.c:2546 Fixes: e81274cd6b52 ("dmaengine: add support to dynamic register/unregister of channels") Reported-by: Radhey Shyam Pandey Signed-off-by: Dave Jiang Tested-by: Radhey Shyam Pandey Link: https://lore.kernel.org/r/161099092469.2495902.5064826526660062342.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul Cc: stable@vger.kernel.org # 5.9+ --- drivers/dma/dmaengine.c | 1 - drivers/dma/idxd/dma.c | 5 ++++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 962cbb5e5f7f..fe6a460c4373 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -1110,7 +1110,6 @@ static void __dma_async_device_channel_unregister(struct dma_device *device, "%s called while %d clients hold a reference\n", __func__, chan->client_count); mutex_lock(&dma_list_mutex); - list_del(&chan->device_node); device->chancnt--; chan->dev->chan = NULL; mutex_unlock(&dma_list_mutex); diff --git a/drivers/dma/idxd/dma.c b/drivers/dma/idxd/dma.c index 8ed2773d8285..71fd6e4c42cd 100644 --- a/drivers/dma/idxd/dma.c +++ b/drivers/dma/idxd/dma.c @@ -205,5 +205,8 @@ int idxd_register_dma_channel(struct idxd_wq *wq) void idxd_unregister_dma_channel(struct idxd_wq *wq) { - dma_async_device_channel_unregister(&wq->idxd->dma_dev, &wq->dma_chan); + struct dma_chan *chan = &wq->dma_chan; + + dma_async_device_channel_unregister(&wq->idxd->dma_dev, chan); + list_del(&chan->device_node); } From 51839e29cb5954470ea4db7236ef8c3d77a6e0bb Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 9 Dec 2020 13:50:17 +0200 Subject: [PATCH 040/284] scripts: switch explicitly to Python 3 Some distributions are about to switch to Python 3 support only. This means that /usr/bin/python, which is Python 2, is not available anymore. Hence, switch scripts to use Python 3 explicitly. Signed-off-by: Andy Shevchenko Signed-off-by: Masahiro Yamada --- scripts/bloat-o-meter | 2 +- scripts/diffconfig | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/bloat-o-meter b/scripts/bloat-o-meter index 652e9542043f..dcd8d8750b8b 100755 --- a/scripts/bloat-o-meter +++ b/scripts/bloat-o-meter @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 # # Copyright 2004 Matt Mackall # diff --git a/scripts/diffconfig b/scripts/diffconfig index 627eba5849b5..d5da5fa05d1d 100755 --- a/scripts/diffconfig +++ b/scripts/diffconfig @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 # SPDX-License-Identifier: GPL-2.0 # # diffconfig - a tool to compare .config files. From 1cabe74f148f7b99d9f08274a62467f96c870f07 Mon Sep 17 00:00:00 2001 From: Robert Karszniewicz Date: Fri, 22 Jan 2021 19:04:13 +0100 Subject: [PATCH 041/284] Documentation/Kbuild: Remove references to gcc-plugin.sh gcc-plugin.sh has been removed in commit 1e860048c53e ("gcc-plugins: simplify GCC plugin-dev capability test"). Signed-off-by: Robert Karszniewicz Reviewed-by: Kees Cook Signed-off-by: Masahiro Yamada --- Documentation/kbuild/gcc-plugins.rst | 6 ------ 1 file changed, 6 deletions(-) diff --git a/Documentation/kbuild/gcc-plugins.rst b/Documentation/kbuild/gcc-plugins.rst index 4b1c10f88e30..63379d0150e3 100644 --- a/Documentation/kbuild/gcc-plugins.rst +++ b/Documentation/kbuild/gcc-plugins.rst @@ -47,12 +47,6 @@ Files This is a compatibility header for GCC plugins. It should be always included instead of individual gcc headers. -**$(src)/scripts/gcc-plugin.sh** - - This script checks the availability of the included headers in - gcc-common.h and chooses the proper host compiler to build the plugins - (gcc-4.7 can be built by either gcc or g++). - **$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h, $(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h, $(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h, From f4c3b83b75b91c5059726cb91e3165cc01764ce7 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sat, 23 Jan 2021 18:16:30 +0900 Subject: [PATCH 042/284] kbuild: simplify GCC_PLUGINS enablement in dummy-tools/gcc With commit 1e860048c53e ("gcc-plugins: simplify GCC plugin-dev capability test") applied, this hunk can be way simplified because now scripts/gcc-plugins/Kconfig only checks plugin-version.h Signed-off-by: Masahiro Yamada --- scripts/dummy-tools/gcc | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/scripts/dummy-tools/gcc b/scripts/dummy-tools/gcc index 33487e99d83e..5c113cad5601 100755 --- a/scripts/dummy-tools/gcc +++ b/scripts/dummy-tools/gcc @@ -75,16 +75,12 @@ if arg_contain -S "$@"; then fi fi -# For scripts/gcc-plugin.sh +# To set GCC_PLUGINS if arg_contain -print-file-name=plugin "$@"; then plugin_dir=$(mktemp -d) - sed -n 's/.*#include "\(.*\)"/\1/p' $(dirname $0)/../gcc-plugins/gcc-common.h | - while read header - do - mkdir -p $plugin_dir/include/$(dirname $header) - touch $plugin_dir/include/$header - done + mkdir -p $plugin_dir/include + touch $plugin_dir/include/plugin-version.h echo $plugin_dir exit 0 From 9bbd77d5bbc9aff8cb74d805c31751f5f0691ba8 Mon Sep 17 00:00:00 2001 From: Benjamin Valentin Date: Thu, 21 Jan 2021 19:24:17 -0800 Subject: [PATCH 043/284] Input: xpad - sync supported devices with fork on GitHub There is a fork of this driver on GitHub [0] that has been updated with new device IDs. Merge those into the mainline driver, so the out-of-tree fork is not needed for users of those devices anymore. [0] https://github.com/paroj/xpad Signed-off-by: Benjamin Valentin Link: https://lore.kernel.org/r/20210121142523.1b6b050f@rechenknecht2k11 Signed-off-by: Dmitry Torokhov --- drivers/input/joystick/xpad.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/input/joystick/xpad.c b/drivers/input/joystick/xpad.c index 0687f0ed60b8..8cc8ca4a9ac0 100644 --- a/drivers/input/joystick/xpad.c +++ b/drivers/input/joystick/xpad.c @@ -215,9 +215,17 @@ static const struct xpad_device { { 0x0e6f, 0x0213, "Afterglow Gamepad for Xbox 360", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x021f, "Rock Candy Gamepad for Xbox 360", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0246, "Rock Candy Gamepad for Xbox One 2015", 0, XTYPE_XBOXONE }, - { 0x0e6f, 0x02ab, "PDP Controller for Xbox One", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a0, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a1, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a2, "PDP Wired Controller for Xbox One - Crimson Red", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a4, "PDP Wired Controller for Xbox One - Stealth Series", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a6, "PDP Wired Controller for Xbox One - Camo Series", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a7, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a8, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02ab, "PDP Controller for Xbox One", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02ad, "PDP Wired Controller for Xbox One - Stealth Series", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02b3, "Afterglow Prismatic Wired Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02b8, "Afterglow Prismatic Wired Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0301, "Logic3 Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0346, "Rock Candy Gamepad for Xbox One 2016", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0401, "Logic3 Controller", 0, XTYPE_XBOX360 }, @@ -296,6 +304,9 @@ static const struct xpad_device { { 0x1bad, 0xfa01, "MadCatz GamePad", 0, XTYPE_XBOX360 }, { 0x1bad, 0xfd00, "Razer Onza TE", 0, XTYPE_XBOX360 }, { 0x1bad, 0xfd01, "Razer Onza", 0, XTYPE_XBOX360 }, + { 0x20d6, 0x2001, "BDA Xbox Series X Wired Controller", 0, XTYPE_XBOXONE }, + { 0x20d6, 0x281f, "PowerA Wired Controller For Xbox 360", 0, XTYPE_XBOX360 }, + { 0x2e24, 0x0652, "Hyperkin Duke X-Box One pad", 0, XTYPE_XBOXONE }, { 0x24c6, 0x5000, "Razer Atrox Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x24c6, 0x5300, "PowerA MINI PROEX Controller", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5303, "Xbox Airflo wired controller", 0, XTYPE_XBOX360 }, @@ -429,8 +440,12 @@ static const struct usb_device_id xpad_table[] = { XPAD_XBOX360_VENDOR(0x162e), /* Joytech X-Box 360 controllers */ XPAD_XBOX360_VENDOR(0x1689), /* Razer Onza */ XPAD_XBOX360_VENDOR(0x1bad), /* Harminix Rock Band Guitar and Drums */ + XPAD_XBOX360_VENDOR(0x20d6), /* PowerA Controllers */ + XPAD_XBOXONE_VENDOR(0x20d6), /* PowerA Controllers */ XPAD_XBOX360_VENDOR(0x24c6), /* PowerA Controllers */ XPAD_XBOXONE_VENDOR(0x24c6), /* PowerA Controllers */ + XPAD_XBOXONE_VENDOR(0x2e24), /* Hyperkin Duke X-Box One pad */ + XPAD_XBOX360_VENDOR(0x2f24), /* GameSir Controllers */ { } }; From 177d8f1f7f47fe7c18ceb1d87893890d7e9c95a7 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Wed, 30 Dec 2020 17:07:40 +0200 Subject: [PATCH 044/284] ARM: dts: omap4-droid4: Fix lost keypad slide interrupts for droid4 We may lose edge interrupts for gpio banks other than the first gpio bank as they are not always powered. Instead, we must use the padconf interrupt as that is always powered. Note that we still also use the gpio for reading the pin state as that can't be done with the padconf device. Signed-off-by: Tony Lindgren --- arch/arm/boot/dts/omap4-droid4-xt894.dts | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/arm/boot/dts/omap4-droid4-xt894.dts b/arch/arm/boot/dts/omap4-droid4-xt894.dts index 3ea4c5b9fd31..e833c21f1c01 100644 --- a/arch/arm/boot/dts/omap4-droid4-xt894.dts +++ b/arch/arm/boot/dts/omap4-droid4-xt894.dts @@ -16,8 +16,13 @@ debounce-interval = <10>; }; + /* + * We use pad 0x4a100116 abe_dmic_din3.gpio_122 as the irq instead + * of the gpio interrupt to avoid lost events in deeper idle states. + */ slider { label = "Keypad Slide"; + interrupts-extended = <&omap4_pmx_core 0xd6>; gpios = <&gpio4 26 GPIO_ACTIVE_HIGH>; /* gpio122 */ linux,input-type = ; linux,code = ; From 3c4f6ecd93442f4376a58b38bb40ee0b8c46e0e6 Mon Sep 17 00:00:00 2001 From: Pho Tran Date: Mon, 25 Jan 2021 09:26:54 +0000 Subject: [PATCH 045/284] USB: serial: cp210x: add pid/vid for WSDA-200-USB Information pid/vid of WSDA-200-USB, Lord corporation company: vid: 199b pid: ba30 Signed-off-by: Pho Tran [ johan: amend comment with product name ] Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold --- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index 06f3cfc9f19a..7bec1e730b20 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -202,6 +202,7 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x1901, 0x0194) }, /* GE Healthcare Remote Alarm Box */ { USB_DEVICE(0x1901, 0x0195) }, /* GE B850/B650/B450 CP2104 DP UART interface */ { USB_DEVICE(0x1901, 0x0196) }, /* GE B850 CP2105 DP UART interface */ + { USB_DEVICE(0x199B, 0xBA30) }, /* LORD WSDA-200-USB */ { USB_DEVICE(0x19CF, 0x3000) }, /* Parrot NMEA GPS Flight Recorder */ { USB_DEVICE(0x1ADB, 0x0001) }, /* Schweitzer Engineering C662 Cable */ { USB_DEVICE(0x1B1C, 0x1C00) }, /* Corsair USB Dongle */ From e500b805c39daff2670494fff94909d7e3d094d9 Mon Sep 17 00:00:00 2001 From: Andrew Scull Date: Mon, 25 Jan 2021 14:54:15 +0000 Subject: [PATCH 046/284] KVM: arm64: Don't clobber x4 in __do_hyp_init arm_smccc_1_1_hvc() only adds write contraints for x0-3 in the inline assembly for the HVC instruction so make sure those are the only registers that change when __do_hyp_init is called. Tested-by: David Brazdil Signed-off-by: Andrew Scull Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20210125145415.122439-3-ascull@google.com --- arch/arm64/kvm/hyp/nvhe/hyp-init.S | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S index 31b060a44045..b17bf19217f1 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S @@ -47,6 +47,8 @@ __invalid: b . /* + * Only uses x0..x3 so as to not clobber callee-saved SMCCC registers. + * * x0: SMCCC function ID * x1: struct kvm_nvhe_init_params PA */ @@ -70,9 +72,9 @@ __do_hyp_init: eret 1: mov x0, x1 - mov x4, lr - bl ___kvm_hyp_init - mov lr, x4 + mov x3, lr + bl ___kvm_hyp_init // Clobbers x0..x2 + mov lr, x3 /* Hello, World! */ mov x0, #SMCCC_RET_SUCCESS @@ -82,8 +84,8 @@ SYM_CODE_END(__kvm_hyp_init) /* * Initialize the hypervisor in EL2. * - * Only uses x0..x3 so as to not clobber callee-saved SMCCC registers - * and leave x4 for the caller. + * Only uses x0..x2 so as to not clobber callee-saved SMCCC registers + * and leave x3 for the caller. * * x0: struct kvm_nvhe_init_params PA */ @@ -112,9 +114,9 @@ alternative_else_nop_endif /* * Set the PS bits in TCR_EL2. */ - ldr x1, [x0, #NVHE_INIT_TCR_EL2] - tcr_compute_pa_size x1, #TCR_EL2_PS_SHIFT, x2, x3 - msr tcr_el2, x1 + ldr x0, [x0, #NVHE_INIT_TCR_EL2] + tcr_compute_pa_size x0, #TCR_EL2_PS_SHIFT, x1, x2 + msr tcr_el2, x0 isb @@ -193,7 +195,7 @@ SYM_CODE_START_LOCAL(__kvm_hyp_init_cpu) /* Enable MMU, set vectors and stack. */ mov x0, x28 - bl ___kvm_hyp_init // Clobbers x0..x3 + bl ___kvm_hyp_init // Clobbers x0..x2 /* Leave idmap. */ mov x0, x29 From 67fbe02a5cebc3c653610f12e3c0424e58450153 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 20 Jan 2021 13:49:41 +0100 Subject: [PATCH 047/284] platform/x86: hp-wmi: Disable tablet-mode reporting by default MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Recently userspace has started making more use of SW_TABLET_MODE (when an input-dev reports this). Specifically recent GNOME3 versions will: 1. When SW_TABLET_MODE is reported and is reporting 0: 1.1 Disable accelerometer-based screen auto-rotation 1.2 Disable automatically showing the on-screen keyboard when a text-input field is focussed 2. When SW_TABLET_MODE is reported and is reporting 1: 2.1 Ignore input-events from the builtin keyboard and touchpad (this is for 360° hinges style 2-in-1s where the keyboard and touchpads are accessible on the back of the tablet when folded into tablet-mode) This means that claiming to support SW_TABLET_MODE when it does not actually work / reports correct values has bad side-effects. The check in the hp-wmi code which is used to decide if the input-dev should claim SW_TABLET_MODE support, only checks if the HPWMI_HARDWARE_QUERY is supported. It does *not* check if the hardware actually is capable of reporting SW_TABLET_MODE. This leads to the hp-wmi input-dev claiming SW_TABLET_MODE support, while in reality it will always report 0 as SW_TABLET_MODE value. This has been seen on a "HP ENVY x360 Convertible 15-cp0xxx" and this likely is the case on a whole lot of other HP models. This problem causes both auto-rotation and on-screen keyboard support to not work on affected x360 models. There is no easy fix for this, but since userspace expects SW_TABLET_MODE reporting to be reliable when advertised it is better to not claim/report SW_TABLET_MODE support at all, then to claim to support it while it does not work. To avoid the mentioned problems, add a new enable_tablet_mode_sw module-parameter which defaults to false. Note I've made this an int using the standard -1=auto, 0=off, 1=on triplett, with the hope that in the future we can come up with a better way to detect SW_TABLET_MODE support. ATM the default auto option just does the same as off. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1918255 Cc: Stefan Brüns Signed-off-by: Hans de Goede Acked-by: Mark Gross Link: https://lore.kernel.org/r/20210120124941.73409-1-hdegoede@redhat.com --- drivers/platform/x86/hp-wmi.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c index 18bf8aeb5f87..e94e59283ecb 100644 --- a/drivers/platform/x86/hp-wmi.c +++ b/drivers/platform/x86/hp-wmi.c @@ -32,6 +32,10 @@ MODULE_LICENSE("GPL"); MODULE_ALIAS("wmi:95F24279-4D7B-4334-9387-ACCDC67EF61C"); MODULE_ALIAS("wmi:5FB7F034-2C63-45e9-BE91-3D44E2C707E4"); +static int enable_tablet_mode_sw = -1; +module_param(enable_tablet_mode_sw, int, 0444); +MODULE_PARM_DESC(enable_tablet_mode_sw, "Enable SW_TABLET_MODE reporting (-1=auto, 0=no, 1=yes)"); + #define HPWMI_EVENT_GUID "95F24279-4D7B-4334-9387-ACCDC67EF61C" #define HPWMI_BIOS_GUID "5FB7F034-2C63-45e9-BE91-3D44E2C707E4" @@ -654,10 +658,12 @@ static int __init hp_wmi_input_setup(void) } /* Tablet mode */ - val = hp_wmi_hw_state(HPWMI_TABLET_MASK); - if (!(val < 0)) { - __set_bit(SW_TABLET_MODE, hp_wmi_input_dev->swbit); - input_report_switch(hp_wmi_input_dev, SW_TABLET_MODE, val); + if (enable_tablet_mode_sw > 0) { + val = hp_wmi_hw_state(HPWMI_TABLET_MASK); + if (val >= 0) { + __set_bit(SW_TABLET_MODE, hp_wmi_input_dev->swbit); + input_report_switch(hp_wmi_input_dev, SW_TABLET_MODE, val); + } } err = sparse_keymap_setup(hp_wmi_input_dev, hp_wmi_keymap, NULL); From 9b6164342e981d751e69f5a165dd596ffcdfd6fe Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sat, 23 Jan 2021 22:33:33 +0900 Subject: [PATCH 048/284] doc: gcc-plugins: update gcc-plugins.rst This document was written a long time ago. Update it. [1] Drop the version information The range of the supported GCC versions are always changing. The current minimal GCC version is 4.9, and commit 1e860048c53e ("gcc-plugins: simplify GCC plugin-dev capability test") removed the old code accordingly. We do not need to mention specific version ranges like "all gcc versions from 4.5 to 6.0" since we forget to update the documentation when we raise the minimal compiler version. [2] Drop the C compiler statements Since commit 77342a02ff6e ("gcc-plugins: drop support for GCC <= 4.7") the GCC plugin infrastructure only supports g++. [3] Drop supported architectures As of v5.11-rc4, the infrastructure supports more architectures; arm, arm64, mips, powerpc, riscv, s390, um, and x86. (just grep "select HAVE_GCC_PLUGINS") Again, we miss to update this document when a new architecture is supported. Let's just say "only some architectures". [4] Update the apt-get example We are now discussing to bump the minimal version to GCC 5. The GCC 4.9 support will be removed sooner or later. Change the package example to gcc-10-plugin-dev while we are here. [5] Update the build target Since commit ce2fd53a10c7 ("kbuild: descend into scripts/gcc-plugins/ via scripts/Makefile"), "make gcc-plugins" is not supported. "make scripts" builds all the enabled plugins, including some other tools. [6] Update the steps for adding a new plugin At first, all CONFIG options for GCC plugins were located in arch/Kconfig. After commit 45332b1bdfdc ("gcc-plugins: split out Kconfig entries to scripts/gcc-plugins/Kconfig"), scripts/gcc-plugins/Kconfig became the central place to collect plugin CONFIG options. In my understanding, this requirement no longer exists because commit 9f671e58159a ("security: Create "kernel hardening" config area") moved some of plugin CONFIG options to another file. Find an appropriate place to add the new CONFIG. The sub-directory support was never used by anyone, and removed by commit c17d6179ad5a ("gcc-plugins: remove unused GCC_PLUGIN_SUBDIR"). Remove the useless $(src)/ prefix. Signed-off-by: Masahiro Yamada --- Documentation/kbuild/gcc-plugins.rst | 41 ++++++++++++++-------------- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/Documentation/kbuild/gcc-plugins.rst b/Documentation/kbuild/gcc-plugins.rst index 63379d0150e3..3349966f213d 100644 --- a/Documentation/kbuild/gcc-plugins.rst +++ b/Documentation/kbuild/gcc-plugins.rst @@ -11,16 +11,13 @@ compiler [1]_. They are useful for runtime instrumentation and static analysis. We can analyse, change and add further code during compilation via callbacks [2]_, GIMPLE [3]_, IPA [4]_ and RTL passes [5]_. -The GCC plugin infrastructure of the kernel supports all gcc versions from -4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a -separate directory. -Plugin source files have to be compilable by both a C and a C++ compiler as well -because gcc versions 4.5 and 4.6 are compiled by a C compiler, -gcc-4.7 can be compiled by a C or a C++ compiler, -and versions 4.8+ can only be compiled by a C++ compiler. +The GCC plugin infrastructure of the kernel supports building out-of-tree +modules, cross-compilation and building in a separate directory. +Plugin source files have to be compilable by a C++ compiler. -Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and -powerpc architectures. +Currently the GCC plugin infrastructure supports only some architectures. +Grep "select HAVE_GCC_PLUGINS" to find out which architectures support +GCC plugins. This infrastructure was ported from grsecurity [6]_ and PaX [7]_. @@ -53,8 +50,7 @@ $(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h, $(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h** These headers automatically generate the registration structures for - GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions - from 4.5 to 6.0. + GIMPLE, SIMPLE_IPA, IPA and RTL passes. They should be preferred to creating the structures by hand. @@ -62,21 +58,25 @@ Usage ===== You must install the gcc plugin headers for your gcc version, -e.g., on Ubuntu for gcc-4.9:: +e.g., on Ubuntu for gcc-10:: - apt-get install gcc-4.9-plugin-dev + apt-get install gcc-10-plugin-dev Or on Fedora:: dnf install gcc-plugin-devel -Enable a GCC plugin based feature in the kernel config:: +Enable the GCC plugin infrastructure and some plugin(s) you want to use +in the kernel config:: - CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y + CONFIG_GCC_PLUGINS=y + CONFIG_GCC_PLUGIN_CYC_COMPLEXITY=y + CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y + ... -To compile only the plugin(s):: +To compile the minimum tool set including the plugin(s):: - make gcc-plugins + make scripts or just run the kernel make and compile the whole kernel with the cyclomatic complexity GCC plugin. @@ -85,7 +85,8 @@ the cyclomatic complexity GCC plugin. 4. How to add a new GCC plugin ============================== -The GCC plugins are in $(src)/scripts/gcc-plugins/. You can use a file or a directory -here. It must be added to $(src)/scripts/gcc-plugins/Makefile, -$(src)/scripts/Makefile.gcc-plugins and $(src)/arch/Kconfig. +The GCC plugins are in scripts/gcc-plugins/. You need to put plugin source files +right under scripts/gcc-plugins/. Creating subdirectories is not supported. +It must be added to scripts/gcc-plugins/Makefile, scripts/Makefile.gcc-plugins +and a relevant Kconfig file. See the cyc_complexity_plugin.c (CONFIG_GCC_PLUGIN_CYC_COMPLEXITY) GCC plugin. From 74532de460ec664e5a725507d1b59aa9e4d40776 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Wed, 20 Jan 2021 23:41:39 +0000 Subject: [PATCH 049/284] arm64: dts: rockchip: Disable display for NanoPi R2S NanoPi R2S is headless, so rightly does not enable any of the display interface hardware, which currently provokes an obnoxious error in the boot log from the fake DRM device failing to find anything to bind to. It probably isn't *too* hard to obviate the fake device shenanigans entirely with a bit of driver reshuffling, but for now let's just disable it here to shut up the spurious error. Signed-off-by: Robin Murphy Link: https://lore.kernel.org/r/c4553dfad1ad6792c4f22454c135ff55de77e2d6.1611186099.git.robin.murphy@arm.com Signed-off-by: Heiko Stuebner --- arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts b/arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts index 2ee07d15a6e3..1eecad724f04 100644 --- a/arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts +++ b/arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts @@ -114,6 +114,10 @@ cpu-supply = <&vdd_arm>; }; +&display_subsystem { + status = "disabled"; +}; + &gmac2io { assigned-clocks = <&cru SCLK_MAC2IO>, <&cru SCLK_MAC2IO_EXT>; assigned-clock-parents = <&gmac_clk>, <&gmac_clk>; From d8c6edfa3f4ee0d45d7ce5ef18d1245b78774b9d Mon Sep 17 00:00:00 2001 From: Jeremy Figgins Date: Sat, 23 Jan 2021 18:21:36 -0600 Subject: [PATCH 050/284] USB: usblp: don't call usb_set_interface if there's a single alt Some devices, such as the Winbond Electronics Corp. Virtual Com Port (Vendor=0416, ProdId=5011), lockup when usb_set_interface() or usb_clear_halt() are called. This device has only a single altsetting, so it should not be necessary to call usb_set_interface(). Acked-by: Pete Zaitcev Signed-off-by: Jeremy Figgins Link: https://lore.kernel.org/r/YAy9kJhM/rG8EQXC@watson Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/usblp.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/drivers/usb/class/usblp.c b/drivers/usb/class/usblp.c index 134dc2005ce9..c9f6e9758288 100644 --- a/drivers/usb/class/usblp.c +++ b/drivers/usb/class/usblp.c @@ -1329,14 +1329,17 @@ static int usblp_set_protocol(struct usblp *usblp, int protocol) if (protocol < USBLP_FIRST_PROTOCOL || protocol > USBLP_LAST_PROTOCOL) return -EINVAL; - alts = usblp->protocol[protocol].alt_setting; - if (alts < 0) - return -EINVAL; - r = usb_set_interface(usblp->dev, usblp->ifnum, alts); - if (r < 0) { - printk(KERN_ERR "usblp: can't set desired altsetting %d on interface %d\n", - alts, usblp->ifnum); - return r; + /* Don't unnecessarily set the interface if there's a single alt. */ + if (usblp->intf->num_altsetting > 1) { + alts = usblp->protocol[protocol].alt_setting; + if (alts < 0) + return -EINVAL; + r = usb_set_interface(usblp->dev, usblp->ifnum, alts); + if (r < 0) { + printk(KERN_ERR "usblp: can't set desired altsetting %d on interface %d\n", + alts, usblp->ifnum); + return r; + } } usblp->bidir = (usblp->protocol[protocol].epread != NULL); From a55a9a4c5c6253f6e4dea268af728664ac997790 Mon Sep 17 00:00:00 2001 From: kernel test robot Date: Thu, 21 Jan 2021 19:12:54 +0100 Subject: [PATCH 051/284] usb: gadget: aspeed: add missing of_node_put Breaking out of for_each_child_of_node requires a put on the child value. Generated by: scripts/coccinelle/iterators/for_each_child.cocci Fixes: 82c2d81361ec ("coccinelle: iterators: Add for_each_child.cocci script") CC: Sumera Priyadarsini Reported-by: kernel test robot Signed-off-by: kernel test robot Signed-off-by: Julia Lawall Cc: stable Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2101211907060.14700@hadrien Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/udc/aspeed-vhub/hub.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/gadget/udc/aspeed-vhub/hub.c b/drivers/usb/gadget/udc/aspeed-vhub/hub.c index 6497185ec4e7..bfd8e77788e2 100644 --- a/drivers/usb/gadget/udc/aspeed-vhub/hub.c +++ b/drivers/usb/gadget/udc/aspeed-vhub/hub.c @@ -999,8 +999,10 @@ static int ast_vhub_of_parse_str_desc(struct ast_vhub *vhub, str_array[offset].s = NULL; ret = ast_vhub_str_alloc_add(vhub, &lang_str); - if (ret) + if (ret) { + of_node_put(child); break; + } } return ret; From 1d69f9d901ef14d81c3b004e3282b8cc7b456280 Mon Sep 17 00:00:00 2001 From: Ikjoon Jang Date: Wed, 13 Jan 2021 18:05:11 +0800 Subject: [PATCH 052/284] usb: xhci-mtk: fix unreleased bandwidth data xhci-mtk needs XHCI_MTK_HOST quirk functions in add_endpoint() and drop_endpoint() to handle its own sw bandwidth management. It stores bandwidth data into an internal table every time add_endpoint() is called, and drops those in drop_endpoint(). But when bandwidth allocation fails at one endpoint, all earlier allocation from the same interface could still remain at the table. This patch moves bandwidth management codes to check_bandwidth() and reset_bandwidth() path. To do so, this patch also adds those functions to xhci_driver_overrides and lets mtk-xhci to release all failed endpoints in reset_bandwidth() path. Fixes: 08e469de87a2 ("usb: xhci-mtk: supports bandwidth scheduling with multi-TT") Signed-off-by: Ikjoon Jang Link: https://lore.kernel.org/r/20210113180444.v6.1.Id0d31b5f3ddf5e734d2ab11161ac5821921b1e1e@changeid Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-mtk-sch.c | 125 ++++++++++++++++++++++---------- drivers/usb/host/xhci-mtk.c | 2 + drivers/usb/host/xhci-mtk.h | 13 ++++ drivers/usb/host/xhci.c | 8 +- drivers/usb/host/xhci.h | 4 + 5 files changed, 112 insertions(+), 40 deletions(-) diff --git a/drivers/usb/host/xhci-mtk-sch.c b/drivers/usb/host/xhci-mtk-sch.c index 45c54d56ecbd..a313e75ff1c6 100644 --- a/drivers/usb/host/xhci-mtk-sch.c +++ b/drivers/usb/host/xhci-mtk-sch.c @@ -200,6 +200,7 @@ static struct mu3h_sch_ep_info *create_sch_ep(struct usb_device *udev, sch_ep->sch_tt = tt; sch_ep->ep = ep; + INIT_LIST_HEAD(&sch_ep->tt_endpoint); return sch_ep; } @@ -583,6 +584,8 @@ int xhci_mtk_sch_init(struct xhci_hcd_mtk *mtk) mtk->sch_array = sch_array; + INIT_LIST_HEAD(&mtk->bw_ep_list_new); + return 0; } EXPORT_SYMBOL_GPL(xhci_mtk_sch_init); @@ -601,19 +604,14 @@ int xhci_mtk_add_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, struct xhci_ep_ctx *ep_ctx; struct xhci_slot_ctx *slot_ctx; struct xhci_virt_device *virt_dev; - struct mu3h_sch_bw_info *sch_bw; struct mu3h_sch_ep_info *sch_ep; - struct mu3h_sch_bw_info *sch_array; unsigned int ep_index; - int bw_index; - int ret = 0; xhci = hcd_to_xhci(hcd); virt_dev = xhci->devs[udev->slot_id]; ep_index = xhci_get_endpoint_index(&ep->desc); slot_ctx = xhci_get_slot_ctx(xhci, virt_dev->in_ctx); ep_ctx = xhci_get_ep_ctx(xhci, virt_dev->in_ctx, ep_index); - sch_array = mtk->sch_array; xhci_dbg(xhci, "%s() type:%d, speed:%d, mpkt:%d, dir:%d, ep:%p\n", __func__, usb_endpoint_type(&ep->desc), udev->speed, @@ -632,40 +630,35 @@ int xhci_mtk_add_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, return 0; } - bw_index = get_bw_index(xhci, udev, ep); - sch_bw = &sch_array[bw_index]; - sch_ep = create_sch_ep(udev, ep, ep_ctx); if (IS_ERR_OR_NULL(sch_ep)) return -ENOMEM; setup_sch_info(udev, ep_ctx, sch_ep); - ret = check_sch_bw(udev, sch_bw, sch_ep); - if (ret) { - xhci_err(xhci, "Not enough bandwidth!\n"); - if (is_fs_or_ls(udev->speed)) - drop_tt(udev); - - kfree(sch_ep); - return -ENOSPC; - } - - list_add_tail(&sch_ep->endpoint, &sch_bw->bw_ep_list); - - ep_ctx->reserved[0] |= cpu_to_le32(EP_BPKTS(sch_ep->pkts) - | EP_BCSCOUNT(sch_ep->cs_count) | EP_BBM(sch_ep->burst_mode)); - ep_ctx->reserved[1] |= cpu_to_le32(EP_BOFFSET(sch_ep->offset) - | EP_BREPEAT(sch_ep->repeat)); - - xhci_dbg(xhci, " PKTS:%x, CSCOUNT:%x, BM:%x, OFFSET:%x, REPEAT:%x\n", - sch_ep->pkts, sch_ep->cs_count, sch_ep->burst_mode, - sch_ep->offset, sch_ep->repeat); + list_add_tail(&sch_ep->endpoint, &mtk->bw_ep_list_new); return 0; } EXPORT_SYMBOL_GPL(xhci_mtk_add_ep_quirk); +static void xhci_mtk_drop_ep(struct xhci_hcd_mtk *mtk, struct usb_device *udev, + struct mu3h_sch_ep_info *sch_ep) +{ + struct xhci_hcd *xhci = hcd_to_xhci(mtk->hcd); + int bw_index = get_bw_index(xhci, udev, sch_ep->ep); + struct mu3h_sch_bw_info *sch_bw = &mtk->sch_array[bw_index]; + + update_bus_bw(sch_bw, sch_ep, 0); + list_del(&sch_ep->endpoint); + + if (sch_ep->sch_tt) { + list_del(&sch_ep->tt_endpoint); + drop_tt(udev); + } + kfree(sch_ep); +} + void xhci_mtk_drop_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, struct usb_host_endpoint *ep) { @@ -675,7 +668,7 @@ void xhci_mtk_drop_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, struct xhci_virt_device *virt_dev; struct mu3h_sch_bw_info *sch_array; struct mu3h_sch_bw_info *sch_bw; - struct mu3h_sch_ep_info *sch_ep; + struct mu3h_sch_ep_info *sch_ep, *tmp; int bw_index; xhci = hcd_to_xhci(hcd); @@ -694,17 +687,73 @@ void xhci_mtk_drop_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, bw_index = get_bw_index(xhci, udev, ep); sch_bw = &sch_array[bw_index]; - list_for_each_entry(sch_ep, &sch_bw->bw_ep_list, endpoint) { + list_for_each_entry_safe(sch_ep, tmp, &sch_bw->bw_ep_list, endpoint) { if (sch_ep->ep == ep) { - update_bus_bw(sch_bw, sch_ep, 0); - list_del(&sch_ep->endpoint); - if (is_fs_or_ls(udev->speed)) { - list_del(&sch_ep->tt_endpoint); - drop_tt(udev); - } - kfree(sch_ep); - break; + xhci_mtk_drop_ep(mtk, udev, sch_ep); } } } EXPORT_SYMBOL_GPL(xhci_mtk_drop_ep_quirk); + +int xhci_mtk_check_bandwidth(struct usb_hcd *hcd, struct usb_device *udev) +{ + struct xhci_hcd_mtk *mtk = hcd_to_mtk(hcd); + struct xhci_hcd *xhci = hcd_to_xhci(hcd); + struct xhci_virt_device *virt_dev = xhci->devs[udev->slot_id]; + struct mu3h_sch_bw_info *sch_bw; + struct mu3h_sch_ep_info *sch_ep, *tmp; + int bw_index, ret; + + dev_dbg(&udev->dev, "%s\n", __func__); + + list_for_each_entry(sch_ep, &mtk->bw_ep_list_new, endpoint) { + bw_index = get_bw_index(xhci, udev, sch_ep->ep); + sch_bw = &mtk->sch_array[bw_index]; + + ret = check_sch_bw(udev, sch_bw, sch_ep); + if (ret) { + xhci_err(xhci, "Not enough bandwidth!\n"); + return -ENOSPC; + } + } + + list_for_each_entry_safe(sch_ep, tmp, &mtk->bw_ep_list_new, endpoint) { + struct xhci_ep_ctx *ep_ctx; + struct usb_host_endpoint *ep = sch_ep->ep; + unsigned int ep_index = xhci_get_endpoint_index(&ep->desc); + + bw_index = get_bw_index(xhci, udev, ep); + sch_bw = &mtk->sch_array[bw_index]; + + list_move_tail(&sch_ep->endpoint, &sch_bw->bw_ep_list); + + ep_ctx = xhci_get_ep_ctx(xhci, virt_dev->in_ctx, ep_index); + ep_ctx->reserved[0] |= cpu_to_le32(EP_BPKTS(sch_ep->pkts) + | EP_BCSCOUNT(sch_ep->cs_count) + | EP_BBM(sch_ep->burst_mode)); + ep_ctx->reserved[1] |= cpu_to_le32(EP_BOFFSET(sch_ep->offset) + | EP_BREPEAT(sch_ep->repeat)); + + xhci_dbg(xhci, " PKTS:%x, CSCOUNT:%x, BM:%x, OFFSET:%x, REPEAT:%x\n", + sch_ep->pkts, sch_ep->cs_count, sch_ep->burst_mode, + sch_ep->offset, sch_ep->repeat); + } + + return xhci_check_bandwidth(hcd, udev); +} +EXPORT_SYMBOL_GPL(xhci_mtk_check_bandwidth); + +void xhci_mtk_reset_bandwidth(struct usb_hcd *hcd, struct usb_device *udev) +{ + struct xhci_hcd_mtk *mtk = hcd_to_mtk(hcd); + struct mu3h_sch_ep_info *sch_ep, *tmp; + + dev_dbg(&udev->dev, "%s\n", __func__); + + list_for_each_entry_safe(sch_ep, tmp, &mtk->bw_ep_list_new, endpoint) { + xhci_mtk_drop_ep(mtk, udev, sch_ep); + } + + xhci_reset_bandwidth(hcd, udev); +} +EXPORT_SYMBOL_GPL(xhci_mtk_reset_bandwidth); diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c index 8f321f39ab96..fe010cc61f19 100644 --- a/drivers/usb/host/xhci-mtk.c +++ b/drivers/usb/host/xhci-mtk.c @@ -347,6 +347,8 @@ static void usb_wakeup_set(struct xhci_hcd_mtk *mtk, bool enable) static int xhci_mtk_setup(struct usb_hcd *hcd); static const struct xhci_driver_overrides xhci_mtk_overrides __initconst = { .reset = xhci_mtk_setup, + .check_bandwidth = xhci_mtk_check_bandwidth, + .reset_bandwidth = xhci_mtk_reset_bandwidth, }; static struct hc_driver __read_mostly xhci_mtk_hc_driver; diff --git a/drivers/usb/host/xhci-mtk.h b/drivers/usb/host/xhci-mtk.h index a93cfe817904..577f431c5c93 100644 --- a/drivers/usb/host/xhci-mtk.h +++ b/drivers/usb/host/xhci-mtk.h @@ -130,6 +130,7 @@ struct mu3c_ippc_regs { struct xhci_hcd_mtk { struct device *dev; struct usb_hcd *hcd; + struct list_head bw_ep_list_new; struct mu3h_sch_bw_info *sch_array; struct mu3c_ippc_regs __iomem *ippc_regs; bool has_ippc; @@ -166,6 +167,8 @@ int xhci_mtk_add_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, struct usb_host_endpoint *ep); void xhci_mtk_drop_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, struct usb_host_endpoint *ep); +int xhci_mtk_check_bandwidth(struct usb_hcd *hcd, struct usb_device *udev); +void xhci_mtk_reset_bandwidth(struct usb_hcd *hcd, struct usb_device *udev); #else static inline int xhci_mtk_add_ep_quirk(struct usb_hcd *hcd, @@ -179,6 +182,16 @@ static inline void xhci_mtk_drop_ep_quirk(struct usb_hcd *hcd, { } +static inline int xhci_mtk_check_bandwidth(struct usb_hcd *hcd, + struct usb_device *udev) +{ + return 0; +} + +static inline void xhci_mtk_reset_bandwidth(struct usb_hcd *hcd, + struct usb_device *udev) +{ +} #endif #endif /* _XHCI_MTK_H_ */ diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index e86940571b4c..345a221028c6 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -2985,7 +2985,7 @@ static void xhci_check_bw_drop_ep_streams(struct xhci_hcd *xhci, * else should be touching the xhci->devs[slot_id] structure, so we * don't need to take the xhci->lock for manipulating that. */ -static int xhci_check_bandwidth(struct usb_hcd *hcd, struct usb_device *udev) +int xhci_check_bandwidth(struct usb_hcd *hcd, struct usb_device *udev) { int i; int ret = 0; @@ -3083,7 +3083,7 @@ command_cleanup: return ret; } -static void xhci_reset_bandwidth(struct usb_hcd *hcd, struct usb_device *udev) +void xhci_reset_bandwidth(struct usb_hcd *hcd, struct usb_device *udev) { struct xhci_hcd *xhci; struct xhci_virt_device *virt_dev; @@ -5510,6 +5510,10 @@ void xhci_init_driver(struct hc_driver *drv, drv->reset = over->reset; if (over->start) drv->start = over->start; + if (over->check_bandwidth) + drv->check_bandwidth = over->check_bandwidth; + if (over->reset_bandwidth) + drv->reset_bandwidth = over->reset_bandwidth; } } EXPORT_SYMBOL_GPL(xhci_init_driver); diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 25e57bc9c3cc..07ff95016f11 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1920,6 +1920,8 @@ struct xhci_driver_overrides { size_t extra_priv_size; int (*reset)(struct usb_hcd *hcd); int (*start)(struct usb_hcd *hcd); + int (*check_bandwidth)(struct usb_hcd *, struct usb_device *); + void (*reset_bandwidth)(struct usb_hcd *, struct usb_device *); }; #define XHCI_CFC_DELAY 10 @@ -2074,6 +2076,8 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks); void xhci_shutdown(struct usb_hcd *hcd); void xhci_init_driver(struct hc_driver *drv, const struct xhci_driver_overrides *over); +int xhci_check_bandwidth(struct usb_hcd *hcd, struct usb_device *udev); +void xhci_reset_bandwidth(struct usb_hcd *hcd, struct usb_device *udev); int xhci_disable_slot(struct xhci_hcd *xhci, u32 slot_id); int xhci_ext_cap_init(struct xhci_hcd *xhci); From 19f6fe976a61f9afc289b062b7ef67f99b72e7b9 Mon Sep 17 00:00:00 2001 From: Neil Armstrong Date: Tue, 26 Jan 2021 09:09:51 +0100 Subject: [PATCH 053/284] Revert "arm64: dts: amlogic: add missing ethernet reset ID" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit It has been reported on IRC and in KernelCI boot tests, this change breaks internal PHY support on the Amlogic G12A/SM1 Based boards. We suspect the added signal to reset more than the Ethernet MAC but also the MDIO/(RG)MII mux used to redirect the MAC signals to the internal PHY. This reverts commit f3362f0c18174a1f334a419ab7d567a36bd1b3f3 while we find and acceptable solution to cleanly reset the Ethernet MAC. Reported-by: Corentin Labbe Acked-by: Jérôme Brunet Signed-off-by: Neil Armstrong Signed-off-by: Kevin Hilman Link: https://lore.kernel.org/r/20210126080951.2383740-1-narmstrong@baylibre.com --- arch/arm64/boot/dts/amlogic/meson-axg.dtsi | 2 -- arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi | 2 -- arch/arm64/boot/dts/amlogic/meson-gx.dtsi | 3 --- 3 files changed, 7 deletions(-) diff --git a/arch/arm64/boot/dts/amlogic/meson-axg.dtsi b/arch/arm64/boot/dts/amlogic/meson-axg.dtsi index ba1c6dfdc4b6..d945c84ab697 100644 --- a/arch/arm64/boot/dts/amlogic/meson-axg.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-axg.dtsi @@ -280,8 +280,6 @@ "timing-adjustment"; rx-fifo-depth = <4096>; tx-fifo-depth = <2048>; - resets = <&reset RESET_ETHERNET>; - reset-names = "stmmaceth"; power-domains = <&pwrc PWRC_AXG_ETHERNET_MEM_ID>; status = "disabled"; }; diff --git a/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi index 221fcca3b0b9..b858c5e43cc8 100644 --- a/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi @@ -224,8 +224,6 @@ "timing-adjustment"; rx-fifo-depth = <4096>; tx-fifo-depth = <2048>; - resets = <&reset RESET_ETHERNET>; - reset-names = "stmmaceth"; status = "disabled"; mdio0: mdio { diff --git a/arch/arm64/boot/dts/amlogic/meson-gx.dtsi b/arch/arm64/boot/dts/amlogic/meson-gx.dtsi index 726b91d3a905..0edd137151f8 100644 --- a/arch/arm64/boot/dts/amlogic/meson-gx.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-gx.dtsi @@ -13,7 +13,6 @@ #include #include #include -#include #include / { @@ -576,8 +575,6 @@ interrupt-names = "macirq"; rx-fifo-depth = <4096>; tx-fifo-depth = <2048>; - resets = <&reset RESET_ETHERNET>; - reset-names = "stmmaceth"; power-domains = <&pwrc PWRC_GXBB_ETHERNET_MEM_ID>; status = "disabled"; }; From fed1b6a00a191cad4dd843519b590e3d6ad9f843 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 24 Jan 2021 08:09:23 +0100 Subject: [PATCH 054/284] dmaengine: ti: k3-udma: Fix a resource leak in an error handling path In 'dma_pool_create()', we return -ENOMEM, but don't release the resources already allocated, as in all the other error handling paths. Go to 'err_res_free' instead of returning directly. Fixes: 017794739702 ("dmaengine: ti: k3-udma: Initial support for K3 BCDMA") Signed-off-by: Christophe JAILLET Acked-by: Peter Ujfalusi Link: https://lore.kernel.org/r/20210124070923.724479-1-christophe.jaillet@wanadoo.fr Signed-off-by: Vinod Koul --- drivers/dma/ti/k3-udma.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c index 298460438bb4..f474a1232335 100644 --- a/drivers/dma/ti/k3-udma.c +++ b/drivers/dma/ti/k3-udma.c @@ -2401,7 +2401,8 @@ static int bcdma_alloc_chan_resources(struct dma_chan *chan) dev_err(ud->ddev.dev, "Descriptor pool allocation failed\n"); uc->use_dma_pool = false; - return -ENOMEM; + ret = -ENOMEM; + goto err_res_free; } uc->use_dma_pool = true; From cf81c3abe1b84c4b82fbe771f72e6d181a3d1b7c Mon Sep 17 00:00:00 2001 From: "Enrico Weigelt, metux IT consult" Date: Thu, 14 Jan 2021 11:02:16 +0100 Subject: [PATCH 055/284] kconfig: mconf: fix HOSTCC call Commit c0f975af1745 ("kconfig: Support building mconf with vendor sysroot ncurses") introduces a bug when HOSTCC contains parameters: the whole command line is treated as the program name (with spaces in it). Therefore, we have to remove the quotes. Fixes: c0f975af1745 ("kconfig: Support building mconf with vendor sysroot ncurses") Signed-off-by: Enrico Weigelt, metux IT consult Signed-off-by: Masahiro Yamada --- scripts/kconfig/mconf-cfg.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/kconfig/mconf-cfg.sh b/scripts/kconfig/mconf-cfg.sh index fcd4acd4e9cb..b520e407a8eb 100755 --- a/scripts/kconfig/mconf-cfg.sh +++ b/scripts/kconfig/mconf-cfg.sh @@ -35,7 +35,7 @@ fi # As a final fallback before giving up, check if $HOSTCC knows of a default # ncurses installation (e.g. from a vendor-specific sysroot). -if echo '#include ' | "${HOSTCC}" -E - >/dev/null 2>&1; then +if echo '#include ' | ${HOSTCC} -E - >/dev/null 2>&1; then echo cflags=\"-D_GNU_SOURCE\" echo libs=\"-lncurses\" exit 0 From 94c41b3a7c370b0d6afc5ace8fafa0531865a940 Mon Sep 17 00:00:00 2001 From: Hajime Tazaki Date: Mon, 21 Dec 2020 11:24:34 +0900 Subject: [PATCH 056/284] um: ubd: fix command line handling of ubd This commit fixes a regression to handle command line parameters of ubd. With a simple line "./linux ubd0="./disk-ext4.img", it fails at ubd_setup_common(). The commit adds additional checks to the variables in order to properly parse the paremeters which previously worked. Fixes: ef3ba87cb7c9 ("um: ubd: Set device serial attribute from cmdline") Cc: Christopher Obbard Signed-off-by: Hajime Tazaki Acked-by: Christopher Obbard Signed-off-by: Richard Weinberger --- arch/um/drivers/ubd_kern.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index 13b1fe694b90..bd16b17ba4d6 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -375,11 +375,11 @@ break_loop: file = NULL; backing_file = strsep(&str, ",:"); - if (*backing_file == '\0') + if (backing_file && *backing_file == '\0') backing_file = NULL; serial = strsep(&str, ",:"); - if (*serial == '\0') + if (serial && *serial == '\0') serial = NULL; if (backing_file && ubd_dev->no_cow) { From 1cdcfb44370b28187a0c33cdbcb4705103ed81aa Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 17 Dec 2020 13:15:56 +0100 Subject: [PATCH 057/284] um: return error from ioremap() Back a few years ago, ioremap() was added to UML so that we'd not break the build for everything all the time. However, for some reason, v1 of the patch got applied, rather than the v2 that returned NULL, which was discussed here: https://lore.kernel.org/lkml/1495726955-27497-1-git-send-email-logang@deltatee.com/ Fix that now. Signed-off-by: Johannes Berg Acked-by: Arnd Bergmann Signed-off-by: Richard Weinberger --- arch/um/include/asm/io.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/um/include/asm/io.h b/arch/um/include/asm/io.h index 96f77b5232aa..cef03e3aa0f9 100644 --- a/arch/um/include/asm/io.h +++ b/arch/um/include/asm/io.h @@ -5,7 +5,7 @@ #define ioremap ioremap static inline void __iomem *ioremap(phys_addr_t offset, size_t size) { - return (void __iomem *)(unsigned long)offset; + return NULL; } #define iounmap iounmap From d7ffac33631b2f72ec4cbbf9a64be6aa011b5cfd Mon Sep 17 00:00:00 2001 From: Thomas Meyer Date: Tue, 5 Jan 2021 13:01:28 +0100 Subject: [PATCH 058/284] um: stdio_console: Make preferred console The addition of the "ttynull" console driver did break the ordering of the UML stdio console driver. The UML stdio console driver is added in late_initcall (7), whereby the ttynull driver is added in device_initcall (6), which always does make the ttynull driver the default console. Fix it by explicitly adding the UML stdio console as the preferred console, in case no 'console=' command line option was specified. Signed-off-by: Thomas Meyer Signed-off-by: Richard Weinberger --- arch/um/kernel/um_arch.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c index 31d356b1ffd8..80e2660782a0 100644 --- a/arch/um/kernel/um_arch.c +++ b/arch/um/kernel/um_arch.c @@ -26,7 +26,8 @@ #include #include -#define DEFAULT_COMMAND_LINE "root=98:0" +#define DEFAULT_COMMAND_LINE_ROOT "root=98:0" +#define DEFAULT_COMMAND_LINE_CONSOLE "console=tty" /* Changed in add_arg and setup_arch, which run before SMP is started */ static char __initdata command_line[COMMAND_LINE_SIZE] = { 0 }; @@ -109,7 +110,8 @@ unsigned long end_vm; int ncpus = 1; /* Set in early boot */ -static int have_root __initdata = 0; +static int have_root __initdata; +static int have_console __initdata; /* Set in uml_mem_setup and modified in linux_main */ long long physmem_size = 32 * 1024 * 1024; @@ -161,6 +163,17 @@ __uml_setup("debug", no_skas_debug_setup, " this flag is not needed to run gdb on UML in skas mode\n\n" ); +static int __init uml_console_setup(char *line, int *add) +{ + have_console = 1; + return 0; +} + +__uml_setup("console=", uml_console_setup, +"console=\n" +" Specify the preferred console output driver\n\n" +); + static int __init Usage(char *line, int *add) { const char **p; @@ -264,7 +277,10 @@ int __init linux_main(int argc, char **argv) add_arg(argv[i]); } if (have_root == 0) - add_arg(DEFAULT_COMMAND_LINE); + add_arg(DEFAULT_COMMAND_LINE_ROOT); + + if (have_console == 0) + add_arg(DEFAULT_COMMAND_LINE_CONSOLE); host_task_size = os_get_top_address(); /* From e23fe90dec286cd77e9059033aa640fc45603602 Mon Sep 17 00:00:00 2001 From: Thomas Meyer Date: Thu, 7 Jan 2021 09:05:31 +0100 Subject: [PATCH 059/284] um: kmsg_dumper: always dump when not tty console With the addition of the ttynull console driver, the chance that a console driver was already registerd did increase. Refine the logic when to dump the kernel message buffer: always dump the buffer, when the UML stdio console driver is not active and the preferred console. Signed-off-by: Thomas Meyer Signed-off-by: Richard Weinberger --- arch/um/kernel/kmsg_dump.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c index e4abac6c9727..6516ef1f8274 100644 --- a/arch/um/kernel/kmsg_dump.c +++ b/arch/um/kernel/kmsg_dump.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include #include +#include #include #include #include @@ -16,8 +17,12 @@ static void kmsg_dumper_stdout(struct kmsg_dumper *dumper, if (!console_trylock()) return; - for_each_console(con) - break; + for_each_console(con) { + if(strcmp(con->name, "tty") == 0 && + (con->flags & (CON_ENABLED | CON_CONSDEV)) != 0) { + break; + } + } console_unlock(); From f4172b084342fd3f9e38c10650ffe19eac30d8ce Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 7 Jan 2021 22:15:21 +0100 Subject: [PATCH 060/284] um: virtio: free vu_dev only with the contained struct device Since struct device is refcounted, we shouldn't free the vu_dev immediately when it's removed from the platform device, but only when the references actually all go away. Move the freeing to the release to accomplish that. Fixes: 5d38f324993f ("um: drivers: Add virtio vhost-user driver") Signed-off-by: Johannes Berg Signed-off-by: Richard Weinberger --- arch/um/drivers/virtio_uml.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c index 27e92d3881ff..5d957b7e7fd5 100644 --- a/arch/um/drivers/virtio_uml.c +++ b/arch/um/drivers/virtio_uml.c @@ -1084,6 +1084,7 @@ static void virtio_uml_release_dev(struct device *d) } os_close_file(vu_dev->sock); + kfree(vu_dev); } /* Platform device */ @@ -1097,7 +1098,7 @@ static int virtio_uml_probe(struct platform_device *pdev) if (!pdata) return -EINVAL; - vu_dev = devm_kzalloc(&pdev->dev, sizeof(*vu_dev), GFP_KERNEL); + vu_dev = kzalloc(sizeof(*vu_dev), GFP_KERNEL); if (!vu_dev) return -ENOMEM; From 2fcb4090cd7352665ecb756990a3087bfd86a295 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Sun, 10 Jan 2021 19:05:08 +0100 Subject: [PATCH 061/284] Revert "um: allocate a guard page to helper threads" This reverts commit ef4459a6da09 ("um: allocate a guard page to helper threads"), it's broken in multiple ways: 1) the free no longer matches the alloc; and 2) more importantly, the set_memory_ro() causes allocation of page tables for the normal memory that doesn't have any, and that later causes corruption and crashes (usually but not always in vfree()). We could fix the first bug and use vmalloc() to work around the second, but set_memory_ro() actually doesn't do anything either so I'll just revert that as well. Reported-by: Benjamin Berg Fixes: ef4459a6da09 ("um: allocate a guard page to helper threads") Signed-off-by: Johannes Berg Signed-off-by: Richard Weinberger --- arch/um/drivers/ubd_kern.c | 2 +- arch/um/include/shared/kern_util.h | 2 +- arch/um/kernel/process.c | 11 ++++------- arch/um/os-Linux/helper.c | 4 ++-- 4 files changed, 8 insertions(+), 11 deletions(-) diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index bd16b17ba4d6..8e0b43cf089f 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -1241,7 +1241,7 @@ static int __init ubd_driver_init(void){ /* Letting ubd=sync be like using ubd#s= instead of ubd#= is * enough. So use anyway the io thread. */ } - stack = alloc_stack(0); + stack = alloc_stack(0, 0); io_pid = start_io_thread(stack + PAGE_SIZE - sizeof(void *), &thread_fd); if(io_pid < 0){ diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h index d8c279e3312f..2888ec812f6e 100644 --- a/arch/um/include/shared/kern_util.h +++ b/arch/um/include/shared/kern_util.h @@ -19,7 +19,7 @@ extern int kmalloc_ok; #define UML_ROUND_UP(addr) \ ((((unsigned long) addr) + PAGE_SIZE - 1) & PAGE_MASK) -extern unsigned long alloc_stack(int atomic); +extern unsigned long alloc_stack(int order, int atomic); extern void free_stack(unsigned long stack, int order); struct pt_regs; diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c index 2a986ece5478..81d508daf67c 100644 --- a/arch/um/kernel/process.c +++ b/arch/um/kernel/process.c @@ -32,7 +32,6 @@ #include #include #include -#include /* * This is a per-cpu array. A processor only modifies its entry and it only @@ -63,18 +62,16 @@ void free_stack(unsigned long stack, int order) free_pages(stack, order); } -unsigned long alloc_stack(int atomic) +unsigned long alloc_stack(int order, int atomic) { - unsigned long addr; + unsigned long page; gfp_t flags = GFP_KERNEL; if (atomic) flags = GFP_ATOMIC; - addr = __get_free_pages(flags, 1); + page = __get_free_pages(flags, order); - set_memory_ro(addr, 1); - - return addr + PAGE_SIZE; + return page; } static inline void set_current(struct task_struct *task) diff --git a/arch/um/os-Linux/helper.c b/arch/um/os-Linux/helper.c index feb48d796e00..9fa6e4187d4f 100644 --- a/arch/um/os-Linux/helper.c +++ b/arch/um/os-Linux/helper.c @@ -45,7 +45,7 @@ int run_helper(void (*pre_exec)(void *), void *pre_data, char **argv) unsigned long stack, sp; int pid, fds[2], ret, n; - stack = alloc_stack(__cant_sleep()); + stack = alloc_stack(0, __cant_sleep()); if (stack == 0) return -ENOMEM; @@ -116,7 +116,7 @@ int run_helper_thread(int (*proc)(void *), void *arg, unsigned int flags, unsigned long stack, sp; int pid, status, err; - stack = alloc_stack(__cant_sleep()); + stack = alloc_stack(0, __cant_sleep()); if (stack == 0) return -ENOMEM; From a31e9c4e7247d182192e9b85abbea498d63dd850 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Sun, 10 Jan 2021 19:05:09 +0100 Subject: [PATCH 062/284] Revert "um: support some of ARCH_HAS_SET_MEMORY" This reverts commit 963285b0b47a ("um: support some of ARCH_HAS_SET_MEMORY"), as it turns out that it's not only not working (due to um never using the protection bits in the page tables) but also corrupts the page tables if used on a non-vmalloc page, since um never allocates proper page tables for the 'physmem' in the first place. Fixing all this will take more effort, so for now revert it. Reported-by: Benjamin Berg Fixes: 963285b0b47a ("um: support some of ARCH_HAS_SET_MEMORY") Signed-off-by: Johannes Berg Signed-off-by: Richard Weinberger --- arch/um/Kconfig | 1 - arch/um/include/asm/pgtable.h | 3 -- arch/um/include/asm/set_memory.h | 1 - arch/um/kernel/tlb.c | 54 -------------------------------- 4 files changed, 59 deletions(-) delete mode 100644 arch/um/include/asm/set_memory.h diff --git a/arch/um/Kconfig b/arch/um/Kconfig index 34d302d1a07f..c3030db3325f 100644 --- a/arch/um/Kconfig +++ b/arch/um/Kconfig @@ -15,7 +15,6 @@ config UML select HAVE_DEBUG_KMEMLEAK select HAVE_DEBUG_BUGVERBOSE select NO_DMA - select ARCH_HAS_SET_MEMORY select GENERIC_IRQ_SHOW select GENERIC_CPU_DEVICES select HAVE_GCC_PLUGINS diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h index 39376bb63abf..def376194dce 100644 --- a/arch/um/include/asm/pgtable.h +++ b/arch/um/include/asm/pgtable.h @@ -55,15 +55,12 @@ extern unsigned long end_iomem; #define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY) #define __PAGE_KERNEL_EXEC \ (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED) -#define __PAGE_KERNEL_RO \ - (_PAGE_PRESENT | _PAGE_DIRTY | _PAGE_ACCESSED) #define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED) #define PAGE_SHARED __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED) #define PAGE_COPY __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED) #define PAGE_READONLY __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED) #define PAGE_KERNEL __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED) #define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC) -#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO) /* * The i386 can't do page protection for execute, and considers that the same diff --git a/arch/um/include/asm/set_memory.h b/arch/um/include/asm/set_memory.h deleted file mode 100644 index 24266c63720d..000000000000 --- a/arch/um/include/asm/set_memory.h +++ /dev/null @@ -1 +0,0 @@ -#include diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c index 437d1f1cc5ec..61776790cd67 100644 --- a/arch/um/kernel/tlb.c +++ b/arch/um/kernel/tlb.c @@ -608,57 +608,3 @@ void force_flush_all(void) vma = vma->vm_next; } } - -struct page_change_data { - unsigned int set_mask, clear_mask; -}; - -static int change_page_range(pte_t *ptep, unsigned long addr, void *data) -{ - struct page_change_data *cdata = data; - pte_t pte = READ_ONCE(*ptep); - - pte_clear_bits(pte, cdata->clear_mask); - pte_set_bits(pte, cdata->set_mask); - - set_pte(ptep, pte); - return 0; -} - -static int change_memory(unsigned long start, unsigned long pages, - unsigned int set_mask, unsigned int clear_mask) -{ - unsigned long size = pages * PAGE_SIZE; - struct page_change_data data; - int ret; - - data.set_mask = set_mask; - data.clear_mask = clear_mask; - - ret = apply_to_page_range(&init_mm, start, size, change_page_range, - &data); - - flush_tlb_kernel_range(start, start + size); - - return ret; -} - -int set_memory_ro(unsigned long addr, int numpages) -{ - return change_memory(addr, numpages, 0, _PAGE_RW); -} - -int set_memory_rw(unsigned long addr, int numpages) -{ - return change_memory(addr, numpages, _PAGE_RW, 0); -} - -int set_memory_nx(unsigned long addr, int numpages) -{ - return -EOPNOTSUPP; -} - -int set_memory_x(unsigned long addr, int numpages) -{ - return -EOPNOTSUPP; -} From 9868c2081d071f7c309796c8dffc94364fc07582 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 22 Jan 2021 21:40:23 +0100 Subject: [PATCH 063/284] um: fix os_idle_sleep() to not hang Changing os_idle_sleep() to use pause() (I accidentally described it as an empty select() in the commit log because I had changed it from that to pause() in a later revision) exposed a race condition in the idle code. The following can happen: timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=624017}}, NULL) = 0 ... pause() and we now hang forever. This was previously possible as well, but it could never cause UML to hang for more than a second since we could only sleep for that much, so at most you'd notice a "hiccup" in the UML. Obviously, any sort of external interrupt also "saves" it and interrupts pause(). Fix this by properly handling the race, rather than papering over it again: - first, block SIGALRM, and obtain the old signal set - check the timer - suspend, waiting for any signal out of the old set, if, and only if, the timer will fire in the future - restore the old signal mask This ensures race-free operation: as it's blocked, the signal won't be delivered while we're looking at the timer even if it were to be triggered right _after_ we've returned from timer_gettime() with a non-zero value (telling us the timer will trigger). Thus, despite getting to sigsuspend() because timer_gettime() told us we're still waiting, we'll not hang because sigsuspend() will return immediately due to the pending signal. Fixes: 49da38a3ef33 ("um: Simplify os_idle_sleep() and sleep longer") Signed-off-by: Johannes Berg Acked-By: Anton Ivanov Signed-off-by: Richard Weinberger --- arch/um/os-Linux/time.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/um/os-Linux/time.c b/arch/um/os-Linux/time.c index a61cbf73a179..6c5041c5560b 100644 --- a/arch/um/os-Linux/time.c +++ b/arch/um/os-Linux/time.c @@ -104,5 +104,18 @@ long long os_nsecs(void) */ void os_idle_sleep(void) { - pause(); + struct itimerspec its; + sigset_t set, old; + + /* block SIGALRM while we analyze the timer state */ + sigemptyset(&set); + sigaddset(&set, SIGALRM); + sigprocmask(SIG_BLOCK, &set, &old); + + /* check the timer, and if it'll fire then wait for it */ + timer_gettime(event_high_res_timer, &its); + if (its.it_value.tv_sec || its.it_value.tv_nsec) + sigsuspend(&old); + /* either way, restore the signal mask */ + sigprocmask(SIG_UNBLOCK, &set, NULL); } From 7f3414226b58b0df0426104c8ab5e8d50ae71d11 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 15 Jan 2021 12:49:45 +0100 Subject: [PATCH 064/284] um: time: fix initialization in time-travel mode In time-travel mode, since my previous patch, the start time was initialized too late, so that the system would read it before we set it, thus always starting system time at 0 (1970-01-01). This happens because timekeeping_init() reads the time and is called before time_init(). Unfortunately, I didn't see this before because I was testing it only with the RTC patch applied (and enabled), and then the time is read again by the RTC a little - after time_init() this time. Fix this by just doing the initialization whenever necessary. Fixes: 2701c1bd91dd ("um: time: Fix read_persistent_clock64() in time-travel") Signed-off-by: Johannes Berg Signed-off-by: Richard Weinberger --- arch/um/kernel/time.c | 50 +++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 19 deletions(-) diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c index f4db89b5b5a6..315248b03941 100644 --- a/arch/um/kernel/time.c +++ b/arch/um/kernel/time.c @@ -535,6 +535,31 @@ invalid_number: return 1; } + +static void time_travel_set_start(void) +{ + if (time_travel_start_set) + return; + + switch (time_travel_mode) { + case TT_MODE_EXTERNAL: + time_travel_start = time_travel_ext_req(UM_TIMETRAVEL_GET_TOD, -1); + /* controller gave us the *current* time, so adjust by that */ + time_travel_ext_get_time(); + time_travel_start -= time_travel_time; + break; + case TT_MODE_INFCPU: + case TT_MODE_BASIC: + if (!time_travel_start_set) + time_travel_start = os_persistent_clock_emulation(); + break; + case TT_MODE_OFF: + /* we just read the host clock with os_persistent_clock_emulation() */ + break; + } + + time_travel_start_set = true; +} #else /* CONFIG_UML_TIME_TRAVEL_SUPPORT */ #define time_travel_start_set 0 #define time_travel_start 0 @@ -553,6 +578,10 @@ static void time_travel_set_interval(unsigned long long interval) { } +static inline void time_travel_set_start(void) +{ +} + /* fail link if this actually gets used */ extern u64 time_travel_ext_req(u32 op, u64 time); @@ -731,6 +760,8 @@ void read_persistent_clock64(struct timespec64 *ts) { long long nsecs; + time_travel_set_start(); + if (time_travel_mode != TT_MODE_OFF) nsecs = time_travel_start + time_travel_time; else @@ -742,25 +773,6 @@ void read_persistent_clock64(struct timespec64 *ts) void __init time_init(void) { -#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT - switch (time_travel_mode) { - case TT_MODE_EXTERNAL: - time_travel_start = time_travel_ext_req(UM_TIMETRAVEL_GET_TOD, -1); - /* controller gave us the *current* time, so adjust by that */ - time_travel_ext_get_time(); - time_travel_start -= time_travel_time; - break; - case TT_MODE_INFCPU: - case TT_MODE_BASIC: - if (!time_travel_start_set) - time_travel_start = os_persistent_clock_emulation(); - break; - case TT_MODE_OFF: - /* we just read the host clock with os_persistent_clock_emulation() */ - break; - } -#endif - timer_set_signal_handler(); late_time_init = um_timer_setup; } From 03a58ea5905fdbd93ff9e52e670d802600ba38cd Mon Sep 17 00:00:00 2001 From: Kent Gibson Date: Thu, 21 Jan 2021 22:10:38 +0800 Subject: [PATCH 065/284] gpiolib: cdev: clear debounce period if line set to output When set_config changes a line from input to output debounce is implicitly disabled, as debounce makes no sense for outputs, but the debounce period is not being cleared and is still reported in the line info. So clear the debounce period when the debouncer is stopped in edge_detector_stop(). Fixes: 65cff7046406 ("gpiolib: cdev: support setting debounce") Cc: stable@vger.kernel.org Signed-off-by: Kent Gibson Reviewed-by: Linus Walleij Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpiolib-cdev.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c index 1a7b51163528..1631727bf0da 100644 --- a/drivers/gpio/gpiolib-cdev.c +++ b/drivers/gpio/gpiolib-cdev.c @@ -776,6 +776,8 @@ static void edge_detector_stop(struct line *line) cancel_delayed_work_sync(&line->work); WRITE_ONCE(line->sw_debounced, 0); WRITE_ONCE(line->eflags, 0); + if (line->desc) + WRITE_ONCE(line->desc->debounce_period_us, 0); /* do not change line->level - see comment in debounced_value() */ } From 40fb68c7725aee024ed99ad38504f5d25820c6f0 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Wed, 27 Jan 2021 09:50:41 -0600 Subject: [PATCH 066/284] Revert "PCI/ASPM: Save/restore L1SS Capability for suspend/resume" This reverts commit 4257f7e008ea394fcecc050f1569c3503b8bcc15. Kenneth reported that after 4257f7e008ea, he sees a torrent of disk I/O errors on his NVMe device after suspend/resume until a reboot. Link: https://lore.kernel.org/linux-pci/20201228040513.GA611645@bjorn-Precision-5520/ Reported-by: Kenneth R. Crudup Signed-off-by: Bjorn Helgaas --- drivers/pci/pci.c | 7 ------- drivers/pci/pci.h | 4 ---- drivers/pci/pcie/aspm.c | 44 ----------------------------------------- 3 files changed, 55 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index b9fecc25d213..790393d1e318 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1558,7 +1558,6 @@ int pci_save_state(struct pci_dev *dev) return i; pci_save_ltr_state(dev); - pci_save_aspm_l1ss_state(dev); pci_save_dpc_state(dev); pci_save_aer_state(dev); pci_save_ptm_state(dev); @@ -1665,7 +1664,6 @@ void pci_restore_state(struct pci_dev *dev) * LTR itself (in the PCIe capability). */ pci_restore_ltr_state(dev); - pci_restore_aspm_l1ss_state(dev); pci_restore_pcie_state(dev); pci_restore_pasid_state(dev); @@ -3353,11 +3351,6 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev) if (error) pci_err(dev, "unable to allocate suspend buffer for LTR\n"); - error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS, - 2 * sizeof(u32)); - if (error) - pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n"); - pci_allocate_vc_save_buffers(dev); } diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 5c59365092fa..a7bdf0b1d45d 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -582,15 +582,11 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev); void pcie_aspm_exit_link_state(struct pci_dev *pdev); void pcie_aspm_pm_state_change(struct pci_dev *pdev); void pcie_aspm_powersave_config_link(struct pci_dev *pdev); -void pci_save_aspm_l1ss_state(struct pci_dev *dev); -void pci_restore_aspm_l1ss_state(struct pci_dev *dev); #else static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { } static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { } static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { } static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { } -static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { } -static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { } #endif #ifdef CONFIG_PCIE_ECRC diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index a08e7d6dc248..ac0557a305af 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -734,50 +734,6 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state) PCI_L1SS_CTL1_L1SS_MASK, val); } -void pci_save_aspm_l1ss_state(struct pci_dev *dev) -{ - int aspm_l1ss; - struct pci_cap_saved_state *save_state; - u32 *cap; - - if (!pci_is_pcie(dev)) - return; - - aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS); - if (!aspm_l1ss) - return; - - save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS); - if (!save_state) - return; - - cap = (u32 *)&save_state->cap.data[0]; - pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++); - pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++); -} - -void pci_restore_aspm_l1ss_state(struct pci_dev *dev) -{ - int aspm_l1ss; - struct pci_cap_saved_state *save_state; - u32 *cap; - - if (!pci_is_pcie(dev)) - return; - - aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS); - if (!aspm_l1ss) - return; - - save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS); - if (!save_state) - return; - - cap = (u32 *)&save_state->cap.data[0]; - pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++); - pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++); -} - static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val) { pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL, From daf12bee07b9e2f38216f58aca7ac4e4e66a7146 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Fri, 22 Jan 2021 06:52:18 +0100 Subject: [PATCH 067/284] arm64: dts: meson: switch TFLASH_VDD_EN pin to open drain on Odroid-C4 For the proper reboot Odroid-C4 board requires to switch TFLASH_VDD_EN pin to the high impedance mode, otherwise the board is stuck in the middle of loading early stages of the bootloader from SD card. This can be achieved by using the OPEN_DRAIN flag instead of the ACTIVE_HIGH, what will leave the pin in input mode to achieve high state (pin has the pull-up) and solve the issue. Suggested-by: Neil Armstrong Fixes: 326e57518b0d ("arm64: dts: meson-sm1: add support for Hardkernel ODROID-C4") Signed-off-by: Marek Szyprowski Acked-by: Martin Blumenstingl Signed-off-by: Kevin Hilman Link: https://lore.kernel.org/r/20210122055218.27241-1-m.szyprowski@samsung.com --- arch/arm64/boot/dts/amlogic/meson-sm1-odroid-c4.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/amlogic/meson-sm1-odroid-c4.dts b/arch/arm64/boot/dts/amlogic/meson-sm1-odroid-c4.dts index cf5a98f0e47c..a712273c905a 100644 --- a/arch/arm64/boot/dts/amlogic/meson-sm1-odroid-c4.dts +++ b/arch/arm64/boot/dts/amlogic/meson-sm1-odroid-c4.dts @@ -52,7 +52,7 @@ regulator-min-microvolt = <3300000>; regulator-max-microvolt = <3300000>; - gpio = <&gpio_ao GPIOAO_3 GPIO_ACTIVE_HIGH>; + gpio = <&gpio_ao GPIOAO_3 GPIO_OPEN_DRAIN>; enable-active-high; regulator-always-on; }; From 2cea4a7a1885bd0c765089afc14f7ff0eb77864e Mon Sep 17 00:00:00 2001 From: Rolf Eike Beer Date: Thu, 22 Nov 2018 16:40:49 +0100 Subject: [PATCH 068/284] scripts: use pkg-config to locate libcrypto Otherwise build fails if the headers are not in the default location. While at it also ask pkg-config for the libs, with fallback to the existing value. Signed-off-by: Rolf Eike Beer Cc: stable@vger.kernel.org # 5.6.x Signed-off-by: Masahiro Yamada --- scripts/Makefile | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/scripts/Makefile b/scripts/Makefile index b5418ec587fb..9de3c03b94aa 100644 --- a/scripts/Makefile +++ b/scripts/Makefile @@ -3,6 +3,9 @@ # scripts contains sources for various helper programs used throughout # the kernel for the build process. +CRYPTO_LIBS = $(shell pkg-config --libs libcrypto 2> /dev/null || echo -lcrypto) +CRYPTO_CFLAGS = $(shell pkg-config --cflags libcrypto 2> /dev/null) + hostprogs-always-$(CONFIG_BUILD_BIN2C) += bin2c hostprogs-always-$(CONFIG_KALLSYMS) += kallsyms hostprogs-always-$(BUILD_C_RECORDMCOUNT) += recordmcount @@ -14,8 +17,9 @@ hostprogs-always-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert HOSTCFLAGS_sorttable.o = -I$(srctree)/tools/include HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include -HOSTLDLIBS_sign-file = -lcrypto -HOSTLDLIBS_extract-cert = -lcrypto +HOSTLDLIBS_sign-file = $(CRYPTO_LIBS) +HOSTCFLAGS_extract-cert.o = $(CRYPTO_CFLAGS) +HOSTLDLIBS_extract-cert = $(CRYPTO_LIBS) ifdef CONFIG_UNWINDER_ORC ifeq ($(ARCH),x86_64) From ae9162e2be767240065b2f16c3061fc0a3622f61 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Wed, 27 Jan 2021 04:15:40 +0900 Subject: [PATCH 069/284] Revert "checkpatch: add check for keyword 'boolean' in Kconfig definitions" This reverts commit 327953e9af6c59ad111b28359e59e3ec0cbd71b6. You cannot use 'boolean' since commit b92d804a5179 ("kconfig: drop 'boolean' keyword"). This check is no longer needed. Signed-off-by: Masahiro Yamada Acked-by: Joe Perches --- scripts/checkpatch.pl | 7 ------- 1 file changed, 7 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 92e888ed939f..1afe3af1cc09 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -3390,13 +3390,6 @@ sub process { } } -# discourage the use of boolean for type definition attributes of Kconfig options - if ($realfile =~ /Kconfig/ && - $line =~ /^\+\s*\bboolean\b/) { - WARN("CONFIG_TYPE_BOOLEAN", - "Use of boolean is deprecated, please use bool instead.\n" . $herecurr); - } - if (($realfile =~ /Makefile.*/ || $realfile =~ /Kbuild.*/) && ($line =~ /\+(EXTRA_[A-Z]+FLAGS).*/)) { my $flag = $1; From b64acb28da8394485f0762e657470c9fc33aca4d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 25 Jan 2021 12:36:42 +0100 Subject: [PATCH 070/284] ath9k: fix build error with LEDS_CLASS=m When CONFIG_ATH9K is built-in but LED support is in a loadable module, both ath9k drivers fails to link: x86_64-linux-ld: drivers/net/wireless/ath/ath9k/gpio.o: in function `ath_deinit_leds': gpio.c:(.text+0x36): undefined reference to `led_classdev_unregister' x86_64-linux-ld: drivers/net/wireless/ath/ath9k/gpio.o: in function `ath_init_leds': gpio.c:(.text+0x179): undefined reference to `led_classdev_register_ext' The problem is that the 'imply' keyword does not enforce any dependency but is only a weak hint to Kconfig to enable another symbol from a defconfig file. Change imply to a 'depends on LEDS_CLASS' that prevents the incorrect configuration but still allows building the driver without LED support. The 'select MAC80211_LEDS' is now ensures that the LED support is actually used if it is present, and the added Kconfig dependency on MAC80211_LEDS ensures that it cannot be enabled manually when it has no effect. Fixes: 197f466e93f5 ("ath9k_htc: Do not select MAC80211_LEDS by default") Signed-off-by: Arnd Bergmann Acked-by: Johannes Berg Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20210125113654.2408057-1-arnd@kernel.org --- drivers/net/wireless/ath/ath9k/Kconfig | 8 ++------ net/mac80211/Kconfig | 2 +- 2 files changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/Kconfig b/drivers/net/wireless/ath/ath9k/Kconfig index a84bb9b6573f..e150d82eddb6 100644 --- a/drivers/net/wireless/ath/ath9k/Kconfig +++ b/drivers/net/wireless/ath/ath9k/Kconfig @@ -21,11 +21,9 @@ config ATH9K_BTCOEX_SUPPORT config ATH9K tristate "Atheros 802.11n wireless cards support" depends on MAC80211 && HAS_DMA + select MAC80211_LEDS if LEDS_CLASS=y || LEDS_CLASS=MAC80211 select ATH9K_HW select ATH9K_COMMON - imply NEW_LEDS - imply LEDS_CLASS - imply MAC80211_LEDS help This module adds support for wireless adapters based on Atheros IEEE 802.11n AR5008, AR9001 and AR9002 family @@ -176,11 +174,9 @@ config ATH9K_PCI_NO_EEPROM config ATH9K_HTC tristate "Atheros HTC based wireless cards support" depends on USB && MAC80211 + select MAC80211_LEDS if LEDS_CLASS=y || LEDS_CLASS=MAC80211 select ATH9K_HW select ATH9K_COMMON - imply NEW_LEDS - imply LEDS_CLASS - imply MAC80211_LEDS help Support for Atheros HTC based cards. Chipsets supported: AR9271 diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig index cd9a9bd242ba..51ec8256b7fa 100644 --- a/net/mac80211/Kconfig +++ b/net/mac80211/Kconfig @@ -69,7 +69,7 @@ config MAC80211_MESH config MAC80211_LEDS bool "Enable LED triggers" depends on MAC80211 - depends on LEDS_CLASS + depends on LEDS_CLASS=y || LEDS_CLASS=MAC80211 select LEDS_TRIGGERS help This option enables a few LED triggers for different From 93a1d4791c10d443bc67044def7efee2991d48b7 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Tue, 26 Jan 2021 12:02:13 +0100 Subject: [PATCH 071/284] mt76: dma: fix a possible memory leak in mt76_add_fragment() Fix a memory leak in mt76_add_fragment routine returning the buffer to the page_frag_cache when we receive a new fragment and the skb_shared_info frag array is full. Fixes: b102f0c522cf6 ("mt76: fix array overflow on receiving too many fragments for a packet") Signed-off-by: Lorenzo Bianconi Acked-by: Felix Fietkau Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/4f9dd73407da88b2a552517ce8db242d86bf4d5c.1611616130.git.lorenzo@kernel.org --- drivers/net/wireless/mediatek/mt76/dma.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c index 73eeb00d5aa6..e81dfaf99bcb 100644 --- a/drivers/net/wireless/mediatek/mt76/dma.c +++ b/drivers/net/wireless/mediatek/mt76/dma.c @@ -509,15 +509,17 @@ static void mt76_add_fragment(struct mt76_dev *dev, struct mt76_queue *q, void *data, int len, bool more) { - struct page *page = virt_to_head_page(data); - int offset = data - page_address(page); struct sk_buff *skb = q->rx_head; struct skb_shared_info *shinfo = skb_shinfo(skb); if (shinfo->nr_frags < ARRAY_SIZE(shinfo->frags)) { - offset += q->buf_offset; + struct page *page = virt_to_head_page(data); + int offset = data - page_address(page) + q->buf_offset; + skb_add_rx_frag(skb, shinfo->nr_frags, page, offset, len, q->buf_size); + } else { + skb_free_frag(data); } if (more) From 181f494888d5b178ffda41bed965f187d5e5c432 Mon Sep 17 00:00:00 2001 From: Michael Roth Date: Wed, 27 Jan 2021 20:44:51 -0600 Subject: [PATCH 072/284] KVM: x86: fix CPUID entries returned by KVM_GET_CPUID2 ioctl Recent commit 255cbecfe0 modified struct kvm_vcpu_arch to make 'cpuid_entries' a pointer to an array of kvm_cpuid_entry2 entries rather than embedding the array in the struct. KVM_SET_CPUID and KVM_SET_CPUID2 were updated accordingly, but KVM_GET_CPUID2 was missed. As a result, KVM_GET_CPUID2 currently returns random fields from struct kvm_vcpu_arch to userspace rather than the expected CPUID values. Fix this by treating 'cpuid_entries' as a pointer when copying its contents to userspace buffer. Fixes: 255cbecfe0c9 ("KVM: x86: allocate vcpu->arch.cpuid_entries dynamically") Cc: Vitaly Kuznetsov Signed-off-by: Michael Roth Message-Id: <20210128024451.1816770-1-michael.roth@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini --- arch/x86/kvm/cpuid.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 13036cf0b912..38172ca627d3 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -321,7 +321,7 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, if (cpuid->nent < vcpu->arch.cpuid_nent) goto out; r = -EFAULT; - if (copy_to_user(entries, &vcpu->arch.cpuid_entries, + if (copy_to_user(entries, vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2))) goto out; return 0; From e478d6029dca9d8462f426aee0d32896ef64f10f Mon Sep 17 00:00:00 2001 From: Christoph Schemmel Date: Wed, 27 Jan 2021 20:58:46 +0100 Subject: [PATCH 073/284] USB: serial: option: Adding support for Cinterion MV31 Adding support for Cinterion device MV31 for enumeration with PID 0x00B3 and 0x00B7. usb-devices output for 0x00B3 T: Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 6 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1 P: Vendor=1e2d ProdID=00b3 Rev=04.14 S: Manufacturer=Cinterion S: Product=Cinterion PID 0x00B3 USB Mobile Broadband S: SerialNumber=b3246eed C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=896mA I: If#=0x0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim I: If#=0x1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#=0x3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=cdc_wdm I: If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option usb-devices output for 0x00B7 T: Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 5 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1 P: Vendor=1e2d ProdID=00b7 Rev=04.14 S: Manufacturer=Cinterion S: Product=Cinterion PID 0x00B3 USB Mobile Broadband S: SerialNumber=b3246eed C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=896mA I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option Signed-off-by: Christoph Schemmel Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold --- drivers/usb/serial/option.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 3fe959104311..2049e66f34a3 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -425,6 +425,8 @@ static void option_instat_callback(struct urb *urb); #define CINTERION_PRODUCT_AHXX_2RMNET 0x0084 #define CINTERION_PRODUCT_AHXX_AUDIO 0x0085 #define CINTERION_PRODUCT_CLS8 0x00b0 +#define CINTERION_PRODUCT_MV31_MBIM 0x00b3 +#define CINTERION_PRODUCT_MV31_RMNET 0x00b7 /* Olivetti products */ #define OLIVETTI_VENDOR_ID 0x0b3c @@ -1914,6 +1916,10 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE(SIEMENS_VENDOR_ID, CINTERION_PRODUCT_HC25_MDMNET) }, { USB_DEVICE(SIEMENS_VENDOR_ID, CINTERION_PRODUCT_HC28_MDM) }, /* HC28 enumerates with Siemens or Cinterion VID depending on FW revision */ { USB_DEVICE(SIEMENS_VENDOR_ID, CINTERION_PRODUCT_HC28_MDMNET) }, + { USB_DEVICE_INTERFACE_CLASS(CINTERION_VENDOR_ID, CINTERION_PRODUCT_MV31_MBIM, 0xff), + .driver_info = RSVD(3)}, + { USB_DEVICE_INTERFACE_CLASS(CINTERION_VENDOR_ID, CINTERION_PRODUCT_MV31_RMNET, 0xff), + .driver_info = RSVD(0)}, { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD100), .driver_info = RSVD(4) }, { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD120), From 13f445d65955f388499f00851dc9a86280970f7c Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 12 Jan 2021 23:35:50 -0800 Subject: [PATCH 074/284] libnvdimm/namespace: Fix visibility of namespace resource attribute Legacy pmem namespaces lost support for the "resource" attribute when the code was cleaned up to put the permission visibility in the declaration. Restore this by listing 'resource' in the default attributes. A new ndctl regression test for pfn_to_online_page() corner cases builds on this fix. Fixes: bfd2e9140656 ("libnvdimm: Simplify root read-only definition for the 'resource' attribute") Cc: Vishal Verma Cc: Dave Jiang Cc: Ira Weiny Cc: Link: https://lore.kernel.org/r/161052334995.1805594.12054873528154362921.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- drivers/nvdimm/namespace_devs.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 6da67f4d641a..2403b71b601e 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -1635,11 +1635,11 @@ static umode_t namespace_visible(struct kobject *kobj, return a->mode; } - if (a == &dev_attr_nstype.attr || a == &dev_attr_size.attr - || a == &dev_attr_holder.attr - || a == &dev_attr_holder_class.attr - || a == &dev_attr_force_raw.attr - || a == &dev_attr_mode.attr) + /* base is_namespace_io() attributes */ + if (a == &dev_attr_nstype.attr || a == &dev_attr_size.attr || + a == &dev_attr_holder.attr || a == &dev_attr_holder_class.attr || + a == &dev_attr_force_raw.attr || a == &dev_attr_mode.attr || + a == &dev_attr_resource.attr) return a->mode; return 0; From 9a27e109a391c9021147553b97c3fe4356e2261c Mon Sep 17 00:00:00 2001 From: Santosh Sivaraj Date: Tue, 22 Dec 2020 09:52:34 +0530 Subject: [PATCH 075/284] testing/nvdimm: Add test module for non-nfit platforms The current test module cannot be used for testing platforms (make check) that do not have support for NFIT. In order to get the ndctl tests working, we need a module which can emulate NVDIMM devices without relying on ACPI/NFIT. The aim of this proposed module is to implement a similar functionality to the existing module but without the ACPI dependencies. This RFC series is split into reviewable and compilable chunks. This patch adds a new driver and registers two nvdimm bus needed for ndctl make check. Signed-off-by: Santosh Sivaraj Link: https://lore.kernel.org/r/20201222042240.2983755-2-santosh@fossix.org Signed-off-by: Dan Williams --- tools/testing/nvdimm/config_check.c | 3 +- tools/testing/nvdimm/test/Kbuild | 6 +- tools/testing/nvdimm/test/ndtest.c | 206 ++++++++++++++++++++++++++++ tools/testing/nvdimm/test/ndtest.h | 16 +++ 4 files changed, 229 insertions(+), 2 deletions(-) create mode 100644 tools/testing/nvdimm/test/ndtest.c create mode 100644 tools/testing/nvdimm/test/ndtest.h diff --git a/tools/testing/nvdimm/config_check.c b/tools/testing/nvdimm/config_check.c index cac891028cd1..3e3a5f518864 100644 --- a/tools/testing/nvdimm/config_check.c +++ b/tools/testing/nvdimm/config_check.c @@ -12,7 +12,8 @@ void check(void) BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_BTT)); BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_PFN)); BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_BLK)); - BUILD_BUG_ON(!IS_MODULE(CONFIG_ACPI_NFIT)); + if (IS_ENABLED(CONFIG_ACPI_NFIT)) + BUILD_BUG_ON(!IS_MODULE(CONFIG_ACPI_NFIT)); BUILD_BUG_ON(!IS_MODULE(CONFIG_DEV_DAX)); BUILD_BUG_ON(!IS_MODULE(CONFIG_DEV_DAX_PMEM)); } diff --git a/tools/testing/nvdimm/test/Kbuild b/tools/testing/nvdimm/test/Kbuild index 75baebf8f4ba..197bcb2b7f35 100644 --- a/tools/testing/nvdimm/test/Kbuild +++ b/tools/testing/nvdimm/test/Kbuild @@ -5,5 +5,9 @@ ccflags-y += -I$(srctree)/drivers/acpi/nfit/ obj-m += nfit_test.o obj-m += nfit_test_iomap.o -nfit_test-y := nfit.o +ifeq ($(CONFIG_ACPI_NFIT),m) + nfit_test-y := nfit.o +else + nfit_test-y := ndtest.o +endif nfit_test_iomap-y := iomap.o diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c new file mode 100644 index 000000000000..f89d74fdcdee --- /dev/null +++ b/tools/testing/nvdimm/test/ndtest.c @@ -0,0 +1,206 @@ +// SPDX-License-Identifier: GPL-2.0-only +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../watermark.h" +#include "nfit_test.h" +#include "ndtest.h" + +enum { + DIMM_SIZE = SZ_32M, + LABEL_SIZE = SZ_128K, + NUM_INSTANCES = 2, + NUM_DCR = 4, +}; + +static struct ndtest_priv *instances[NUM_INSTANCES]; +static struct class *ndtest_dimm_class; + +static inline struct ndtest_priv *to_ndtest_priv(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + + return container_of(pdev, struct ndtest_priv, pdev); +} + +static int ndtest_ctl(struct nvdimm_bus_descriptor *nd_desc, + struct nvdimm *nvdimm, unsigned int cmd, void *buf, + unsigned int buf_len, int *cmd_rc) +{ + struct ndtest_dimm *dimm; + int _cmd_rc; + + if (!cmd_rc) + cmd_rc = &_cmd_rc; + + *cmd_rc = 0; + + if (!nvdimm) + return -EINVAL; + + dimm = nvdimm_provider_data(nvdimm); + if (!dimm) + return -EINVAL; + + switch (cmd) { + case ND_CMD_GET_CONFIG_SIZE: + case ND_CMD_GET_CONFIG_DATA: + case ND_CMD_SET_CONFIG_DATA: + default: + return -EINVAL; + } + + return 0; +} + +static int ndtest_bus_register(struct ndtest_priv *p) +{ + p->bus_desc.ndctl = ndtest_ctl; + p->bus_desc.module = THIS_MODULE; + p->bus_desc.provider_name = NULL; + + p->bus = nvdimm_bus_register(&p->pdev.dev, &p->bus_desc); + if (!p->bus) { + dev_err(&p->pdev.dev, "Error creating nvdimm bus %pOF\n", p->dn); + return -ENOMEM; + } + + return 0; +} + +static int ndtest_remove(struct platform_device *pdev) +{ + struct ndtest_priv *p = to_ndtest_priv(&pdev->dev); + + nvdimm_bus_unregister(p->bus); + return 0; +} + +static int ndtest_probe(struct platform_device *pdev) +{ + struct ndtest_priv *p; + + p = to_ndtest_priv(&pdev->dev); + if (ndtest_bus_register(p)) + return -ENOMEM; + + platform_set_drvdata(pdev, p); + + return 0; +} + +static const struct platform_device_id ndtest_id[] = { + { KBUILD_MODNAME }, + { }, +}; + +static struct platform_driver ndtest_driver = { + .probe = ndtest_probe, + .remove = ndtest_remove, + .driver = { + .name = KBUILD_MODNAME, + }, + .id_table = ndtest_id, +}; + +static void ndtest_release(struct device *dev) +{ + struct ndtest_priv *p = to_ndtest_priv(dev); + + kfree(p); +} + +static void cleanup_devices(void) +{ + int i; + + for (i = 0; i < NUM_INSTANCES; i++) + if (instances[i]) + platform_device_unregister(&instances[i]->pdev); + + nfit_test_teardown(); + + if (ndtest_dimm_class) + class_destroy(ndtest_dimm_class); +} + +static __init int ndtest_init(void) +{ + int rc, i; + + pmem_test(); + libnvdimm_test(); + device_dax_test(); + dax_pmem_test(); + dax_pmem_core_test(); +#ifdef CONFIG_DEV_DAX_PMEM_COMPAT + dax_pmem_compat_test(); +#endif + + ndtest_dimm_class = class_create(THIS_MODULE, "nfit_test_dimm"); + if (IS_ERR(ndtest_dimm_class)) { + rc = PTR_ERR(ndtest_dimm_class); + goto err_register; + } + + /* Each instance can be taken as a bus, which can have multiple dimms */ + for (i = 0; i < NUM_INSTANCES; i++) { + struct ndtest_priv *priv; + struct platform_device *pdev; + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) { + rc = -ENOMEM; + goto err_register; + } + + INIT_LIST_HEAD(&priv->resources); + pdev = &priv->pdev; + pdev->name = KBUILD_MODNAME; + pdev->id = i; + pdev->dev.release = ndtest_release; + rc = platform_device_register(pdev); + if (rc) { + put_device(&pdev->dev); + goto err_register; + } + get_device(&pdev->dev); + + instances[i] = priv; + } + + rc = platform_driver_register(&ndtest_driver); + if (rc) + goto err_register; + + return 0; + +err_register: + pr_err("Error registering platform device\n"); + cleanup_devices(); + + return rc; +} + +static __exit void ndtest_exit(void) +{ + cleanup_devices(); + platform_driver_unregister(&ndtest_driver); +} + +module_init(ndtest_init); +module_exit(ndtest_exit); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("IBM Corporation"); diff --git a/tools/testing/nvdimm/test/ndtest.h b/tools/testing/nvdimm/test/ndtest.h new file mode 100644 index 000000000000..831ac5c65f50 --- /dev/null +++ b/tools/testing/nvdimm/test/ndtest.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef NDTEST_H +#define NDTEST_H + +#include +#include + +struct ndtest_priv { + struct platform_device pdev; + struct device_node *dn; + struct list_head resources; + struct nvdimm_bus_descriptor bus_desc; + struct nvdimm_bus *bus; +}; + +#endif /* NDTEST_H */ From 107b04e970cae754100efb99a5312c321208ca03 Mon Sep 17 00:00:00 2001 From: Santosh Sivaraj Date: Tue, 22 Dec 2020 09:52:35 +0530 Subject: [PATCH 076/284] ndtest: Add compatability string to treat it as PAPR family Since this module is written to be platform agnostic, the module is made part of the PAPR_FAMILY. ndctl identifies the family using the compatible string inside of_node dir-entry. Signed-off-by: Santosh Sivaraj Link: https://lore.kernel.org/r/20201222042240.2983755-3-santosh@fossix.org Signed-off-by: Dan Williams --- tools/testing/nvdimm/test/ndtest.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c index f89d74fdcdee..001527b37a23 100644 --- a/tools/testing/nvdimm/test/ndtest.c +++ b/tools/testing/nvdimm/test/ndtest.c @@ -65,11 +65,34 @@ static int ndtest_ctl(struct nvdimm_bus_descriptor *nd_desc, return 0; } +static ssize_t compatible_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "nvdimm_test"); +} +static DEVICE_ATTR_RO(compatible); + +static struct attribute *of_node_attributes[] = { + &dev_attr_compatible.attr, + NULL +}; + +static const struct attribute_group of_node_attribute_group = { + .name = "of_node", + .attrs = of_node_attributes, +}; + +static const struct attribute_group *ndtest_attribute_groups[] = { + &of_node_attribute_group, + NULL, +}; + static int ndtest_bus_register(struct ndtest_priv *p) { p->bus_desc.ndctl = ndtest_ctl; p->bus_desc.module = THIS_MODULE; p->bus_desc.provider_name = NULL; + p->bus_desc.attr_groups = ndtest_attribute_groups; p->bus = nvdimm_bus_register(&p->pdev.dev, &p->bus_desc); if (!p->bus) { From 9399ab61ad82154911563dd8635c585e3f24b16a Mon Sep 17 00:00:00 2001 From: Santosh Sivaraj Date: Tue, 22 Dec 2020 09:52:36 +0530 Subject: [PATCH 077/284] ndtest: Add dimms to the two buses A config array is used to hold the dimms for each bus. These dimms are registered with nvdimm, and new nvdimms are created on the buses. Signed-off-by: Santosh Sivaraj Link: https://lore.kernel.org/r/20201222042240.2983755-4-santosh@fossix.org Signed-off-by: Dan Williams --- tools/testing/nvdimm/test/ndtest.c | 258 +++++++++++++++++++++++++++++ tools/testing/nvdimm/test/ndtest.h | 36 ++++ 2 files changed, 294 insertions(+) diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c index 001527b37a23..a82790013f8a 100644 --- a/tools/testing/nvdimm/test/ndtest.c +++ b/tools/testing/nvdimm/test/ndtest.c @@ -25,8 +25,83 @@ enum { NUM_DCR = 4, }; +#define NDTEST_SCM_DIMM_CMD_MASK \ + ((1ul << ND_CMD_GET_CONFIG_SIZE) | \ + (1ul << ND_CMD_GET_CONFIG_DATA) | \ + (1ul << ND_CMD_SET_CONFIG_DATA) | \ + (1ul << ND_CMD_CALL)) + +#define NFIT_DIMM_HANDLE(node, socket, imc, chan, dimm) \ + (((node & 0xfff) << 16) | ((socket & 0xf) << 12) \ + | ((imc & 0xf) << 8) | ((chan & 0xf) << 4) | (dimm & 0xf)) + +static DEFINE_SPINLOCK(ndtest_lock); static struct ndtest_priv *instances[NUM_INSTANCES]; static struct class *ndtest_dimm_class; +static struct gen_pool *ndtest_pool; + +static struct ndtest_dimm dimm_group1[] = { + { + .size = DIMM_SIZE, + .handle = NFIT_DIMM_HANDLE(0, 0, 0, 0, 0), + .uuid_str = "1e5c75d2-b618-11ea-9aa3-507b9ddc0f72", + .physical_id = 0, + .num_formats = 2, + }, + { + .size = DIMM_SIZE, + .handle = NFIT_DIMM_HANDLE(0, 0, 0, 0, 1), + .uuid_str = "1c4d43ac-b618-11ea-be80-507b9ddc0f72", + .physical_id = 1, + .num_formats = 2, + }, + { + .size = DIMM_SIZE, + .handle = NFIT_DIMM_HANDLE(0, 0, 1, 0, 0), + .uuid_str = "a9f17ffc-b618-11ea-b36d-507b9ddc0f72", + .physical_id = 2, + .num_formats = 2, + }, + { + .size = DIMM_SIZE, + .handle = NFIT_DIMM_HANDLE(0, 0, 1, 0, 1), + .uuid_str = "b6b83b22-b618-11ea-8aae-507b9ddc0f72", + .physical_id = 3, + .num_formats = 2, + }, + { + .size = DIMM_SIZE, + .handle = NFIT_DIMM_HANDLE(0, 1, 0, 0, 0), + .uuid_str = "bf9baaee-b618-11ea-b181-507b9ddc0f72", + .physical_id = 4, + .num_formats = 2, + }, +}; + +static struct ndtest_dimm dimm_group2[] = { + { + .size = DIMM_SIZE, + .handle = NFIT_DIMM_HANDLE(1, 0, 0, 0, 0), + .uuid_str = "ca0817e2-b618-11ea-9db3-507b9ddc0f72", + .physical_id = 0, + .num_formats = 1, + }, +}; + +static struct ndtest_config bus_configs[NUM_INSTANCES] = { + /* bus 1 */ + { + .dimm_start = 0, + .dimm_count = ARRAY_SIZE(dimm_group1), + .dimms = dimm_group1, + }, + /* bus 2 */ + { + .dimm_start = ARRAY_SIZE(dimm_group1), + .dimm_count = ARRAY_SIZE(dimm_group2), + .dimms = dimm_group2, + }, +}; static inline struct ndtest_priv *to_ndtest_priv(struct device *dev) { @@ -65,6 +140,152 @@ static int ndtest_ctl(struct nvdimm_bus_descriptor *nd_desc, return 0; } +static void ndtest_release_resource(void *data) +{ + struct nfit_test_resource *res = data; + + spin_lock(&ndtest_lock); + list_del(&res->list); + spin_unlock(&ndtest_lock); + + if (resource_size(&res->res) >= DIMM_SIZE) + gen_pool_free(ndtest_pool, res->res.start, + resource_size(&res->res)); + vfree(res->buf); + kfree(res); +} + +static void *ndtest_alloc_resource(struct ndtest_priv *p, size_t size, + dma_addr_t *dma) +{ + dma_addr_t __dma; + void *buf; + struct nfit_test_resource *res; + struct genpool_data_align data = { + .align = SZ_128M, + }; + + res = kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) + return NULL; + + buf = vmalloc(size); + if (size >= DIMM_SIZE) + __dma = gen_pool_alloc_algo(ndtest_pool, size, + gen_pool_first_fit_align, &data); + else + __dma = (unsigned long) buf; + + if (!__dma) + goto buf_err; + + INIT_LIST_HEAD(&res->list); + res->dev = &p->pdev.dev; + res->buf = buf; + res->res.start = __dma; + res->res.end = __dma + size - 1; + res->res.name = "NFIT"; + spin_lock_init(&res->lock); + INIT_LIST_HEAD(&res->requests); + spin_lock(&ndtest_lock); + list_add(&res->list, &p->resources); + spin_unlock(&ndtest_lock); + + if (dma) + *dma = __dma; + + if (!devm_add_action(&p->pdev.dev, ndtest_release_resource, res)) + return res->buf; + +buf_err: + if (__dma && size >= DIMM_SIZE) + gen_pool_free(ndtest_pool, __dma, size); + if (buf) + vfree(buf); + kfree(res); + + return NULL; +} + +static void put_dimms(void *data) +{ + struct ndtest_priv *p = data; + int i; + + for (i = 0; i < p->config->dimm_count; i++) + if (p->config->dimms[i].dev) { + device_unregister(p->config->dimms[i].dev); + p->config->dimms[i].dev = NULL; + } +} + +static int ndtest_dimm_register(struct ndtest_priv *priv, + struct ndtest_dimm *dimm, int id) +{ + struct device *dev = &priv->pdev.dev; + unsigned long dimm_flags = dimm->flags; + + if (dimm->num_formats > 1) { + set_bit(NDD_ALIASING, &dimm_flags); + set_bit(NDD_LABELING, &dimm_flags); + } + + dimm->nvdimm = nvdimm_create(priv->bus, dimm, NULL, dimm_flags, + NDTEST_SCM_DIMM_CMD_MASK, 0, NULL); + if (!dimm->nvdimm) { + dev_err(dev, "Error creating DIMM object for %pOF\n", priv->dn); + return -ENXIO; + } + + dimm->dev = device_create_with_groups(ndtest_dimm_class, + &priv->pdev.dev, + 0, dimm, NULL, + "test_dimm%d", id); + if (!dimm->dev) { + pr_err("Could not create dimm device attributes\n"); + return -ENOMEM; + } + + return 0; +} + +static int ndtest_nvdimm_init(struct ndtest_priv *p) +{ + struct ndtest_dimm *d; + void *res; + int i, id; + + for (i = 0; i < p->config->dimm_count; i++) { + d = &p->config->dimms[i]; + d->id = id = p->config->dimm_start + i; + res = ndtest_alloc_resource(p, LABEL_SIZE, NULL); + if (!res) + return -ENOMEM; + + d->label_area = res; + sprintf(d->label_area, "label%d", id); + d->config_size = LABEL_SIZE; + + if (!ndtest_alloc_resource(p, d->size, + &p->dimm_dma[id])) + return -ENOMEM; + + if (!ndtest_alloc_resource(p, LABEL_SIZE, + &p->label_dma[id])) + return -ENOMEM; + + if (!ndtest_alloc_resource(p, LABEL_SIZE, + &p->dcr_dma[id])) + return -ENOMEM; + + d->address = p->dimm_dma[id]; + + ndtest_dimm_register(p, d, id); + } + + return 0; +} + static ssize_t compatible_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -89,6 +310,8 @@ static const struct attribute_group *ndtest_attribute_groups[] = { static int ndtest_bus_register(struct ndtest_priv *p) { + p->config = &bus_configs[p->pdev.id]; + p->bus_desc.ndctl = ndtest_ctl; p->bus_desc.module = THIS_MODULE; p->bus_desc.provider_name = NULL; @@ -114,14 +337,34 @@ static int ndtest_remove(struct platform_device *pdev) static int ndtest_probe(struct platform_device *pdev) { struct ndtest_priv *p; + int rc; p = to_ndtest_priv(&pdev->dev); if (ndtest_bus_register(p)) return -ENOMEM; + p->dcr_dma = devm_kcalloc(&p->pdev.dev, NUM_DCR, + sizeof(dma_addr_t), GFP_KERNEL); + p->label_dma = devm_kcalloc(&p->pdev.dev, NUM_DCR, + sizeof(dma_addr_t), GFP_KERNEL); + p->dimm_dma = devm_kcalloc(&p->pdev.dev, NUM_DCR, + sizeof(dma_addr_t), GFP_KERNEL); + + rc = ndtest_nvdimm_init(p); + if (rc) + goto err; + + rc = devm_add_action_or_reset(&pdev->dev, put_dimms, p); + if (rc) + goto err; + platform_set_drvdata(pdev, p); return 0; + +err: + pr_err("%s:%d Failed nvdimm init\n", __func__, __LINE__); + return rc; } static const struct platform_device_id ndtest_id[] = { @@ -155,6 +398,10 @@ static void cleanup_devices(void) nfit_test_teardown(); + if (ndtest_pool) + gen_pool_destroy(ndtest_pool); + + if (ndtest_dimm_class) class_destroy(ndtest_dimm_class); } @@ -178,6 +425,17 @@ static __init int ndtest_init(void) goto err_register; } + ndtest_pool = gen_pool_create(ilog2(SZ_4M), NUMA_NO_NODE); + if (!ndtest_pool) { + rc = -ENOMEM; + goto err_register; + } + + if (gen_pool_add(ndtest_pool, SZ_4G, SZ_4G, NUMA_NO_NODE)) { + rc = -ENOMEM; + goto err_register; + } + /* Each instance can be taken as a bus, which can have multiple dimms */ for (i = 0; i < NUM_INSTANCES; i++) { struct ndtest_priv *priv; diff --git a/tools/testing/nvdimm/test/ndtest.h b/tools/testing/nvdimm/test/ndtest.h index 831ac5c65f50..e607d72ffed1 100644 --- a/tools/testing/nvdimm/test/ndtest.h +++ b/tools/testing/nvdimm/test/ndtest.h @@ -5,12 +5,48 @@ #include #include +struct ndtest_config; + struct ndtest_priv { struct platform_device pdev; struct device_node *dn; struct list_head resources; struct nvdimm_bus_descriptor bus_desc; struct nvdimm_bus *bus; + struct ndtest_config *config; + + dma_addr_t *dcr_dma; + dma_addr_t *label_dma; + dma_addr_t *dimm_dma; +}; + +struct ndtest_dimm { + struct device *dev; + struct nvdimm *nvdimm; + struct ndtest_blk_mmio *mmio; + struct nd_region *blk_region; + + dma_addr_t address; + unsigned long long flags; + unsigned long config_size; + void *label_area; + char *uuid_str; + + unsigned int size; + unsigned int handle; + unsigned int fail_cmd; + unsigned int physical_id; + unsigned int num_formats; + int id; + int fail_cmd_code; + u8 no_alias; +}; + +struct ndtest_config { + struct ndtest_dimm *dimms; + unsigned int dimm_count; + unsigned int dimm_start; + u8 num_regions; }; #endif /* NDTEST_H */ From 5e41396f723004a4e5710a0bb03259a443be1971 Mon Sep 17 00:00:00 2001 From: Santosh Sivaraj Date: Tue, 22 Dec 2020 09:52:37 +0530 Subject: [PATCH 078/284] ndtest: Add dimm attributes This patch adds sysfs attributes for nvdimm and the dimm device. Signed-off-by: Santosh Sivaraj Link: https://lore.kernel.org/r/20201222042240.2983755-5-santosh@fossix.org Signed-off-by: Dan Williams --- tools/testing/nvdimm/test/ndtest.c | 202 ++++++++++++++++++++++++++++- 1 file changed, 200 insertions(+), 2 deletions(-) diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c index a82790013f8a..f7682e1d3efe 100644 --- a/tools/testing/nvdimm/test/ndtest.c +++ b/tools/testing/nvdimm/test/ndtest.c @@ -219,6 +219,203 @@ static void put_dimms(void *data) } } +static ssize_t handle_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct ndtest_dimm *dimm = dev_get_drvdata(dev); + + return sprintf(buf, "%#x\n", dimm->handle); +} +static DEVICE_ATTR_RO(handle); + +static ssize_t fail_cmd_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct ndtest_dimm *dimm = dev_get_drvdata(dev); + + return sprintf(buf, "%#x\n", dimm->fail_cmd); +} + +static ssize_t fail_cmd_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t size) +{ + struct ndtest_dimm *dimm = dev_get_drvdata(dev); + unsigned long val; + ssize_t rc; + + rc = kstrtol(buf, 0, &val); + if (rc) + return rc; + + dimm->fail_cmd = val; + + return size; +} +static DEVICE_ATTR_RW(fail_cmd); + +static ssize_t fail_cmd_code_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct ndtest_dimm *dimm = dev_get_drvdata(dev); + + return sprintf(buf, "%d\n", dimm->fail_cmd_code); +} + +static ssize_t fail_cmd_code_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t size) +{ + struct ndtest_dimm *dimm = dev_get_drvdata(dev); + unsigned long val; + ssize_t rc; + + rc = kstrtol(buf, 0, &val); + if (rc) + return rc; + + dimm->fail_cmd_code = val; + return size; +} +static DEVICE_ATTR_RW(fail_cmd_code); + +static struct attribute *dimm_attributes[] = { + &dev_attr_handle.attr, + &dev_attr_fail_cmd.attr, + &dev_attr_fail_cmd_code.attr, + NULL, +}; + +static struct attribute_group dimm_attribute_group = { + .attrs = dimm_attributes, +}; + +static const struct attribute_group *dimm_attribute_groups[] = { + &dimm_attribute_group, + NULL, +}; + +static ssize_t phys_id_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nvdimm *nvdimm = to_nvdimm(dev); + struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm); + + return sprintf(buf, "%#x\n", dimm->physical_id); +} +static DEVICE_ATTR_RO(phys_id); + +static ssize_t vendor_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "0x1234567\n"); +} +static DEVICE_ATTR_RO(vendor); + +static ssize_t id_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nvdimm *nvdimm = to_nvdimm(dev); + struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm); + + return sprintf(buf, "%04x-%02x-%04x-%08x", 0xabcd, + 0xa, 2016, ~(dimm->handle)); +} +static DEVICE_ATTR_RO(id); + +static ssize_t nvdimm_handle_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nvdimm *nvdimm = to_nvdimm(dev); + struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm); + + return sprintf(buf, "%#x\n", dimm->handle); +} + +static struct device_attribute dev_attr_nvdimm_show_handle = { + .attr = { .name = "handle", .mode = 0444 }, + .show = nvdimm_handle_show, +}; + +static ssize_t subsystem_vendor_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "0x%04x\n", 0); +} +static DEVICE_ATTR_RO(subsystem_vendor); + +static ssize_t dirty_shutdown_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%d\n", 42); +} +static DEVICE_ATTR_RO(dirty_shutdown); + +static ssize_t formats_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nvdimm *nvdimm = to_nvdimm(dev); + struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm); + + return sprintf(buf, "%d\n", dimm->num_formats); +} +static DEVICE_ATTR_RO(formats); + +static ssize_t format_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nvdimm *nvdimm = to_nvdimm(dev); + struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm); + + if (dimm->num_formats > 1) + return sprintf(buf, "0x201\n"); + + return sprintf(buf, "0x101\n"); +} +static DEVICE_ATTR_RO(format); + +static ssize_t format1_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + return sprintf(buf, "0x301\n"); +} +static DEVICE_ATTR_RO(format1); + +static umode_t ndtest_nvdimm_attr_visible(struct kobject *kobj, + struct attribute *a, int n) +{ + struct device *dev = container_of(kobj, struct device, kobj); + struct nvdimm *nvdimm = to_nvdimm(dev); + struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm); + + if (a == &dev_attr_format1.attr && dimm->num_formats <= 1) + return 0; + + return a->mode; +} + +static struct attribute *ndtest_nvdimm_attributes[] = { + &dev_attr_nvdimm_show_handle.attr, + &dev_attr_vendor.attr, + &dev_attr_id.attr, + &dev_attr_phys_id.attr, + &dev_attr_subsystem_vendor.attr, + &dev_attr_dirty_shutdown.attr, + &dev_attr_formats.attr, + &dev_attr_format.attr, + &dev_attr_format1.attr, + NULL, +}; + +static const struct attribute_group ndtest_nvdimm_attribute_group = { + .name = "papr", + .attrs = ndtest_nvdimm_attributes, + .is_visible = ndtest_nvdimm_attr_visible, +}; + +static const struct attribute_group *ndtest_nvdimm_attribute_groups[] = { + &ndtest_nvdimm_attribute_group, + NULL, +}; + static int ndtest_dimm_register(struct ndtest_priv *priv, struct ndtest_dimm *dimm, int id) { @@ -230,7 +427,8 @@ static int ndtest_dimm_register(struct ndtest_priv *priv, set_bit(NDD_LABELING, &dimm_flags); } - dimm->nvdimm = nvdimm_create(priv->bus, dimm, NULL, dimm_flags, + dimm->nvdimm = nvdimm_create(priv->bus, dimm, + ndtest_nvdimm_attribute_groups, dimm_flags, NDTEST_SCM_DIMM_CMD_MASK, 0, NULL); if (!dimm->nvdimm) { dev_err(dev, "Error creating DIMM object for %pOF\n", priv->dn); @@ -239,7 +437,7 @@ static int ndtest_dimm_register(struct ndtest_priv *priv, dimm->dev = device_create_with_groups(ndtest_dimm_class, &priv->pdev.dev, - 0, dimm, NULL, + 0, dimm, dimm_attribute_groups, "test_dimm%d", id); if (!dimm->dev) { pr_err("Could not create dimm device attributes\n"); From 6fde2d4c8b25cec9589a4a58fd524b9d4e40c4b6 Mon Sep 17 00:00:00 2001 From: Santosh Sivaraj Date: Tue, 22 Dec 2020 09:52:38 +0530 Subject: [PATCH 079/284] ndtest: Add regions and mappings to the test buses The bus config array is used to hold the regions and the respective mappings. This config based interface enables to change the dimm/region/namespace layouts easily. Signed-off-by: Santosh Sivaraj Link: https://lore.kernel.org/r/20201222042240.2983755-6-santosh@fossix.org Signed-off-by: Dan Williams --- tools/testing/nvdimm/test/ndtest.c | 352 +++++++++++++++++++++++++++++ tools/testing/nvdimm/test/ndtest.h | 26 +++ 2 files changed, 378 insertions(+) diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c index f7682e1d3efe..821296b59bdc 100644 --- a/tools/testing/nvdimm/test/ndtest.c +++ b/tools/testing/nvdimm/test/ndtest.c @@ -23,6 +23,7 @@ enum { LABEL_SIZE = SZ_128K, NUM_INSTANCES = 2, NUM_DCR = 4, + NDTEST_MAX_MAPPING = 6, }; #define NDTEST_SCM_DIMM_CMD_MASK \ @@ -88,18 +89,161 @@ static struct ndtest_dimm dimm_group2[] = { }, }; +static struct ndtest_mapping region0_mapping[] = { + { + .dimm = 0, + .position = 0, + .start = 0, + .size = SZ_16M, + }, + { + .dimm = 1, + .position = 1, + .start = 0, + .size = SZ_16M, + } +}; + +static struct ndtest_mapping region1_mapping[] = { + { + .dimm = 0, + .position = 0, + .start = SZ_16M, + .size = SZ_16M, + }, + { + .dimm = 1, + .position = 1, + .start = SZ_16M, + .size = SZ_16M, + }, + { + .dimm = 2, + .position = 2, + .start = SZ_16M, + .size = SZ_16M, + }, + { + .dimm = 3, + .position = 3, + .start = SZ_16M, + .size = SZ_16M, + }, +}; + +static struct ndtest_mapping region2_mapping[] = { + { + .dimm = 0, + .position = 0, + .start = 0, + .size = DIMM_SIZE, + }, +}; + +static struct ndtest_mapping region3_mapping[] = { + { + .dimm = 1, + .start = 0, + .size = DIMM_SIZE, + } +}; + +static struct ndtest_mapping region4_mapping[] = { + { + .dimm = 2, + .start = 0, + .size = DIMM_SIZE, + } +}; + +static struct ndtest_mapping region5_mapping[] = { + { + .dimm = 3, + .start = 0, + .size = DIMM_SIZE, + } +}; + +static struct ndtest_region bus0_regions[] = { + { + .type = ND_DEVICE_NAMESPACE_PMEM, + .num_mappings = ARRAY_SIZE(region0_mapping), + .mapping = region0_mapping, + .size = DIMM_SIZE, + .range_index = 1, + }, + { + .type = ND_DEVICE_NAMESPACE_PMEM, + .num_mappings = ARRAY_SIZE(region1_mapping), + .mapping = region1_mapping, + .size = DIMM_SIZE * 2, + .range_index = 2, + }, + { + .type = ND_DEVICE_NAMESPACE_BLK, + .num_mappings = ARRAY_SIZE(region2_mapping), + .mapping = region2_mapping, + .size = DIMM_SIZE, + .range_index = 3, + }, + { + .type = ND_DEVICE_NAMESPACE_BLK, + .num_mappings = ARRAY_SIZE(region3_mapping), + .mapping = region3_mapping, + .size = DIMM_SIZE, + .range_index = 4, + }, + { + .type = ND_DEVICE_NAMESPACE_BLK, + .num_mappings = ARRAY_SIZE(region4_mapping), + .mapping = region4_mapping, + .size = DIMM_SIZE, + .range_index = 5, + }, + { + .type = ND_DEVICE_NAMESPACE_BLK, + .num_mappings = ARRAY_SIZE(region5_mapping), + .mapping = region5_mapping, + .size = DIMM_SIZE, + .range_index = 6, + }, +}; + +static struct ndtest_mapping region6_mapping[] = { + { + .dimm = 0, + .position = 0, + .start = 0, + .size = DIMM_SIZE, + }, +}; + +static struct ndtest_region bus1_regions[] = { + { + .type = ND_DEVICE_NAMESPACE_IO, + .num_mappings = ARRAY_SIZE(region6_mapping), + .mapping = region6_mapping, + .size = DIMM_SIZE, + .range_index = 1, + }, +}; + static struct ndtest_config bus_configs[NUM_INSTANCES] = { /* bus 1 */ { .dimm_start = 0, .dimm_count = ARRAY_SIZE(dimm_group1), .dimms = dimm_group1, + .regions = bus0_regions, + .num_regions = ARRAY_SIZE(bus0_regions), }, /* bus 2 */ { .dimm_start = ARRAY_SIZE(dimm_group1), .dimm_count = ARRAY_SIZE(dimm_group2), .dimms = dimm_group2, + .regions = bus1_regions, + .num_regions = ARRAY_SIZE(bus1_regions), }, }; @@ -140,6 +284,95 @@ static int ndtest_ctl(struct nvdimm_bus_descriptor *nd_desc, return 0; } +static int ndtest_blk_do_io(struct nd_blk_region *ndbr, resource_size_t dpa, + void *iobuf, u64 len, int rw) +{ + struct ndtest_dimm *dimm = ndbr->blk_provider_data; + struct ndtest_blk_mmio *mmio = dimm->mmio; + struct nd_region *nd_region = &ndbr->nd_region; + unsigned int lane; + + if (!mmio) + return -ENOMEM; + + lane = nd_region_acquire_lane(nd_region); + if (rw) + memcpy(mmio->base + dpa, iobuf, len); + else { + memcpy(iobuf, mmio->base + dpa, len); + arch_invalidate_pmem(mmio->base + dpa, len); + } + + nd_region_release_lane(nd_region, lane); + + return 0; +} + +static int ndtest_blk_region_enable(struct nvdimm_bus *nvdimm_bus, + struct device *dev) +{ + struct nd_blk_region *ndbr = to_nd_blk_region(dev); + struct nvdimm *nvdimm; + struct ndtest_dimm *dimm; + struct ndtest_blk_mmio *mmio; + + nvdimm = nd_blk_region_to_dimm(ndbr); + dimm = nvdimm_provider_data(nvdimm); + + nd_blk_region_set_provider_data(ndbr, dimm); + dimm->blk_region = to_nd_region(dev); + + mmio = devm_kzalloc(dev, sizeof(struct ndtest_blk_mmio), GFP_KERNEL); + if (!mmio) + return -ENOMEM; + + mmio->base = (void __iomem *) devm_nvdimm_memremap( + dev, dimm->address, 12, nd_blk_memremap_flags(ndbr)); + if (!mmio->base) { + dev_err(dev, "%s failed to map blk dimm\n", nvdimm_name(nvdimm)); + return -ENOMEM; + } + mmio->size = dimm->size; + mmio->base_offset = 0; + + dimm->mmio = mmio; + + return 0; +} + +static struct nfit_test_resource *ndtest_resource_lookup(resource_size_t addr) +{ + int i; + + for (i = 0; i < NUM_INSTANCES; i++) { + struct nfit_test_resource *n, *nfit_res = NULL; + struct ndtest_priv *t = instances[i]; + + if (!t) + continue; + spin_lock(&ndtest_lock); + list_for_each_entry(n, &t->resources, list) { + if (addr >= n->res.start && (addr < n->res.start + + resource_size(&n->res))) { + nfit_res = n; + break; + } else if (addr >= (unsigned long) n->buf + && (addr < (unsigned long) n->buf + + resource_size(&n->res))) { + nfit_res = n; + break; + } + } + spin_unlock(&ndtest_lock); + if (nfit_res) + return nfit_res; + } + + pr_warn("Failed to get resource\n"); + + return NULL; +} + static void ndtest_release_resource(void *data) { struct nfit_test_resource *res = data; @@ -207,6 +440,119 @@ buf_err: return NULL; } +static ssize_t range_index_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nd_region *nd_region = to_nd_region(dev); + struct ndtest_region *region = nd_region_provider_data(nd_region); + + return sprintf(buf, "%d\n", region->range_index); +} +static DEVICE_ATTR_RO(range_index); + +static struct attribute *ndtest_region_attributes[] = { + &dev_attr_range_index.attr, + NULL, +}; + +static const struct attribute_group ndtest_region_attribute_group = { + .name = "papr", + .attrs = ndtest_region_attributes, +}; + +static const struct attribute_group *ndtest_region_attribute_groups[] = { + &ndtest_region_attribute_group, + NULL, +}; + +static int ndtest_create_region(struct ndtest_priv *p, + struct ndtest_region *region) +{ + struct nd_mapping_desc mappings[NDTEST_MAX_MAPPING]; + struct nd_blk_region_desc ndbr_desc; + struct nd_interleave_set *nd_set; + struct nd_region_desc *ndr_desc; + struct resource res; + int i, ndimm = region->mapping[0].dimm; + u64 uuid[2]; + + memset(&res, 0, sizeof(res)); + memset(&mappings, 0, sizeof(mappings)); + memset(&ndbr_desc, 0, sizeof(ndbr_desc)); + ndr_desc = &ndbr_desc.ndr_desc; + + if (!ndtest_alloc_resource(p, region->size, &res.start)) + return -ENOMEM; + + res.end = res.start + region->size - 1; + ndr_desc->mapping = mappings; + ndr_desc->res = &res; + ndr_desc->provider_data = region; + ndr_desc->attr_groups = ndtest_region_attribute_groups; + + if (uuid_parse(p->config->dimms[ndimm].uuid_str, (uuid_t *)uuid)) { + pr_err("failed to parse UUID\n"); + return -ENXIO; + } + + nd_set = devm_kzalloc(&p->pdev.dev, sizeof(*nd_set), GFP_KERNEL); + if (!nd_set) + return -ENOMEM; + + nd_set->cookie1 = cpu_to_le64(uuid[0]); + nd_set->cookie2 = cpu_to_le64(uuid[1]); + nd_set->altcookie = nd_set->cookie1; + ndr_desc->nd_set = nd_set; + + if (region->type == ND_DEVICE_NAMESPACE_BLK) { + mappings[0].start = 0; + mappings[0].size = DIMM_SIZE; + mappings[0].nvdimm = p->config->dimms[ndimm].nvdimm; + + ndr_desc->mapping = &mappings[0]; + ndr_desc->num_mappings = 1; + ndr_desc->num_lanes = 1; + ndbr_desc.enable = ndtest_blk_region_enable; + ndbr_desc.do_io = ndtest_blk_do_io; + region->region = nvdimm_blk_region_create(p->bus, ndr_desc); + + goto done; + } + + for (i = 0; i < region->num_mappings; i++) { + ndimm = region->mapping[i].dimm; + mappings[i].start = region->mapping[i].start; + mappings[i].size = region->mapping[i].size; + mappings[i].position = region->mapping[i].position; + mappings[i].nvdimm = p->config->dimms[ndimm].nvdimm; + } + + ndr_desc->num_mappings = region->num_mappings; + region->region = nvdimm_pmem_region_create(p->bus, ndr_desc); + +done: + if (!region->region) { + dev_err(&p->pdev.dev, "Error registering region %pR\n", + ndr_desc->res); + return -ENXIO; + } + + return 0; +} + +static int ndtest_init_regions(struct ndtest_priv *p) +{ + int i, ret = 0; + + for (i = 0; i < p->config->num_regions; i++) { + ret = ndtest_create_region(p, &p->config->regions[i]); + if (ret) + return ret; + } + + return 0; +} + static void put_dimms(void *data) { struct ndtest_priv *p = data; @@ -552,6 +898,10 @@ static int ndtest_probe(struct platform_device *pdev) if (rc) goto err; + rc = ndtest_init_regions(p); + if (rc) + goto err; + rc = devm_add_action_or_reset(&pdev->dev, put_dimms, p); if (rc) goto err; @@ -617,6 +967,8 @@ static __init int ndtest_init(void) dax_pmem_compat_test(); #endif + nfit_test_setup(ndtest_resource_lookup, NULL); + ndtest_dimm_class = class_create(THIS_MODULE, "nfit_test_dimm"); if (IS_ERR(ndtest_dimm_class)) { rc = PTR_ERR(ndtest_dimm_class); diff --git a/tools/testing/nvdimm/test/ndtest.h b/tools/testing/nvdimm/test/ndtest.h index e607d72ffed1..8f27ad6f7319 100644 --- a/tools/testing/nvdimm/test/ndtest.h +++ b/tools/testing/nvdimm/test/ndtest.h @@ -20,6 +20,15 @@ struct ndtest_priv { dma_addr_t *dimm_dma; }; +struct ndtest_blk_mmio { + void __iomem *base; + u64 size; + u64 base_offset; + u32 line_size; + u32 num_lines; + u32 table_size; +}; + struct ndtest_dimm { struct device *dev; struct nvdimm *nvdimm; @@ -42,8 +51,25 @@ struct ndtest_dimm { u8 no_alias; }; +struct ndtest_mapping { + u64 start; + u64 size; + u8 position; + u8 dimm; +}; + +struct ndtest_region { + struct nd_region *region; + struct ndtest_mapping *mapping; + u64 size; + u8 type; + u8 num_mappings; + u8 range_index; +}; + struct ndtest_config { struct ndtest_dimm *dimms; + struct ndtest_region *regions; unsigned int dimm_count; unsigned int dimm_start; u8 num_regions; From 14ccef10e53e4c303570d2ee2d49e45be1118e99 Mon Sep 17 00:00:00 2001 From: Santosh Sivaraj Date: Tue, 22 Dec 2020 09:52:39 +0530 Subject: [PATCH 080/284] ndtest: Add nvdimm control functions Add functions to support ND_CMD_GET_CONFIG_SIZE, ND_CMD_SET_CONFIG_DATA and ND_CMD_GET_CONFIG_DATA. Signed-off-by: Santosh Sivaraj Link: https://lore.kernel.org/r/20201222042240.2983755-7-santosh@fossix.org Signed-off-by: Dan Williams --- tools/testing/nvdimm/test/ndtest.c | 51 ++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c index 821296b59bdc..dc1e3636616a 100644 --- a/tools/testing/nvdimm/test/ndtest.c +++ b/tools/testing/nvdimm/test/ndtest.c @@ -254,6 +254,45 @@ static inline struct ndtest_priv *to_ndtest_priv(struct device *dev) return container_of(pdev, struct ndtest_priv, pdev); } +static int ndtest_config_get(struct ndtest_dimm *p, unsigned int buf_len, + struct nd_cmd_get_config_data_hdr *hdr) +{ + unsigned int len; + + if ((hdr->in_offset + hdr->in_length) > LABEL_SIZE) + return -EINVAL; + + hdr->status = 0; + len = min(hdr->in_length, LABEL_SIZE - hdr->in_offset); + memcpy(hdr->out_buf, p->label_area + hdr->in_offset, len); + + return buf_len - len; +} + +static int ndtest_config_set(struct ndtest_dimm *p, unsigned int buf_len, + struct nd_cmd_set_config_hdr *hdr) +{ + unsigned int len; + + if ((hdr->in_offset + hdr->in_length) > LABEL_SIZE) + return -EINVAL; + + len = min(hdr->in_length, LABEL_SIZE - hdr->in_offset); + memcpy(p->label_area + hdr->in_offset, hdr->in_buf, len); + + return buf_len - len; +} + +static int ndtest_get_config_size(struct ndtest_dimm *dimm, unsigned int buf_len, + struct nd_cmd_get_config_size *size) +{ + size->status = 0; + size->max_xfer = 8; + size->config_size = dimm->config_size; + + return 0; +} + static int ndtest_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc) @@ -275,12 +314,24 @@ static int ndtest_ctl(struct nvdimm_bus_descriptor *nd_desc, switch (cmd) { case ND_CMD_GET_CONFIG_SIZE: + *cmd_rc = ndtest_get_config_size(dimm, buf_len, buf); + break; case ND_CMD_GET_CONFIG_DATA: + *cmd_rc = ndtest_config_get(dimm, buf_len, buf); + break; case ND_CMD_SET_CONFIG_DATA: + *cmd_rc = ndtest_config_set(dimm, buf_len, buf); + break; default: return -EINVAL; } + /* Failures for a DIMM can be injected using fail_cmd and + * fail_cmd_code, see the device attributes below + */ + if ((1 << cmd) & dimm->fail_cmd) + return dimm->fail_cmd_code ? dimm->fail_cmd_code : -EIO; + return 0; } From 50f558a5fe16b385cf1427b2a96149f4f68952d9 Mon Sep 17 00:00:00 2001 From: Santosh Sivaraj Date: Tue, 22 Dec 2020 09:52:40 +0530 Subject: [PATCH 081/284] ndtest: Add papr health related flags sysfs attibutes to show health related flags are added. Signed-off-by: Santosh Sivaraj Link: https://lore.kernel.org/r/20201222042240.2983755-8-santosh@fossix.org Signed-off-by: Dan Williams --- tools/testing/nvdimm/test/ndtest.c | 41 ++++++++++++++++++++++++++++++ tools/testing/nvdimm/test/ndtest.h | 31 ++++++++++++++++++++++ 2 files changed, 72 insertions(+) diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c index dc1e3636616a..6862915f1fb0 100644 --- a/tools/testing/nvdimm/test/ndtest.c +++ b/tools/testing/nvdimm/test/ndtest.c @@ -86,6 +86,9 @@ static struct ndtest_dimm dimm_group2[] = { .uuid_str = "ca0817e2-b618-11ea-9db3-507b9ddc0f72", .physical_id = 0, .num_formats = 1, + .flags = PAPR_PMEM_UNARMED | PAPR_PMEM_EMPTY | + PAPR_PMEM_SAVE_FAILED | PAPR_PMEM_SHUTDOWN_DIRTY | + PAPR_PMEM_HEALTH_FATAL, }, }; @@ -789,6 +792,40 @@ static umode_t ndtest_nvdimm_attr_visible(struct kobject *kobj, return a->mode; } +static ssize_t flags_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nvdimm *nvdimm = to_nvdimm(dev); + struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm); + struct seq_buf s; + u64 flags; + + flags = dimm->flags; + + seq_buf_init(&s, buf, PAGE_SIZE); + if (flags & PAPR_PMEM_UNARMED_MASK) + seq_buf_printf(&s, "not_armed "); + + if (flags & PAPR_PMEM_BAD_SHUTDOWN_MASK) + seq_buf_printf(&s, "flush_fail "); + + if (flags & PAPR_PMEM_BAD_RESTORE_MASK) + seq_buf_printf(&s, "restore_fail "); + + if (flags & PAPR_PMEM_SAVE_MASK) + seq_buf_printf(&s, "save_fail "); + + if (flags & PAPR_PMEM_SMART_EVENT_MASK) + seq_buf_printf(&s, "smart_notify "); + + + if (seq_buf_used(&s)) + seq_buf_printf(&s, "\n"); + + return seq_buf_used(&s); +} +static DEVICE_ATTR_RO(flags); + static struct attribute *ndtest_nvdimm_attributes[] = { &dev_attr_nvdimm_show_handle.attr, &dev_attr_vendor.attr, @@ -799,6 +836,7 @@ static struct attribute *ndtest_nvdimm_attributes[] = { &dev_attr_formats.attr, &dev_attr_format.attr, &dev_attr_format1.attr, + &dev_attr_flags.attr, NULL, }; @@ -824,6 +862,9 @@ static int ndtest_dimm_register(struct ndtest_priv *priv, set_bit(NDD_LABELING, &dimm_flags); } + if (dimm->flags & PAPR_PMEM_UNARMED_MASK) + set_bit(NDD_UNARMED, &dimm_flags); + dimm->nvdimm = nvdimm_create(priv->bus, dimm, ndtest_nvdimm_attribute_groups, dimm_flags, NDTEST_SCM_DIMM_CMD_MASK, 0, NULL); diff --git a/tools/testing/nvdimm/test/ndtest.h b/tools/testing/nvdimm/test/ndtest.h index 8f27ad6f7319..2c54c9cbb90c 100644 --- a/tools/testing/nvdimm/test/ndtest.h +++ b/tools/testing/nvdimm/test/ndtest.h @@ -5,6 +5,37 @@ #include #include +/* SCM device is unable to persist memory contents */ +#define PAPR_PMEM_UNARMED (1ULL << (63 - 0)) +/* SCM device failed to persist memory contents */ +#define PAPR_PMEM_SHUTDOWN_DIRTY (1ULL << (63 - 1)) +/* SCM device contents are not persisted from previous IPL */ +#define PAPR_PMEM_EMPTY (1ULL << (63 - 3)) +#define PAPR_PMEM_HEALTH_CRITICAL (1ULL << (63 - 4)) +/* SCM device will be garded off next IPL due to failure */ +#define PAPR_PMEM_HEALTH_FATAL (1ULL << (63 - 5)) +/* SCM contents cannot persist due to current platform health status */ +#define PAPR_PMEM_HEALTH_UNHEALTHY (1ULL << (63 - 6)) + +/* Bits status indicators for health bitmap indicating unarmed dimm */ +#define PAPR_PMEM_UNARMED_MASK (PAPR_PMEM_UNARMED | \ + PAPR_PMEM_HEALTH_UNHEALTHY) + +#define PAPR_PMEM_SAVE_FAILED (1ULL << (63 - 10)) + +/* Bits status indicators for health bitmap indicating unflushed dimm */ +#define PAPR_PMEM_BAD_SHUTDOWN_MASK (PAPR_PMEM_SHUTDOWN_DIRTY) + +/* Bits status indicators for health bitmap indicating unrestored dimm */ +#define PAPR_PMEM_BAD_RESTORE_MASK (PAPR_PMEM_EMPTY) + +/* Bit status indicators for smart event notification */ +#define PAPR_PMEM_SMART_EVENT_MASK (PAPR_PMEM_HEALTH_CRITICAL | \ + PAPR_PMEM_HEALTH_FATAL | \ + PAPR_PMEM_HEALTH_UNHEALTHY) + +#define PAPR_PMEM_SAVE_MASK (PAPR_PMEM_SAVE_FAILED) + struct ndtest_config; struct ndtest_priv { From 9efb069de4ba748d284f6129e71de239f801053a Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 28 Jan 2021 10:22:48 +0100 Subject: [PATCH 082/284] ovl: add warning on user_ns mismatch Currently there's no way to create an overlay filesystem outside of the current user namespace. Make sure that if this assumption changes it doesn't go unnoticed. Reported-by: "Eric W. Biederman" Signed-off-by: Miklos Szeredi --- fs/overlayfs/super.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 2bd570cbe8a4..82cd6d55a5a1 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -1923,6 +1923,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) unsigned int numlower; int err; + err = -EIO; + if (WARN_ON(sb->s_user_ns != current_user_ns())) + goto out; + sb->s_d_op = &ovl_dentry_operations; err = -ENOMEM; From 554677b97257b0b69378bd74e521edb7e94769ff Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 28 Jan 2021 10:22:48 +0100 Subject: [PATCH 083/284] ovl: perform vfs_getxattr() with mounter creds The vfs_getxattr() in ovl_xattr_set() is used to check whether an xattr exist on a lower layer file that is to be removed. If the xattr does not exist, then no need to copy up the file. This call of vfs_getxattr() wasn't wrapped in credential override, and this is probably okay. But for consitency wrap this instance as well. Reported-by: "Eric W. Biederman" Signed-off-by: Miklos Szeredi --- fs/overlayfs/inode.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index d739e14c6814..cf41bcb664bc 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -352,7 +352,9 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name, goto out; if (!value && !upperdentry) { + old_cred = ovl_override_creds(dentry->d_sb); err = vfs_getxattr(realdentry, name, NULL, 0); + revert_creds(old_cred); if (err < 0) goto out_drop_write; } From f2b00be488730522d0fb7a8a5de663febdcefe0a Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 28 Jan 2021 10:22:48 +0100 Subject: [PATCH 084/284] cap: fix conversions on getxattr If a capability is stored on disk in v2 format cap_inode_getsecurity() will currently return in v2 format unconditionally. This is wrong: v2 cap should be equivalent to a v3 cap with zero rootid, and so the same conversions performed on it. If the rootid cannot be mapped, v3 is returned unconverted. Fix this so that both v2 and v3 return -EOVERFLOW if the rootid (or the owner of the fs user namespace in case of v2) cannot be mapped into the current user namespace. Signed-off-by: Miklos Szeredi Acked-by: "Eric W. Biederman" --- security/commoncap.c | 67 ++++++++++++++++++++++++++++---------------- 1 file changed, 43 insertions(+), 24 deletions(-) diff --git a/security/commoncap.c b/security/commoncap.c index bacc1111d871..26c1cb725dcb 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -371,10 +371,11 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer, { int size, ret; kuid_t kroot; + u32 nsmagic, magic; uid_t root, mappedroot; char *tmpbuf = NULL; struct vfs_cap_data *cap; - struct vfs_ns_cap_data *nscap; + struct vfs_ns_cap_data *nscap = NULL; struct dentry *dentry; struct user_namespace *fs_ns; @@ -396,46 +397,61 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer, fs_ns = inode->i_sb->s_user_ns; cap = (struct vfs_cap_data *) tmpbuf; if (is_v2header((size_t) ret, cap)) { - /* If this is sizeof(vfs_cap_data) then we're ok with the - * on-disk value, so return that. */ - if (alloc) - *buffer = tmpbuf; - else - kfree(tmpbuf); - return ret; - } else if (!is_v3header((size_t) ret, cap)) { - kfree(tmpbuf); - return -EINVAL; + root = 0; + } else if (is_v3header((size_t) ret, cap)) { + nscap = (struct vfs_ns_cap_data *) tmpbuf; + root = le32_to_cpu(nscap->rootid); + } else { + size = -EINVAL; + goto out_free; } - nscap = (struct vfs_ns_cap_data *) tmpbuf; - root = le32_to_cpu(nscap->rootid); kroot = make_kuid(fs_ns, root); /* If the root kuid maps to a valid uid in current ns, then return * this as a nscap. */ mappedroot = from_kuid(current_user_ns(), kroot); if (mappedroot != (uid_t)-1 && mappedroot != (uid_t)0) { + size = sizeof(struct vfs_ns_cap_data); if (alloc) { - *buffer = tmpbuf; + if (!nscap) { + /* v2 -> v3 conversion */ + nscap = kzalloc(size, GFP_ATOMIC); + if (!nscap) { + size = -ENOMEM; + goto out_free; + } + nsmagic = VFS_CAP_REVISION_3; + magic = le32_to_cpu(cap->magic_etc); + if (magic & VFS_CAP_FLAGS_EFFECTIVE) + nsmagic |= VFS_CAP_FLAGS_EFFECTIVE; + memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32); + nscap->magic_etc = cpu_to_le32(nsmagic); + } else { + /* use allocated v3 buffer */ + tmpbuf = NULL; + } nscap->rootid = cpu_to_le32(mappedroot); - } else - kfree(tmpbuf); - return size; + *buffer = nscap; + } + goto out_free; } if (!rootid_owns_currentns(kroot)) { - kfree(tmpbuf); - return -EOPNOTSUPP; + size = -EOVERFLOW; + goto out_free; } /* This comes from a parent namespace. Return as a v2 capability */ size = sizeof(struct vfs_cap_data); if (alloc) { - *buffer = kmalloc(size, GFP_ATOMIC); - if (*buffer) { - struct vfs_cap_data *cap = *buffer; - __le32 nsmagic, magic; + if (nscap) { + /* v3 -> v2 conversion */ + cap = kzalloc(size, GFP_ATOMIC); + if (!cap) { + size = -ENOMEM; + goto out_free; + } magic = VFS_CAP_REVISION_2; nsmagic = le32_to_cpu(nscap->magic_etc); if (nsmagic & VFS_CAP_FLAGS_EFFECTIVE) @@ -443,9 +459,12 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer, memcpy(&cap->data, &nscap->data, sizeof(__le32) * 2 * VFS_CAP_U32); cap->magic_etc = cpu_to_le32(magic); } else { - size = -ENOMEM; + /* use unconverted v2 */ + tmpbuf = NULL; } + *buffer = cap; } +out_free: kfree(tmpbuf); return size; } From b854cc659dcb80f172cb35dbedc15d39d49c383f Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Tue, 5 Jan 2021 08:36:11 +0800 Subject: [PATCH 085/284] ovl: avoid deadlock on directory ioctl The function ovl_dir_real_file() currently uses the inode lock to serialize writes to the od->upperfile field. However, this function will get called by ovl_ioctl_set_flags(), which utilizes the inode lock too. In this case ovl_dir_real_file() will try to claim a lock that is owned by a function in its call stack, which won't get released before ovl_dir_real_file() returns. Fix by replacing the open coded compare and exchange by an explicit atomic op. Fixes: 61536bed2149 ("ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories") Cc: stable@vger.kernel.org # v5.10 Reported-by: Icenowy Zheng Tested-by: Icenowy Zheng Signed-off-by: Miklos Szeredi --- fs/overlayfs/readdir.c | 23 +++++++---------------- 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c index 01620ebae1bd..60d751f28fea 100644 --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -865,7 +865,7 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper) struct ovl_dir_file *od = file->private_data; struct dentry *dentry = file->f_path.dentry; - struct file *realfile = od->realfile; + struct file *old, *realfile = od->realfile; if (!OVL_TYPE_UPPER(ovl_path_type(dentry))) return want_upper ? NULL : realfile; @@ -874,29 +874,20 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper) * Need to check if we started out being a lower dir, but got copied up */ if (!od->is_upper) { - struct inode *inode = file_inode(file); - realfile = READ_ONCE(od->upperfile); if (!realfile) { struct path upperpath; ovl_path_upper(dentry, &upperpath); realfile = ovl_dir_open_realfile(file, &upperpath); + if (IS_ERR(realfile)) + return realfile; - inode_lock(inode); - if (!od->upperfile) { - if (IS_ERR(realfile)) { - inode_unlock(inode); - return realfile; - } - smp_store_release(&od->upperfile, realfile); - } else { - /* somebody has beaten us to it */ - if (!IS_ERR(realfile)) - fput(realfile); - realfile = od->upperfile; + old = cmpxchg_release(&od->upperfile, NULL, realfile); + if (old) { + fput(realfile); + realfile = old; } - inode_unlock(inode); } } From e04527fefba6e4e66492f122cf8cc6314f3cf3bf Mon Sep 17 00:00:00 2001 From: Liangyan Date: Tue, 22 Dec 2020 11:06:26 +0800 Subject: [PATCH 086/284] ovl: fix dentry leak in ovl_get_redirect MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We need to lock d_parent->d_lock before dget_dlock, or this may have d_lockref updated parallelly like calltrace below which will cause dentry->d_lockref leak and risk a crash. CPU 0 CPU 1 ovl_set_redirect lookup_fast ovl_get_redirect __d_lookup dget_dlock //no lock protection here spin_lock(&dentry->d_lock) dentry->d_lockref.count++ dentry->d_lockref.count++ [   49.799059] PGD 800000061fed7067 P4D 800000061fed7067 PUD 61fec5067 PMD 0 [   49.799689] Oops: 0002 [#1] SMP PTI [   49.800019] CPU: 2 PID: 2332 Comm: node Not tainted 4.19.24-7.20.al7.x86_64 #1 [   49.800678] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8a46cfe 04/01/2014 [   49.801380] RIP: 0010:_raw_spin_lock+0xc/0x20 [   49.803470] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246 [   49.803949] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000 [   49.804600] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088 [   49.805252] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040 [   49.805898] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000 [   49.806548] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0 [   49.807200] FS:  00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000 [   49.807935] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [   49.808461] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0 [   49.809113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [   49.809758] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [   49.810410] Call Trace: [   49.810653]  d_delete+0x2c/0xb0 [   49.810951]  vfs_rmdir+0xfd/0x120 [   49.811264]  do_rmdir+0x14f/0x1a0 [   49.811573]  do_syscall_64+0x5b/0x190 [   49.811917]  entry_SYSCALL_64_after_hwframe+0x44/0xa9 [   49.812385] RIP: 0033:0x7ffbf505ffd7 [   49.814404] RSP: 002b:00007ffbedffada8 EFLAGS: 00000297 ORIG_RAX: 0000000000000054 [   49.815098] RAX: ffffffffffffffda RBX: 00007ffbedffb640 RCX: 00007ffbf505ffd7 [   49.815744] RDX: 0000000004449700 RSI: 0000000000000000 RDI: 0000000006c8cd50 [   49.816394] RBP: 00007ffbedffaea0 R08: 0000000000000000 R09: 0000000000017d0b [   49.817038] R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000012 [   49.817687] R13: 00000000072823d8 R14: 00007ffbedffb700 R15: 00000000072823d8 [   49.818338] Modules linked in: pvpanic cirrusfb button qemu_fw_cfg atkbd libps2 i8042 [   49.819052] CR2: 0000000000000088 [   49.819368] ---[ end trace 4e652b8aa299aa2d ]--- [   49.819796] RIP: 0010:_raw_spin_lock+0xc/0x20 [   49.821880] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246 [   49.822363] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000 [   49.823008] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088 [   49.823658] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040 [   49.825404] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000 [   49.827147] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0 [   49.828890] FS:  00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000 [   49.830725] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [   49.832359] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0 [   49.834085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [   49.835792] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Cc: Fixes: a6c606551141 ("ovl: redirect on rename-dir") Signed-off-by: Liangyan Reviewed-by: Joseph Qi Suggested-by: Al Viro Signed-off-by: Miklos Szeredi --- fs/overlayfs/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index 28a075b5f5b2..d1efa3a5a503 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -992,8 +992,8 @@ static char *ovl_get_redirect(struct dentry *dentry, bool abs_redirect) buflen -= thislen; memcpy(&buf[buflen], name, thislen); - tmp = dget_dlock(d->d_parent); spin_unlock(&d->d_lock); + tmp = dget_parent(d); dput(d); d = tmp; From 03fedf93593c82538b18476d8c4f0e8f8435ea70 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Sat, 19 Dec 2020 12:16:08 +0200 Subject: [PATCH 087/284] ovl: skip getxattr of security labels When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will intercept in inode_getxattr hooks. When selinux LSM is installed but not initialized, it will list the security.selinux xattr in inode_listsecurity, but will not intercept it in inode_getxattr. This results in -ENODATA for a getxattr call for an xattr returned by listxattr. This situation was manifested as overlayfs failure to copy up lower files from squashfs when selinux is built-in but not initialized, because ovl_copy_xattr() iterates the lower inode xattrs by vfs_listxattr() and vfs_getxattr(). ovl_copy_xattr() skips copy up of security labels that are indentified by inode_copy_up_xattr LSM hooks, but it does that after vfs_getxattr(). Since we are not going to copy them, skip vfs_getxattr() of the security labels. Reported-by: Michael Labriola Tested-by: Michael Labriola Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/ Signed-off-by: Amir Goldstein Signed-off-by: Miklos Szeredi --- fs/overlayfs/copy_up.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index e5b616c93e11..0fed532efa68 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -84,6 +84,14 @@ int ovl_copy_xattr(struct super_block *sb, struct dentry *old, if (ovl_is_private_xattr(sb, name)) continue; + + error = security_inode_copy_up_xattr(name); + if (error < 0 && error != -EOPNOTSUPP) + break; + if (error == 1) { + error = 0; + continue; /* Discard */ + } retry: size = vfs_getxattr(old, name, value, value_size); if (size == -ERANGE) @@ -107,13 +115,6 @@ retry: goto retry; } - error = security_inode_copy_up_xattr(name); - if (error < 0 && error != -EOPNOTSUPP) - break; - if (error == 1) { - error = 0; - continue; /* Discard */ - } error = vfs_setxattr(new, name, value, size, 0); if (error) { if (error != -EOPNOTSUPP || ovl_must_copy_xattr(name)) From 335d3fc57941e5c6164c69d439aec1cb7a800876 Mon Sep 17 00:00:00 2001 From: Sargun Dhillon Date: Thu, 7 Jan 2021 16:10:43 -0800 Subject: [PATCH 088/284] ovl: implement volatile-specific fsync error behaviour Overlayfs's volatile option allows the user to bypass all forced sync calls to the upperdir filesystem. This comes at the cost of safety. We can never ensure that the user's data is intact, but we can make a best effort to expose whether or not the data is likely to be in a bad state. The best way to handle this in the time being is that if an overlayfs's upperdir experiences an error after a volatile mount occurs, that error will be returned on fsync, fdatasync, sync, and syncfs. This is contradictory to the traditional behaviour of VFS which fails the call once, and only raises an error if a subsequent fsync error has occurred, and been raised by the filesystem. One awkward aspect of the patch is that we have to manually set the superblock's errseq_t after the sync_fs callback as opposed to just returning an error from syncfs. This is because the call chain looks something like this: sys_syncfs -> sync_filesystem -> __sync_filesystem -> /* The return value is ignored here sb->s_op->sync_fs(sb) _sync_blockdev /* Where the VFS fetches the error to raise to userspace */ errseq_check_and_advance Because of this we call errseq_set every time the sync_fs callback occurs. Due to the nature of this seen / unseen dichotomy, if the upperdir is an inconsistent state at the initial mount time, overlayfs will refuse to mount, as overlayfs cannot get a snapshot of the upperdir's errseq that will increment on error until the user calls syncfs. Signed-off-by: Sargun Dhillon Suggested-by: Amir Goldstein Reviewed-by: Amir Goldstein Fixes: c86243b090bc ("ovl: provide a mount option "volatile"") Cc: stable@vger.kernel.org Reviewed-by: Vivek Goyal Reviewed-by: Jeff Layton Signed-off-by: Miklos Szeredi --- Documentation/filesystems/overlayfs.rst | 8 ++++++ fs/overlayfs/file.c | 5 ++-- fs/overlayfs/overlayfs.h | 1 + fs/overlayfs/ovl_entry.h | 2 ++ fs/overlayfs/readdir.c | 5 ++-- fs/overlayfs/super.c | 34 ++++++++++++++++++++----- fs/overlayfs/util.c | 27 ++++++++++++++++++++ 7 files changed, 71 insertions(+), 11 deletions(-) diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst index 587a93973929..78240e29b0bb 100644 --- a/Documentation/filesystems/overlayfs.rst +++ b/Documentation/filesystems/overlayfs.rst @@ -586,6 +586,14 @@ without significant effort. The advantage of mounting with the "volatile" option is that all forms of sync calls to the upper filesystem are omitted. +In order to avoid a giving a false sense of safety, the syncfs (and fsync) +semantics of volatile mounts are slightly different than that of the rest of +VFS. If any writeback error occurs on the upperdir's filesystem after a +volatile mount takes place, all sync functions will return an error. Once this +condition is reached, the filesystem will not recover, and every subsequent sync +call will return an error, even if the upperdir has not experience a new error +since the last sync call. + When overlay is mounted with "volatile" option, the directory "$workdir/work/incompat/volatile" is created. During next mount, overlay checks for this directory and refuses to mount if present. This is a strong diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index bd9dd38347ae..077d3ad343f6 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -398,8 +398,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync) const struct cred *old_cred; int ret; - if (!ovl_should_sync(OVL_FS(file_inode(file)->i_sb))) - return 0; + ret = ovl_sync_status(OVL_FS(file_inode(file)->i_sb)); + if (ret <= 0) + return ret; ret = ovl_real_fdget_meta(file, &real, !datasync); if (ret) diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index b487e48c7fd4..cb4e2d60ecf9 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -324,6 +324,7 @@ int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct dentry *dentry); bool ovl_is_metacopy_dentry(struct dentry *dentry); char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry, int padding); +int ovl_sync_status(struct ovl_fs *ofs); static inline bool ovl_is_impuredir(struct super_block *sb, struct dentry *dentry) diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h index fbd5e27ce66b..63efee554f69 100644 --- a/fs/overlayfs/ovl_entry.h +++ b/fs/overlayfs/ovl_entry.h @@ -81,6 +81,8 @@ struct ovl_fs { atomic_long_t last_ino; /* Whiteout dentry cache */ struct dentry *whiteout; + /* r/o snapshot of upperdir sb's only taken on volatile mounts */ + errseq_t errseq; }; static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs) diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c index 60d751f28fea..f404a78e6b60 100644 --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -900,8 +900,9 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end, struct file *realfile; int err; - if (!ovl_should_sync(OVL_FS(file->f_path.dentry->d_sb))) - return 0; + err = ovl_sync_status(OVL_FS(file->f_path.dentry->d_sb)); + if (err <= 0) + return err; realfile = ovl_dir_real_file(file, true); err = PTR_ERR_OR_ZERO(realfile); diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 82cd6d55a5a1..d58b8f2bf9d0 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -264,11 +264,20 @@ static int ovl_sync_fs(struct super_block *sb, int wait) struct super_block *upper_sb; int ret; - if (!ovl_upper_mnt(ofs)) - return 0; + ret = ovl_sync_status(ofs); + /* + * We have to always set the err, because the return value isn't + * checked in syncfs, and instead indirectly return an error via + * the sb's writeback errseq, which VFS inspects after this call. + */ + if (ret < 0) { + errseq_set(&sb->s_wb_err, -EIO); + return -EIO; + } + + if (!ret) + return ret; - if (!ovl_should_sync(ofs)) - return 0; /* * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC). * All the super blocks will be iterated, including upper_sb. @@ -1993,6 +2002,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) sb->s_op = &ovl_super_operations; if (ofs->config.upperdir) { + struct super_block *upper_sb; + if (!ofs->config.workdir) { pr_err("missing 'workdir'\n"); goto out_err; @@ -2002,6 +2013,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) if (err) goto out_err; + upper_sb = ovl_upper_mnt(ofs)->mnt_sb; + if (!ovl_should_sync(ofs)) { + ofs->errseq = errseq_sample(&upper_sb->s_wb_err); + if (errseq_check(&upper_sb->s_wb_err, ofs->errseq)) { + err = -EIO; + pr_err("Cannot mount volatile when upperdir has an unseen error. Sync upperdir fs to clear state.\n"); + goto out_err; + } + } + err = ovl_get_workdir(sb, ofs, &upperpath); if (err) goto out_err; @@ -2009,9 +2030,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) if (!ofs->workdir) sb->s_flags |= SB_RDONLY; - sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth; - sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran; - + sb->s_stack_depth = upper_sb->s_stack_depth; + sb->s_time_gran = upper_sb->s_time_gran; } oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers); err = PTR_ERR(oe); diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index 6569031af3cd..9826b003f1d2 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -962,3 +962,30 @@ err_free: kfree(buf); return ERR_PTR(res); } + +/* + * ovl_sync_status() - Check fs sync status for volatile mounts + * + * Returns 1 if this is not a volatile mount and a real sync is required. + * + * Returns 0 if syncing can be skipped because mount is volatile, and no errors + * have occurred on the upperdir since the mount. + * + * Returns -errno if it is a volatile mount, and the error that occurred since + * the last mount. If the error code changes, it'll return the latest error + * code. + */ + +int ovl_sync_status(struct ovl_fs *ofs) +{ + struct vfsmount *mnt; + + if (ovl_should_sync(ofs)) + return 1; + + mnt = ovl_upper_mnt(ofs); + if (!mnt) + return 0; + + return errseq_check(&mnt->mnt_sb->s_wb_err, ofs->errseq); +} From 530fe6bf0f9ff91e5156f0423ae8db8d106d0159 Mon Sep 17 00:00:00 2001 From: Paul Kocialkowski Date: Fri, 15 Jan 2021 18:58:31 +0100 Subject: [PATCH 089/284] soc: sunxi: mbus: Remove DE2 display engine compatibles The DE2 display engine hardware takes physical addresses that do not need PHYS_BASE subtracted. As a result, they should not be present on the mbus driver match list. Remove them. This was tested on the A83T, along with the patch allowing the DMA range map to be non-NULL and restores a working display. Fixes: b4bdc4fbf8d0 ("soc: sunxi: Deal with the MBUS DMA offsets in a central place") Signed-off-by: Paul Kocialkowski Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20210115175831.1184260-2-paul.kocialkowski@bootlin.com --- drivers/soc/sunxi/sunxi_mbus.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/soc/sunxi/sunxi_mbus.c b/drivers/soc/sunxi/sunxi_mbus.c index e9925c8487d7..d90e4a264b6f 100644 --- a/drivers/soc/sunxi/sunxi_mbus.c +++ b/drivers/soc/sunxi/sunxi_mbus.c @@ -23,12 +23,7 @@ static const char * const sunxi_mbus_devices[] = { "allwinner,sun7i-a20-display-engine", "allwinner,sun8i-a23-display-engine", "allwinner,sun8i-a33-display-engine", - "allwinner,sun8i-a83t-display-engine", - "allwinner,sun8i-h3-display-engine", - "allwinner,sun8i-r40-display-engine", - "allwinner,sun8i-v3s-display-engine", "allwinner,sun9i-a80-display-engine", - "allwinner,sun50i-a64-display-engine", /* * And now we have the regular devices connected to the MBUS From 053b1b287ccf734cc3b5a40b3b17a63185758c61 Mon Sep 17 00:00:00 2001 From: Dmitry Baryshkov Date: Fri, 22 Jan 2021 02:33:01 +0300 Subject: [PATCH 090/284] drm/bridge/lontium-lt9611uxc: fix waiting for EDID to become available - Call wake_up() when EDID ready event is received to wake wait_event_interruptible_timeout() - Increase waiting timeout, reading EDID can take longer than 100ms, so let's be on a safe side. Signed-off-by: Dmitry Baryshkov Fixes: 0cbbd5b1a012 ("drm: bridge: add support for lontium LT9611UXC bridge") Reviewed-by: Bjorn Andersson Reviewed-by: Andrzej Hajda Signed-off-by: Andrzej Hajda Link: https://patchwork.freedesktop.org/patch/msgid/20210121233303.1221784-2-dmitry.baryshkov@linaro.org --- drivers/gpu/drm/bridge/lontium-lt9611uxc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c index 0c98d27f84ac..a59e811f1705 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c +++ b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c @@ -145,8 +145,10 @@ static irqreturn_t lt9611uxc_irq_thread_handler(int irq, void *dev_id) lt9611uxc_unlock(lt9611uxc); - if (irq_status & BIT(0)) + if (irq_status & BIT(0)) { lt9611uxc->edid_read = !!(hpd_status & BIT(0)); + wake_up_all(<9611uxc->wq); + } if (irq_status & BIT(1)) { if (lt9611uxc->connector.dev) @@ -465,7 +467,7 @@ static enum drm_connector_status lt9611uxc_bridge_detect(struct drm_bridge *brid static int lt9611uxc_wait_for_edid(struct lt9611uxc *lt9611uxc) { return wait_event_interruptible_timeout(lt9611uxc->wq, lt9611uxc->edid_read, - msecs_to_jiffies(100)); + msecs_to_jiffies(500)); } static int lt9611uxc_get_edid_block(void *data, u8 *buf, unsigned int block, size_t len) From 1bb7ab402da44e09b4bb3f31cfe24695cdb1b7df Mon Sep 17 00:00:00 2001 From: Dmitry Baryshkov Date: Fri, 22 Jan 2021 02:33:02 +0300 Subject: [PATCH 091/284] drm/bridge/lontium-lt9611uxc: fix get_edid return code Return NULL pointer from get_edid() callback rather than ERR_PTR() pointer, as DRM code does NULL checks rather than IS_ERR(). Also while we are at it, return NULL if getting EDID timed out. Signed-off-by: Dmitry Baryshkov Fixes: 0cbbd5b1a012 ("drm: bridge: add support for lontium LT9611UXC bridge") Reviewed-by: Bjorn Andersson Reviewed-by: Andrzej Hajda Signed-off-by: Andrzej Hajda Link: https://patchwork.freedesktop.org/patch/msgid/20210121233303.1221784-3-dmitry.baryshkov@linaro.org --- drivers/gpu/drm/bridge/lontium-lt9611uxc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c index a59e811f1705..b708700e182d 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c +++ b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c @@ -505,7 +505,10 @@ static struct edid *lt9611uxc_bridge_get_edid(struct drm_bridge *bridge, ret = lt9611uxc_wait_for_edid(lt9611uxc); if (ret < 0) { dev_err(lt9611uxc->dev, "wait for EDID failed: %d\n", ret); - return ERR_PTR(ret); + return NULL; + } else if (ret == 0) { + dev_err(lt9611uxc->dev, "wait for EDID timeout\n"); + return NULL; } return drm_do_get_edid(connector, lt9611uxc_get_edid_block, lt9611uxc); From bc6fa8676ebbf9c5285f80d7b831663aeabb90bb Mon Sep 17 00:00:00 2001 From: Dmitry Baryshkov Date: Fri, 22 Jan 2021 02:33:03 +0300 Subject: [PATCH 092/284] drm/bridge/lontium-lt9611uxc: move HPD notification out of IRQ handler drm hotplug handling code (drm_client_dev_hotplug()) can wait on mutex, thus delaying further lt9611uxc IRQ events processing. It was observed occasionally during bootups, when drm_client_modeset_probe() was waiting for EDID ready event, which was delayed because IRQ handler was stuck trying to deliver hotplug event. Move hotplug notifications from IRQ handler to separate work to be able to process IRQ events without delays. Signed-off-by: Dmitry Baryshkov Fixes: 0cbbd5b1a012 ("drm: bridge: add support for lontium LT9611UXC bridge") Reviewed-by: Andrzej Hajda Signed-off-by: Andrzej Hajda Link: https://patchwork.freedesktop.org/patch/msgid/20210121233303.1221784-4-dmitry.baryshkov@linaro.org --- drivers/gpu/drm/bridge/lontium-lt9611uxc.c | 46 +++++++++++++++++----- 1 file changed, 37 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c index b708700e182d..fee27952ec6d 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c +++ b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c @@ -14,6 +14,7 @@ #include #include #include +#include #include @@ -36,6 +37,7 @@ struct lt9611uxc { struct mutex ocm_lock; struct wait_queue_head wq; + struct work_struct work; struct device_node *dsi0_node; struct device_node *dsi1_node; @@ -52,6 +54,8 @@ struct lt9611uxc { bool hpd_supported; bool edid_read; + /* can be accessed from different threads, so protect this with ocm_lock */ + bool hdmi_connected; uint8_t fw_version; }; @@ -143,23 +147,41 @@ static irqreturn_t lt9611uxc_irq_thread_handler(int irq, void *dev_id) if (irq_status) regmap_write(lt9611uxc->regmap, 0xb022, 0); - lt9611uxc_unlock(lt9611uxc); - if (irq_status & BIT(0)) { lt9611uxc->edid_read = !!(hpd_status & BIT(0)); wake_up_all(<9611uxc->wq); } if (irq_status & BIT(1)) { - if (lt9611uxc->connector.dev) - drm_kms_helper_hotplug_event(lt9611uxc->connector.dev); - else - drm_bridge_hpd_notify(<9611uxc->bridge, !!(hpd_status & BIT(1))); + lt9611uxc->hdmi_connected = hpd_status & BIT(1); + schedule_work(<9611uxc->work); } + lt9611uxc_unlock(lt9611uxc); + return IRQ_HANDLED; } +static void lt9611uxc_hpd_work(struct work_struct *work) +{ + struct lt9611uxc *lt9611uxc = container_of(work, struct lt9611uxc, work); + bool connected; + + if (lt9611uxc->connector.dev) + drm_kms_helper_hotplug_event(lt9611uxc->connector.dev); + else { + + mutex_lock(<9611uxc->ocm_lock); + connected = lt9611uxc->hdmi_connected; + mutex_unlock(<9611uxc->ocm_lock); + + drm_bridge_hpd_notify(<9611uxc->bridge, + connected ? + connector_status_connected : + connector_status_disconnected); + } +} + static void lt9611uxc_reset(struct lt9611uxc *lt9611uxc) { gpiod_set_value_cansleep(lt9611uxc->reset_gpio, 1); @@ -447,18 +469,21 @@ static enum drm_connector_status lt9611uxc_bridge_detect(struct drm_bridge *brid struct lt9611uxc *lt9611uxc = bridge_to_lt9611uxc(bridge); unsigned int reg_val = 0; int ret; - int connected = 1; + bool connected = true; + + lt9611uxc_lock(lt9611uxc); if (lt9611uxc->hpd_supported) { - lt9611uxc_lock(lt9611uxc); ret = regmap_read(lt9611uxc->regmap, 0xb023, ®_val); - lt9611uxc_unlock(lt9611uxc); if (ret) dev_err(lt9611uxc->dev, "failed to read hpd status: %d\n", ret); else connected = reg_val & BIT(1); } + lt9611uxc->hdmi_connected = connected; + + lt9611uxc_unlock(lt9611uxc); return connected ? connector_status_connected : connector_status_disconnected; @@ -931,6 +956,8 @@ retry: lt9611uxc->fw_version = ret; init_waitqueue_head(<9611uxc->wq); + INIT_WORK(<9611uxc->work, lt9611uxc_hpd_work); + ret = devm_request_threaded_irq(dev, client->irq, NULL, lt9611uxc_irq_thread_handler, IRQF_ONESHOT, "lt9611uxc", lt9611uxc); @@ -967,6 +994,7 @@ static int lt9611uxc_remove(struct i2c_client *client) struct lt9611uxc *lt9611uxc = i2c_get_clientdata(client); disable_irq(client->irq); + flush_scheduled_work(); lt9611uxc_audio_exit(lt9611uxc); drm_bridge_remove(<9611uxc->bridge); From 2b1b3e544f65f40df5eef99753e460a127910479 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Thu, 28 Jan 2021 10:53:46 +0100 Subject: [PATCH 093/284] drm/ttm: Use __GFP_NOWARN for huge pages in ttm_pool_alloc_page MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Without __GFP_NOWARN, attempts at allocating huge pages can trigger dmesg splats like below (which are essentially noise, since TTM falls back to normal pages if it can't get a huge one). [ 9556.710241] clinfo: page allocation failure: order:9, mode:0x194dc2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_ZERO|__GFP_NOMEMALLOC), nodemask=(null),cpuset=user.slice,mems_allowed=0 [ 9556.710259] CPU: 1 PID: 470821 Comm: clinfo Tainted: G E 5.10.10+ #4 [ 9556.710264] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.OR 11/29/2019 [ 9556.710268] Call Trace: [ 9556.710281] dump_stack+0x6b/0x83 [ 9556.710288] warn_alloc.cold+0x7b/0xdf [ 9556.710297] ? __alloc_pages_direct_compact+0x137/0x150 [ 9556.710303] __alloc_pages_slowpath.constprop.0+0xc1b/0xc50 [ 9556.710312] __alloc_pages_nodemask+0x2ec/0x320 [ 9556.710325] ttm_pool_alloc+0x2e4/0x5e0 [ttm] [ 9556.710332] ? kvmalloc_node+0x46/0x80 [ 9556.710341] ttm_tt_populate+0x37/0xe0 [ttm] [ 9556.710350] ttm_bo_handle_move_mem+0x142/0x180 [ttm] [ 9556.710359] ttm_bo_validate+0x11d/0x190 [ttm] [ 9556.710391] ? drm_vma_offset_add+0x2f/0x60 [drm] [ 9556.710399] ttm_bo_init_reserved+0x2a7/0x320 [ttm] [ 9556.710529] amdgpu_bo_do_create+0x1b8/0x500 [amdgpu] [ 9556.710657] ? amdgpu_bo_subtract_pin_size+0x60/0x60 [amdgpu] [ 9556.710663] ? get_page_from_freelist+0x11f9/0x1450 [ 9556.710789] amdgpu_bo_create+0x40/0x270 [amdgpu] [ 9556.710797] ? _raw_spin_unlock+0x16/0x30 [ 9556.710927] amdgpu_gem_create_ioctl+0x123/0x310 [amdgpu] [ 9556.711062] ? amdgpu_gem_force_release+0x150/0x150 [amdgpu] [ 9556.711098] drm_ioctl_kernel+0xaa/0xf0 [drm] [ 9556.711133] drm_ioctl+0x20f/0x3a0 [drm] [ 9556.711267] ? amdgpu_gem_force_release+0x150/0x150 [amdgpu] [ 9556.711276] ? preempt_count_sub+0x9b/0xd0 [ 9556.711404] amdgpu_drm_ioctl+0x49/0x80 [amdgpu] [ 9556.711411] __x64_sys_ioctl+0x83/0xb0 [ 9556.711417] do_syscall_64+0x33/0x80 [ 9556.711421] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: bf9eee249ac2 ("drm/ttm: stop using GFP_TRANSHUGE_LIGHT") Signed-off-by: Michel Dänzer Reviewed-by: Christian König Signed-off-by: Christian König Link: https://patchwork.freedesktop.org/patch/416353/ --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index 11e0313db0ea..74bf1c84b637 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -84,7 +84,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags, * put_page() on a TTM allocated page is illegal. */ if (order) - gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | + gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; if (!pool->use_dma_alloc) { From 4d395c5e74398f664405819330e5a298da37f655 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Mon, 26 Oct 2020 19:12:59 +0300 Subject: [PATCH 094/284] thunderbolt: Fix possible NULL pointer dereference in tb_acpi_add_link() When we walk up the device hierarchy in tb_acpi_add_link() make sure we break the loop if the device has no parent. Otherwise we may crash the kernel by dereferencing a NULL pointer. Fixes: b2be2b05cf3b ("thunderbolt: Create device links from ACPI description") Cc: stable@vger.kernel.org Signed-off-by: Mario Limonciello Acked-by: Yehezkel Bernat Signed-off-by: Mika Westerberg --- drivers/thunderbolt/acpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/thunderbolt/acpi.c b/drivers/thunderbolt/acpi.c index a5f988a9f948..b5442f979b4d 100644 --- a/drivers/thunderbolt/acpi.c +++ b/drivers/thunderbolt/acpi.c @@ -56,7 +56,7 @@ static acpi_status tb_acpi_add_link(acpi_handle handle, u32 level, void *data, * managed with the xHCI and the SuperSpeed hub so we create the * link from xHCI instead. */ - while (!dev_is_pci(dev)) + while (dev && !dev_is_pci(dev)) dev = dev->parent; if (!dev) From ae000861b95cc4521c498430eb9c61ad62cea51c Mon Sep 17 00:00:00 2001 From: Yu Zhang Date: Thu, 28 Jan 2021 23:47:47 +0800 Subject: [PATCH 095/284] KVM: Documentation: Fix documentation for nested. Nested VMX was enabled by default in commit 1e58e5e59148 ("KVM: VMX: enable nested virtualization by default"), which was merged in Linux 4.20. This patch is to fix the documentation accordingly. Signed-off-by: Yu Zhang Message-Id: <20210128154747.4242-1-yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/nested-vmx.rst | 6 ++++-- Documentation/virt/kvm/running-nested-guests.rst | 2 +- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/Documentation/virt/kvm/nested-vmx.rst b/Documentation/virt/kvm/nested-vmx.rst index 6ab4e35cee23..ac2095d41f02 100644 --- a/Documentation/virt/kvm/nested-vmx.rst +++ b/Documentation/virt/kvm/nested-vmx.rst @@ -37,8 +37,10 @@ call L2. Running nested VMX ------------------ -The nested VMX feature is disabled by default. It can be enabled by giving -the "nested=1" option to the kvm-intel module. +The nested VMX feature is enabled by default since Linux kernel v4.20. For +older Linux kernel, it can be enabled by giving the "nested=1" option to the +kvm-intel module. + No modifications are required to user space (qemu). However, qemu's default emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst index d0a1fc754c84..bd70c69468ae 100644 --- a/Documentation/virt/kvm/running-nested-guests.rst +++ b/Documentation/virt/kvm/running-nested-guests.rst @@ -74,7 +74,7 @@ few: Enabling "nested" (x86) ----------------------- -From Linux kernel v4.19 onwards, the ``nested`` KVM parameter is enabled +From Linux kernel v4.20 onwards, the ``nested`` KVM parameter is enabled by default for Intel and AMD. (Though your Linux distribution might override this default.) From 19a23da53932bc8011220bd8c410cb76012de004 Mon Sep 17 00:00:00 2001 From: Peter Gonda Date: Wed, 27 Jan 2021 08:15:24 -0800 Subject: [PATCH 096/284] Fix unsynchronized access to sev members through svm_register_enc_region Grab kvm->lock before pinning memory when registering an encrypted region; sev_pin_memory() relies on kvm->lock being held to ensure correctness when checking and updating the number of pinned pages. Add a lockdep assertion to help prevent future regressions. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Paolo Bonzini Cc: Joerg Roedel Cc: Tom Lendacky Cc: Brijesh Singh Cc: Sean Christopherson Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active") Signed-off-by: Peter Gonda V2 - Fix up patch description - Correct file paths svm.c -> sev.c - Add unlock of kvm->lock on sev_pin_memory error V1 - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/ Message-Id: <20210127161524.2832400-1-pgonda@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/svm/sev.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index ac652bc476ae..48017fef1cd9 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -342,6 +342,8 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr, unsigned long first, last; int ret; + lockdep_assert_held(&kvm->lock); + if (ulen == 0 || uaddr + ulen < uaddr) return ERR_PTR(-EINVAL); @@ -1119,12 +1121,20 @@ int svm_register_enc_region(struct kvm *kvm, if (!region) return -ENOMEM; + mutex_lock(&kvm->lock); region->pages = sev_pin_memory(kvm, range->addr, range->size, ®ion->npages, 1); if (IS_ERR(region->pages)) { ret = PTR_ERR(region->pages); + mutex_unlock(&kvm->lock); goto e_free; } + region->uaddr = range->addr; + region->size = range->size; + + list_add_tail(®ion->list, &sev->regions_list); + mutex_unlock(&kvm->lock); + /* * The guest may change the memory encryption attribute from C=0 -> C=1 * or vice versa for this memory range. Lets make sure caches are @@ -1133,13 +1143,6 @@ int svm_register_enc_region(struct kvm *kvm, */ sev_clflush_pages(region->pages, region->npages); - region->uaddr = range->addr; - region->size = range->size; - - mutex_lock(&kvm->lock); - list_add_tail(®ion->list, &sev->regions_list); - mutex_unlock(&kvm->lock); - return ret; e_free: From 39d3454c3513840eb123b3913fda6903e45ce671 Mon Sep 17 00:00:00 2001 From: Russell King Date: Sun, 18 Oct 2020 09:39:21 +0100 Subject: [PATCH 097/284] ARM: footbridge: fix dc21285 PCI configuration accessors Building with gcc 4.9.2 reveals a latent bug in the PCI accessors for Footbridge platforms, which causes a fatal alignment fault while accessing IO memory. Fix this by making the assembly volatile. Cc: stable@vger.kernel.org Signed-off-by: Russell King --- arch/arm/mach-footbridge/dc21285.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arm/mach-footbridge/dc21285.c b/arch/arm/mach-footbridge/dc21285.c index 416462e3f5d6..f9713dc561cf 100644 --- a/arch/arm/mach-footbridge/dc21285.c +++ b/arch/arm/mach-footbridge/dc21285.c @@ -65,15 +65,15 @@ dc21285_read_config(struct pci_bus *bus, unsigned int devfn, int where, if (addr) switch (size) { case 1: - asm("ldrb %0, [%1, %2]" + asm volatile("ldrb %0, [%1, %2]" : "=r" (v) : "r" (addr), "r" (where) : "cc"); break; case 2: - asm("ldrh %0, [%1, %2]" + asm volatile("ldrh %0, [%1, %2]" : "=r" (v) : "r" (addr), "r" (where) : "cc"); break; case 4: - asm("ldr %0, [%1, %2]" + asm volatile("ldr %0, [%1, %2]" : "=r" (v) : "r" (addr), "r" (where) : "cc"); break; } @@ -99,17 +99,17 @@ dc21285_write_config(struct pci_bus *bus, unsigned int devfn, int where, if (addr) switch (size) { case 1: - asm("strb %0, [%1, %2]" + asm volatile("strb %0, [%1, %2]" : : "r" (value), "r" (addr), "r" (where) : "cc"); break; case 2: - asm("strh %0, [%1, %2]" + asm volatile("strh %0, [%1, %2]" : : "r" (value), "r" (addr), "r" (where) : "cc"); break; case 4: - asm("str %0, [%1, %2]" + asm volatile("str %0, [%1, %2]" : : "r" (value), "r" (addr), "r" (where) : "cc"); break; From 538eea5362a1179dfa7770dd2b6607dc30cc50c6 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 15 Dec 2020 16:16:44 +0100 Subject: [PATCH 098/284] ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor The tegra_uart_config of the DEBUG_LL code is now placed right at the start of the .text section after commit which enabled debug output in the decompressor. Tegra devices are not booting anymore if DEBUG_LL is enabled since tegra_uart_config data is executes as a code. Fix the misplaced tegra_uart_config storage by embedding it into the code. Cc: stable@vger.kernel.org Fixes: 2596a72d3384 ("ARM: 9009/1: uncompress: Enable debug in head.S") Reviewed-by: Linus Walleij Signed-off-by: Dmitry Osipenko Signed-off-by: Russell King --- arch/arm/include/debug/tegra.S | 54 +++++++++++++++++----------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/arch/arm/include/debug/tegra.S b/arch/arm/include/debug/tegra.S index 98daa7f48314..7454480d084b 100644 --- a/arch/arm/include/debug/tegra.S +++ b/arch/arm/include/debug/tegra.S @@ -149,7 +149,34 @@ .align 99: .word . +#if defined(ZIMAGE) + .word . + 4 +/* + * Storage for the state maintained by the macro. + * + * In the kernel proper, this data is located in arch/arm/mach-tegra/tegra.c. + * That's because this header is included from multiple files, and we only + * want a single copy of the data. In particular, the UART probing code above + * assumes it's running using physical addresses. This is true when this file + * is included from head.o, but not when included from debug.o. So we need + * to share the probe results between the two copies, rather than having + * to re-run the probing again later. + * + * In the decompressor, we put the storage right here, since common.c + * isn't included in the decompressor build. This storage data gets put in + * .text even though it's really data, since .data is discarded from the + * decompressor. Luckily, .text is writeable in the decompressor, unless + * CONFIG_ZBOOT_ROM. That dependency is handled in arch/arm/Kconfig.debug. + */ + /* Debug UART initialization required */ + .word 1 + /* Debug UART physical address */ + .word 0 + /* Debug UART virtual address */ + .word 0 +#else .word tegra_uart_config +#endif .ltorg /* Load previously selected UART address */ @@ -189,30 +216,3 @@ .macro waituarttxrdy,rd,rx .endm - -/* - * Storage for the state maintained by the macros above. - * - * In the kernel proper, this data is located in arch/arm/mach-tegra/tegra.c. - * That's because this header is included from multiple files, and we only - * want a single copy of the data. In particular, the UART probing code above - * assumes it's running using physical addresses. This is true when this file - * is included from head.o, but not when included from debug.o. So we need - * to share the probe results between the two copies, rather than having - * to re-run the probing again later. - * - * In the decompressor, we put the symbol/storage right here, since common.c - * isn't included in the decompressor build. This symbol gets put in .text - * even though it's really data, since .data is discarded from the - * decompressor. Luckily, .text is writeable in the decompressor, unless - * CONFIG_ZBOOT_ROM. That dependency is handled in arch/arm/Kconfig.debug. - */ -#if defined(ZIMAGE) -tegra_uart_config: - /* Debug UART initialization required */ - .word 1 - /* Debug UART physical address */ - .word 0 - /* Debug UART virtual address */ - .word 0 -#endif From c351bb64cbe67029c68dea3adbec1b9508c6ff0f Mon Sep 17 00:00:00 2001 From: Quanyang Wang Date: Fri, 29 Jan 2021 16:19:17 +0800 Subject: [PATCH 099/284] gpiolib: free device name on error path to fix kmemleak In gpiochip_add_data_with_key, we should check the return value of dev_set_name to ensure that device name is allocated successfully and then add a label on the error path to free device name to fix kmemleak as below: unreferenced object 0xc2d6fc40 (size 64): comm "kworker/0:1", pid 16, jiffies 4294937425 (age 65.120s) hex dump (first 32 bytes): 67 70 69 6f 63 68 69 70 30 00 1a c0 54 63 1a c0 gpiochip0...Tc.. 0c ed 84 c0 48 ed 84 c0 3c ee 84 c0 10 00 00 00 ....H...<....... backtrace: [<962810f7>] kobject_set_name_vargs+0x2c/0xa0 [] dev_set_name+0x2c/0x5c [<94abbca9>] gpiochip_add_data_with_key+0xfc/0xce8 [<5c4193e0>] omap_gpio_probe+0x33c/0x68c [<3402f137>] platform_probe+0x58/0xb8 [<7421e210>] really_probe+0xec/0x3b4 [<000f8ada>] driver_probe_device+0x58/0xb4 [<67e0f7f7>] bus_for_each_drv+0x80/0xd0 [<4de545dc>] __device_attach+0xe8/0x15c [<2e4431e7>] bus_probe_device+0x84/0x8c [] device_add+0x384/0x7c0 [<5aff2995>] of_platform_device_create_pdata+0x8c/0xb8 [<061c3483>] of_platform_bus_create+0x198/0x230 [<5ee6d42a>] of_platform_populate+0x60/0xb8 [<2647300f>] sysc_probe+0xd18/0x135c [<3402f137>] platform_probe+0x58/0xb8 Signed-off-by: Quanyang Wang Cc: stable@vger.kernel.org Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpiolib.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index b78a634cca24..3ba9c981f0b9 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -603,7 +603,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, ret = gdev->id; goto err_free_gdev; } - dev_set_name(&gdev->dev, GPIOCHIP_NAME "%d", gdev->id); + + ret = dev_set_name(&gdev->dev, GPIOCHIP_NAME "%d", gdev->id); + if (ret) + goto err_free_ida; + device_initialize(&gdev->dev); dev_set_drvdata(&gdev->dev, gdev); if (gc->parent && gc->parent->driver) @@ -617,7 +621,7 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, gdev->descs = kcalloc(gc->ngpio, sizeof(gdev->descs[0]), GFP_KERNEL); if (!gdev->descs) { ret = -ENOMEM; - goto err_free_ida; + goto err_free_dev_name; } if (gc->ngpio == 0) { @@ -768,6 +772,8 @@ err_free_label: kfree_const(gdev->label); err_free_descs: kfree(gdev->descs); +err_free_dev_name: + kfree(dev_name(&gdev->dev)); err_free_ida: ida_free(&gpio_ida, gdev->id); err_free_gdev: From 20bf2b378729c4a0366a53e2018a0b70ace94bcd Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Thu, 28 Jan 2021 15:52:19 -0600 Subject: [PATCH 100/284] x86/build: Disable CET instrumentation in the kernel With retpolines disabled, some configurations of GCC, and specifically the GCC versions 9 and 10 in Ubuntu will add Intel CET instrumentation to the kernel by default. That breaks certain tracing scenarios by adding a superfluous ENDBR64 instruction before the fentry call, for functions which can be called indirectly. CET instrumentation isn't currently necessary in the kernel, as CET is only supported in user space. Disable it unconditionally and move it into the x86's Makefile as CET/CFI... enablement should be a per-arch decision anyway. [ bp: Massage and extend commit message. ] Fixes: 29be86d7f9cb ("kbuild: add -fcf-protection=none when using retpoline flags") Reported-by: Nikolay Borisov Signed-off-by: Josh Poimboeuf Signed-off-by: Borislav Petkov Reviewed-by: Nikolay Borisov Tested-by: Nikolay Borisov Cc: Cc: Seth Forshee Cc: Masahiro Yamada Link: https://lkml.kernel.org/r/20210128215219.6kct3h2eiustncws@treble --- Makefile | 6 ------ arch/x86/Makefile | 3 +++ 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index e0af7a4a5598..51c2bf34142d 100644 --- a/Makefile +++ b/Makefile @@ -948,12 +948,6 @@ KBUILD_CFLAGS += $(call cc-option,-Werror=designated-init) # change __FILE__ to the relative path from the srctree KBUILD_CPPFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=) -# ensure -fcf-protection is disabled when using retpoline as it is -# incompatible with -mindirect-branch=thunk-extern -ifdef CONFIG_RETPOLINE -KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none) -endif - # include additional Makefiles when needed include-y := scripts/Makefile.extrawarn include-$(CONFIG_KASAN) += scripts/Makefile.kasan diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 7116da3980be..5857917f83ee 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -120,6 +120,9 @@ else KBUILD_CFLAGS += -mno-red-zone KBUILD_CFLAGS += -mcmodel=kernel + + # Intel CET isn't enabled in the kernel + KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none) endif ifdef CONFIG_X86_X32 From 8c65830ae1629b03e5d65e9aafae7e2cf5f8b743 Mon Sep 17 00:00:00 2001 From: James Smart Date: Wed, 27 Jan 2021 14:16:01 -0800 Subject: [PATCH 101/284] scsi: lpfc: Fix EEH encountering oops with NVMe traffic In testing, in a configuration with Redfish and native NVMe multipath when an EEH is injected, a kernel oops is being encountered: (unreliable) lpfc_nvme_ls_req+0x328/0x720 [lpfc] __nvme_fc_send_ls_req.constprop.13+0x1d8/0x3d0 [nvme_fc] nvme_fc_create_association+0x224/0xd10 [nvme_fc] nvme_fc_reset_ctrl_work+0x110/0x154 [nvme_fc] process_one_work+0x304/0x5d the NBMe transport is issuing a Disconnect LS request, which the driver receives and tries to post but the work queue used by the driver is already being torn down by the eeh. Fix by validating the validity of the work queue before proceeding with the LS transmit. Link: https://lore.kernel.org/r/20210127221601.84878-1-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne Signed-off-by: James Smart Signed-off-by: Martin K. Petersen --- drivers/scsi/lpfc/lpfc_nvme.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c index 1cb82fa6a60e..39d147e251bf 100644 --- a/drivers/scsi/lpfc/lpfc_nvme.c +++ b/drivers/scsi/lpfc/lpfc_nvme.c @@ -559,6 +559,9 @@ __lpfc_nvme_ls_req(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp, return -ENODEV; } + if (!vport->phba->sli4_hba.nvmels_wq) + return -ENOMEM; + /* * there are two dma buf in the request, actually there is one and * the second one is just the start address + cmd size. From 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Fri, 29 Jan 2021 10:13:53 -0500 Subject: [PATCH 102/284] fgraph: Initialize tracing_graph_pause at task creation On some archs, the idle task can call into cpu_suspend(). The cpu_suspend() will disable or pause function graph tracing, as there's some paths in bringing down the CPU that can have issues with its return address being modified. The task_struct structure has a "tracing_graph_pause" atomic counter, that when set to something other than zero, the function graph tracer will not modify the return address. The problem is that the tracing_graph_pause counter is initialized when the function graph tracer is enabled. This can corrupt the counter for the idle task if it is suspended in these architectures. CPU 1 CPU 2 ----- ----- do_idle() cpu_suspend() pause_graph_tracing() task_struct->tracing_graph_pause++ (0 -> 1) start_graph_tracing() for_each_online_cpu(cpu) { ftrace_graph_init_idle_task(cpu) task-struct->tracing_graph_pause = 0 (1 -> 0) unpause_graph_tracing() task_struct->tracing_graph_pause-- (0 -> -1) The above should have gone from 1 to zero, and enabled function graph tracing again. But instead, it is set to -1, which keeps it disabled. There's no reason that the field tracing_graph_pause on the task_struct can not be initialized at boot up. Cc: stable@vger.kernel.org Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339 Reported-by: pierre.gondois@arm.com Signed-off-by: Steven Rostedt (VMware) --- init/init_task.c | 3 ++- kernel/trace/fgraph.c | 2 -- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/init/init_task.c b/init/init_task.c index 8a992d73e6fb..3711cdaafed2 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -198,7 +198,8 @@ struct task_struct init_task .lockdep_recursion = 0, #endif #ifdef CONFIG_FUNCTION_GRAPH_TRACER - .ret_stack = NULL, + .ret_stack = NULL, + .tracing_graph_pause = ATOMIC_INIT(0), #endif #if defined(CONFIG_TRACING) && defined(CONFIG_PREEMPTION) .trace_recursion = 0, diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c index 73edb9e4f354..29a6ebeebc9e 100644 --- a/kernel/trace/fgraph.c +++ b/kernel/trace/fgraph.c @@ -394,7 +394,6 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list) } if (t->ret_stack == NULL) { - atomic_set(&t->tracing_graph_pause, 0); atomic_set(&t->trace_overrun, 0); t->curr_ret_stack = -1; t->curr_ret_depth = -1; @@ -489,7 +488,6 @@ static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack); static void graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack) { - atomic_set(&t->tracing_graph_pause, 0); atomic_set(&t->trace_overrun, 0); t->ftrace_timestamp = 0; /* make curr_ret_stack visible before we add the ret_stack */ From da7f84cdf02fd5f66864041f45018b328911b722 Mon Sep 17 00:00:00 2001 From: Viktor Rosendahl Date: Tue, 19 Jan 2021 17:43:43 +0100 Subject: [PATCH 103/284] tracing: Use pause-on-trace with the latency tracers Eaerlier, tracing was disabled when reading the trace file. This behavior was changed with: commit 06e0a548bad0 ("tracing: Do not disable tracing when reading the trace file"). This doesn't seem to work with the latency tracers. The above mentioned commit dit not only change the behavior but also added an option to emulate the old behavior. The idea with this patch is to enable this pause-on-trace option when the latency tracers are used. Link: https://lkml.kernel.org/r/20210119164344.37500-2-Viktor.Rosendahl@bmw.de Cc: stable@vger.kernel.org Fixes: 06e0a548bad0 ("tracing: Do not disable tracing when reading the trace file") Signed-off-by: Viktor Rosendahl Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_irqsoff.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index d06aab4dcbb8..6756379b661f 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -562,6 +562,8 @@ static int __irqsoff_tracer_init(struct trace_array *tr) /* non overwrite screws up the latency tracers */ set_tracer_flag(tr, TRACE_ITER_OVERWRITE, 1); set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, 1); + /* without pause, we will produce garbage if another latency occurs */ + set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, 1); tr->max_latency = 0; irqsoff_trace = tr; @@ -583,11 +585,13 @@ static void __irqsoff_tracer_reset(struct trace_array *tr) { int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT; int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE; + int pause_flag = save_flags & TRACE_ITER_PAUSE_ON_TRACE; stop_irqsoff_tracer(tr, is_graph(tr)); set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, lat_flag); set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag); + set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, pause_flag); ftrace_reset_array_ops(tr); irqsoff_busy = false; From 97c753e62e6c31a404183898d950d8c08d752dbd Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Thu, 28 Jan 2021 00:37:51 +0900 Subject: [PATCH 104/284] tracing/kprobe: Fix to support kretprobe events on unloaded modules Fix kprobe_on_func_entry() returns error code instead of false so that register_kretprobe() can return an appropriate error code. append_trace_kprobe() expects the kprobe registration returns -ENOENT when the target symbol is not found, and it checks whether the target module is unloaded or not. If the target module doesn't exist, it defers to probe the target symbol until the module is loaded. However, since register_kretprobe() returns -EINVAL instead of -ENOENT in that case, it always fail on putting the kretprobe event on unloaded modules. e.g. Kprobe event: /sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events [ 16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue. Kretprobe event: (p -> r) /sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events sh: write error: Invalid argument /sys/kernel/debug/tracing # cat error_log [ 41.122514] trace_kprobe: error: Failed to register probe event Command: r xfs:xfs_end_io ^ To fix this bug, change kprobe_on_func_entry() to detect symbol lookup failure and return -ENOENT in that case. Otherwise it returns -EINVAL or 0 (succeeded, given address is on the entry). Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 59158ec4aef7 ("tracing/kprobes: Check the probe on unloaded module correctly") Reported-by: Jianlin Lv Signed-off-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (VMware) --- include/linux/kprobes.h | 2 +- kernel/kprobes.c | 34 +++++++++++++++++++++++++--------- kernel/trace/trace_kprobe.c | 10 ++++++---- 3 files changed, 32 insertions(+), 14 deletions(-) diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index b3a36b0cfc81..1883a4a9f16a 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -266,7 +266,7 @@ extern void kprobes_inc_nmissed_count(struct kprobe *p); extern bool arch_within_kprobe_blacklist(unsigned long addr); extern int arch_populate_kprobe_blacklist(void); extern bool arch_kprobe_on_func_entry(unsigned long offset); -extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset); +extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset); extern bool within_kprobe_blacklist(unsigned long addr); extern int kprobe_add_ksym_blacklist(unsigned long entry); diff --git a/kernel/kprobes.c b/kernel/kprobes.c index f7fb5d135930..1a5bc321e0a5 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -1954,29 +1954,45 @@ bool __weak arch_kprobe_on_func_entry(unsigned long offset) return !offset; } -bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset) +/** + * kprobe_on_func_entry() -- check whether given address is function entry + * @addr: Target address + * @sym: Target symbol name + * @offset: The offset from the symbol or the address + * + * This checks whether the given @addr+@offset or @sym+@offset is on the + * function entry address or not. + * This returns 0 if it is the function entry, or -EINVAL if it is not. + * And also it returns -ENOENT if it fails the symbol or address lookup. + * Caller must pass @addr or @sym (either one must be NULL), or this + * returns -EINVAL. + */ +int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset) { kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset); if (IS_ERR(kp_addr)) - return false; + return PTR_ERR(kp_addr); - if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset) || - !arch_kprobe_on_func_entry(offset)) - return false; + if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset)) + return -ENOENT; - return true; + if (!arch_kprobe_on_func_entry(offset)) + return -EINVAL; + + return 0; } int register_kretprobe(struct kretprobe *rp) { - int ret = 0; + int ret; struct kretprobe_instance *inst; int i; void *addr; - if (!kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset)) - return -EINVAL; + ret = kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset); + if (ret) + return ret; if (kretprobe_blacklist_size) { addr = kprobe_addr(&rp->kp); diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index e6fba1798771..56c7fbff7bd7 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -221,9 +221,9 @@ bool trace_kprobe_on_func_entry(struct trace_event_call *call) { struct trace_kprobe *tk = trace_kprobe_primary_from_call(call); - return tk ? kprobe_on_func_entry(tk->rp.kp.addr, + return tk ? (kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.addr ? NULL : tk->rp.kp.symbol_name, - tk->rp.kp.addr ? 0 : tk->rp.kp.offset) : false; + tk->rp.kp.addr ? 0 : tk->rp.kp.offset) == 0) : false; } bool trace_kprobe_error_injectable(struct trace_event_call *call) @@ -828,9 +828,11 @@ static int trace_kprobe_create(int argc, const char *argv[]) } if (is_return) flags |= TPARG_FL_RETURN; - if (kprobe_on_func_entry(NULL, symbol, offset)) + ret = kprobe_on_func_entry(NULL, symbol, offset); + if (ret == 0) flags |= TPARG_FL_FENTRY; - if (offset && is_return && !(flags & TPARG_FL_FENTRY)) { + /* Defer the ENOENT case until register kprobe */ + if (ret == -EINVAL && is_return) { trace_probe_log_err(0, BAD_RETPROBE); goto parse_error; } From ed4e9e615b7ec4992a4eba1643e62ec2d9d979db Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Wed, 13 Jan 2021 17:34:47 -0700 Subject: [PATCH 105/284] Documentation/llvm: Add a section about supported architectures The most common question around building the Linux kernel with clang is "does it work?" and the answer has always been "it depends on your architecture, configuration, and LLVM version" with no hard answers for users wanting to experiment. LLVM support has significantly improved over the past couple of years, resulting in more architectures and configurations supported, and continuous integration has made it easier to see what works and what does not. Add a section that goes over what architectures are supported in the current kernel version, how they should be built (with just clang or the LLVM utilities as well), and the level of support they receive. This will make it easier for people to try out building their kernel with LLVM and reporting issues that come about from it. Suggested-by: Miguel Ojeda Signed-off-by: Nathan Chancellor Reviewed-by: Nick Desaulniers Signed-off-by: Masahiro Yamada --- Documentation/kbuild/llvm.rst | 44 +++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/Documentation/kbuild/llvm.rst b/Documentation/kbuild/llvm.rst index 21c847890d03..b18401d2ba82 100644 --- a/Documentation/kbuild/llvm.rst +++ b/Documentation/kbuild/llvm.rst @@ -63,6 +63,50 @@ They can be enabled individually. The full list of the parameters: :: Currently, the integrated assembler is disabled by default. You can pass ``LLVM_IAS=1`` to enable it. +Supported Architectures +----------------------- + +LLVM does not target all of the architectures that Linux supports and +just because a target is supported in LLVM does not mean that the kernel +will build or work without any issues. Below is a general summary of +architectures that currently work with ``CC=clang`` or ``LLVM=1``. Level +of support corresponds to "S" values in the MAINTAINERS files. If an +architecture is not present, it either means that LLVM does not target +it or there are known issues. Using the latest stable version of LLVM or +even the development tree will generally yield the best results. +An architecture's ``defconfig`` is generally expected to work well, +certain configurations may have problems that have not been uncovered +yet. Bug reports are always welcome at the issue tracker below! + +.. list-table:: + :widths: 10 10 10 + :header-rows: 1 + + * - Architecture + - Level of support + - ``make`` command + * - arm + - Supported + - ``LLVM=1`` + * - arm64 + - Supported + - ``LLVM=1`` + * - mips + - Maintained + - ``CC=clang`` + * - powerpc + - Maintained + - ``CC=clang`` + * - riscv + - Maintained + - ``CC=clang`` + * - s390 + - Maintained + - ``CC=clang`` + * - x86 + - Supported + - ``LLVM=1`` + Getting Help ------------ From 0188b87899ffc4a1d36a0badbe77d56c92fd91dc Mon Sep 17 00:00:00 2001 From: Wang ShaoBo Date: Thu, 28 Jan 2021 20:44:27 +0800 Subject: [PATCH 106/284] kretprobe: Avoid re-registration of the same kretprobe earlier Our system encountered a re-init error when re-registering same kretprobe, where the kretprobe_instance in rp->free_instances is illegally accessed after re-init. Implementation to avoid re-registration has been introduced for kprobe before, but lags for register_kretprobe(). We must check if kprobe has been re-registered before re-initializing kretprobe, otherwise it will destroy the data struct of kretprobe registered, which can lead to memory leak, system crash, also some unexpected behaviors. We use check_kprobe_rereg() to check if kprobe has been re-registered before running register_kretprobe()'s body, for giving a warning message and terminate registration process. Link: https://lkml.kernel.org/r/20210128124427.2031088-1-bobo.shaobowang@huawei.com Cc: stable@vger.kernel.org Fixes: 1f0ab40976460 ("kprobes: Prevent re-registration of the same kprobe") [ The above commit should have been done for kretprobes too ] Acked-by: Naveen N. Rao Acked-by: Ananth N Mavinakayanahalli Acked-by: Masami Hiramatsu Signed-off-by: Wang ShaoBo Signed-off-by: Cheng Jian Signed-off-by: Steven Rostedt (VMware) --- kernel/kprobes.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 1a5bc321e0a5..d5a3eb74a657 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -1994,6 +1994,10 @@ int register_kretprobe(struct kretprobe *rp) if (ret) return ret; + /* If only rp->kp.addr is specified, check reregistering kprobes */ + if (rp->kp.addr && check_kprobe_rereg(&rp->kp)) + return -EINVAL; + if (kretprobe_blacklist_size) { addr = kprobe_addr(&rp->kp); if (IS_ERR(addr)) From 4c457e8cb75eda91906a4f89fc39bde3f9a43922 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Sat, 23 Jan 2021 12:27:59 +0000 Subject: [PATCH 107/284] genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI), __msi_domain_alloc_irqs() performs the activation of the interrupt (which in the case of PCI results in the endpoint being programmed) as soon as the interrupt is allocated. But it appears that this is only done for the first vector, introducing an inconsistent behaviour for PCI Multi-MSI. Fix it by iterating over the number of vectors allocated to each MSI descriptor. This is easily achieved by introducing a new "for_each_msi_vector" iterator, together with a tiny bit of refactoring. Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Reported-by: Shameer Kolothum Signed-off-by: Marc Zyngier Signed-off-by: Thomas Gleixner Tested-by: Shameer Kolothum Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org --- include/linux/msi.h | 6 ++++++ kernel/irq/msi.c | 44 ++++++++++++++++++++------------------------ 2 files changed, 26 insertions(+), 24 deletions(-) diff --git a/include/linux/msi.h b/include/linux/msi.h index 360a0a7e7341..aef35fd1cf11 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -178,6 +178,12 @@ struct msi_desc { list_for_each_entry((desc), dev_to_msi_list((dev)), list) #define for_each_msi_entry_safe(desc, tmp, dev) \ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) +#define for_each_msi_vector(desc, __irq, dev) \ + for_each_msi_entry((desc), (dev)) \ + if ((desc)->irq) \ + for (__irq = (desc)->irq; \ + __irq < ((desc)->irq + (desc)->nvec_used); \ + __irq++) #ifdef CONFIG_IRQ_MSI_IOMMU static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index dc0e2d7fbdfd..b338d622f26e 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -436,22 +436,22 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, can_reserve = msi_check_reservation_mode(domain, info, dev); - for_each_msi_entry(desc, dev) { - virq = desc->irq; - if (desc->nvec_used == 1) - dev_dbg(dev, "irq %d for MSI\n", virq); - else + /* + * This flag is set by the PCI layer as we need to activate + * the MSI entries before the PCI layer enables MSI in the + * card. Otherwise the card latches a random msi message. + */ + if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY)) + goto skip_activate; + + for_each_msi_vector(desc, i, dev) { + if (desc->irq == i) { + virq = desc->irq; dev_dbg(dev, "irq [%d-%d] for MSI\n", virq, virq + desc->nvec_used - 1); - /* - * This flag is set by the PCI layer as we need to activate - * the MSI entries before the PCI layer enables MSI in the - * card. Otherwise the card latches a random msi message. - */ - if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY)) - continue; + } - irq_data = irq_domain_get_irq_data(domain, desc->irq); + irq_data = irq_domain_get_irq_data(domain, i); if (!can_reserve) { irqd_clr_can_reserve(irq_data); if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK) @@ -462,28 +462,24 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, goto cleanup; } +skip_activate: /* * If these interrupts use reservation mode, clear the activated bit * so request_irq() will assign the final vector. */ if (can_reserve) { - for_each_msi_entry(desc, dev) { - irq_data = irq_domain_get_irq_data(domain, desc->irq); + for_each_msi_vector(desc, i, dev) { + irq_data = irq_domain_get_irq_data(domain, i); irqd_clr_activated(irq_data); } } return 0; cleanup: - for_each_msi_entry(desc, dev) { - struct irq_data *irqd; - - if (desc->irq == virq) - break; - - irqd = irq_domain_get_irq_data(domain, desc->irq); - if (irqd_is_activated(irqd)) - irq_domain_deactivate_irq(irqd); + for_each_msi_vector(desc, i, dev) { + irq_data = irq_domain_get_irq_data(domain, i); + if (irqd_is_activated(irq_data)) + irq_domain_deactivate_irq(irq_data); } msi_domain_free_irqs(domain, dev); return ret; From 344717a14cd7272f88346022a77742323346299e Mon Sep 17 00:00:00 2001 From: Ravi Bangoria Date: Fri, 29 Jan 2021 12:47:45 +0530 Subject: [PATCH 108/284] powerpc/sstep: Fix array out of bound warning Compiling kernel with -Warray-bounds throws below warning: In function 'emulate_vsx_store': warning: array subscript is above array bounds [-Warray-bounds] buf.d[2] = byterev_8(reg->d[1]); ~~~~~^~~ buf.d[3] = byterev_8(reg->d[0]); ~~~~~^~~ Fix it by using temporary array variable 'union vsx_reg buf32[]' in that code block. Also, with element_size = 32, 'union vsx_reg *reg' is an array of size 2. So, use 'reg' as an array instead of pointer in the same code block. Fixes: af99da74333b ("powerpc/sstep: Support VSX vector paired storage access instructions") Suggested-by: Naveen N. Rao Signed-off-by: Ravi Bangoria Tested-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210129071745.111466-1-ravi.bangoria@linux.ibm.com --- arch/powerpc/lib/sstep.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index bf7a7d62ae8b..ede093e96234 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -818,13 +818,15 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg, break; if (rev) { /* reverse 32 bytes */ - buf.d[0] = byterev_8(reg->d[3]); - buf.d[1] = byterev_8(reg->d[2]); - buf.d[2] = byterev_8(reg->d[1]); - buf.d[3] = byterev_8(reg->d[0]); - reg = &buf; + union vsx_reg buf32[2]; + buf32[0].d[0] = byterev_8(reg[1].d[1]); + buf32[0].d[1] = byterev_8(reg[1].d[0]); + buf32[1].d[0] = byterev_8(reg[0].d[1]); + buf32[1].d[1] = byterev_8(reg[0].d[0]); + memcpy(mem, buf32, size); + } else { + memcpy(mem, reg, size); } - memcpy(mem, reg, size); break; case 16: /* stxv, stxvx, stxvl, stxvll */ From bce74491c3008e27dd6e8f79a83b4faa77a08f7e Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Thu, 24 Dec 2020 02:11:41 +0900 Subject: [PATCH 109/284] powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o vgettimeofday.o is unnecessarily rebuilt. Adding it to 'targets' is not enough to fix the issue. Kbuild is correctly rebuilding it because the command line is changed. PowerPC builds each vdso directory twice; first in vdso_prepare to generate vdso{32,64}-offsets.h, second as part of the ordinary build process to embed vdso{32,64}.so.dbg into the kernel. The problem shows up when CONFIG_PPC_WERROR=y due to the following line in arch/powerpc/Kbuild: subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror In the preparation stage, Kbuild directly visits the vdso directories, hence it does not inherit subdir-ccflags-y. In the second descend, Kbuild adds -Werror, which results in the command line flipping with/without -Werror. It implies a potential danger; if a more critical flag that would impact the resulted vdso, the offsets recorded in the headers might be different from real offsets in the embedded vdso images. Removing the unneeded second descend solves the problem. Reported-by: Michael Ellerman Signed-off-by: Masahiro Yamada Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/linuxppc-dev/87tuslxhry.fsf@mpe.ellerman.id.au/ Link: https://lore.kernel.org/r/20201223171142.707053-1-masahiroy@kernel.org --- arch/powerpc/kernel/Makefile | 4 ++-- arch/powerpc/kernel/vdso32/Makefile | 5 +---- arch/powerpc/kernel/{vdso32 => }/vdso32_wrapper.S | 0 arch/powerpc/kernel/vdso64/Makefile | 6 +----- arch/powerpc/kernel/{vdso64 => }/vdso64_wrapper.S | 0 5 files changed, 4 insertions(+), 11 deletions(-) rename arch/powerpc/kernel/{vdso32 => }/vdso32_wrapper.S (100%) rename arch/powerpc/kernel/{vdso64 => }/vdso64_wrapper.S (100%) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index fe2ef598e2ea..79ee7750937d 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += ptrace/ obj-$(CONFIG_PPC64) += setup_64.o \ paca.o nvram_64.o note.o syscall_64.o obj-$(CONFIG_COMPAT) += sys_ppc32.o signal_32.o -obj-$(CONFIG_VDSO32) += vdso32/ +obj-$(CONFIG_VDSO32) += vdso32_wrapper.o obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_PPC_DAWR) += dawr.o @@ -60,7 +60,7 @@ obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o obj-$(CONFIG_PPC_BOOK3E_64) += exceptions-64e.o idle_book3e.o obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o -obj-$(CONFIG_PPC64) += vdso64/ +obj-$(CONFIG_PPC64) += vdso64_wrapper.o obj-$(CONFIG_ALTIVEC) += vecemu.o obj-$(CONFIG_PPC_BOOK3S_IDLE) += idle_book3s.o procfs-y := proc_powerpc.o diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile index 9cb6f524854b..7d9a6fee0e3d 100644 --- a/arch/powerpc/kernel/vdso32/Makefile +++ b/arch/powerpc/kernel/vdso32/Makefile @@ -30,7 +30,7 @@ CC32FLAGS += -m32 KBUILD_CFLAGS := $(filter-out -mcmodel=medium -mabi=elfv1 -mabi=elfv2 -mcall-aixdesc,$(KBUILD_CFLAGS)) endif -targets := $(obj-vdso32) vdso32.so.dbg +targets := $(obj-vdso32) vdso32.so.dbg vgettimeofday.o obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32)) GCOV_PROFILE := n @@ -46,9 +46,6 @@ obj-y += vdso32_wrapper.o targets += vdso32.lds CPPFLAGS_vdso32.lds += -P -C -Upowerpc -# Force dependency (incbin is bad) -$(obj)/vdso32_wrapper.o : $(obj)/vdso32.so.dbg - # link rule for the .so file, .lds has to be first $(obj)/vdso32.so.dbg: $(src)/vdso32.lds $(obj-vdso32) $(obj)/vgettimeofday.o FORCE $(call if_changed,vdso32ld_and_check) diff --git a/arch/powerpc/kernel/vdso32/vdso32_wrapper.S b/arch/powerpc/kernel/vdso32_wrapper.S similarity index 100% rename from arch/powerpc/kernel/vdso32/vdso32_wrapper.S rename to arch/powerpc/kernel/vdso32_wrapper.S diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile index bf363ff37152..a06fb9b461fc 100644 --- a/arch/powerpc/kernel/vdso64/Makefile +++ b/arch/powerpc/kernel/vdso64/Makefile @@ -17,7 +17,7 @@ endif # Build rules -targets := $(obj-vdso64) vdso64.so.dbg +targets := $(obj-vdso64) vdso64.so.dbg vgettimeofday.o obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64)) GCOV_PROFILE := n @@ -29,15 +29,11 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib \ -Wl,-soname=linux-vdso64.so.1 -Wl,--hash-style=both asflags-y := -D__VDSO64__ -s -obj-y += vdso64_wrapper.o targets += vdso64.lds CPPFLAGS_vdso64.lds += -P -C -U$(ARCH) $(obj)/vgettimeofday.o: %.o: %.c FORCE -# Force dependency (incbin is bad) -$(obj)/vdso64_wrapper.o : $(obj)/vdso64.so.dbg - # link rule for the .so file, .lds has to be first $(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) $(obj)/vgettimeofday.o FORCE $(call if_changed,vdso64ld_and_check) diff --git a/arch/powerpc/kernel/vdso64/vdso64_wrapper.S b/arch/powerpc/kernel/vdso64_wrapper.S similarity index 100% rename from arch/powerpc/kernel/vdso64/vdso64_wrapper.S rename to arch/powerpc/kernel/vdso64_wrapper.S From 66f0a9e058fad50e569ad752be72e52701991fd5 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Thu, 24 Dec 2020 02:11:42 +0900 Subject: [PATCH 110/284] powerpc/vdso64: remove meaningless vgettimeofday.o build rule VDSO64 is only built for the 64-bit kernel, hence vgettimeofday.o is built by the generic rule in scripts/Makefile.build. This line does not provide anything useful. Signed-off-by: Masahiro Yamada Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201223171142.707053-2-masahiroy@kernel.org --- arch/powerpc/kernel/vdso64/Makefile | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile index a06fb9b461fc..2813e3f98db6 100644 --- a/arch/powerpc/kernel/vdso64/Makefile +++ b/arch/powerpc/kernel/vdso64/Makefile @@ -32,8 +32,6 @@ asflags-y := -D__VDSO64__ -s targets += vdso64.lds CPPFLAGS_vdso64.lds += -P -C -U$(ARCH) -$(obj)/vgettimeofday.o: %.o: %.c FORCE - # link rule for the .so file, .lds has to be first $(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) $(obj)/vgettimeofday.o FORCE $(call if_changed,vdso64ld_and_check) From aa880c6f3ee6dbd0d5ab02026a514ff8ea0a3328 Mon Sep 17 00:00:00 2001 From: Zyta Szpak Date: Thu, 21 Jan 2021 16:52:37 +0100 Subject: [PATCH 111/284] arm64: dts: ls1046a: fix dcfg address range Dcfg was overlapping with clockgen address space which resulted in failure in memory allocation for dcfg. According regs description dcfg size should not be bigger than 4KB. Signed-off-by: Zyta Szpak Fixes: 8126d88162a5 ("arm64: dts: add QorIQ LS1046A SoC support") Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi index 025e1f587662..565934cbfa28 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi @@ -385,7 +385,7 @@ dcfg: dcfg@1ee0000 { compatible = "fsl,ls1046a-dcfg", "syscon"; - reg = <0x0 0x1ee0000 0x0 0x10000>; + reg = <0x0 0x1ee0000 0x0 0x1000>; big-endian; }; From 3e1f4a2e1184ae6ad7f4caf682ced9554141a0f4 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 28 Jan 2021 12:33:42 +0300 Subject: [PATCH 112/284] USB: gadget: legacy: fix an error code in eth_bind() This code should return -ENOMEM if the allocation fails but it currently returns success. Fixes: 9b95236eebdb ("usb: gadget: ether: allocate and init otg descriptor by otg capabilities") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/YBKE9rqVuJEOUWpW@mwanda Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/legacy/ether.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/gadget/legacy/ether.c b/drivers/usb/gadget/legacy/ether.c index 30313b233680..99c7fc0d1d59 100644 --- a/drivers/usb/gadget/legacy/ether.c +++ b/drivers/usb/gadget/legacy/ether.c @@ -403,8 +403,10 @@ static int eth_bind(struct usb_composite_dev *cdev) struct usb_descriptor_header *usb_desc; usb_desc = usb_otg_descriptor_alloc(gadget); - if (!usb_desc) + if (!usb_desc) { + status = -ENOMEM; goto fail1; + } usb_otg_descriptor_init(gadget, usb_desc); otg_desc[0] = usb_desc; otg_desc[1] = NULL; From 215164bfb7144c5890dd8021ff06e486939862d4 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Fri, 29 Jan 2021 11:26:54 -0600 Subject: [PATCH 113/284] platform/x86: dell-wmi-sysman: fix a NULL pointer dereference An upcoming Dell platform is causing a NULL pointer dereference in dell-wmi-sysman initialization. Validate that the input from BIOS matches correct ACPI types and abort module initialization if it fails. Signed-off-by: Mario Limonciello Tested-by: Perry Yuan Link: https://lore.kernel.org/r/20210129172654.2326751-1-mario.limonciello@dell.com [hdegoede@redhat.com: Drop redundant release_attributes_data() call] Signed-off-by: Hans de Goede --- drivers/platform/x86/dell-wmi-sysman/sysman.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/platform/x86/dell-wmi-sysman/sysman.c b/drivers/platform/x86/dell-wmi-sysman/sysman.c index dc6dd531c996..cb81010ba1a2 100644 --- a/drivers/platform/x86/dell-wmi-sysman/sysman.c +++ b/drivers/platform/x86/dell-wmi-sysman/sysman.c @@ -419,13 +419,17 @@ static int init_bios_attributes(int attr_type, const char *guid) return retval; /* need to use specific instance_id and guid combination to get right data */ obj = get_wmiobj_pointer(instance_id, guid); - if (!obj) + if (!obj || obj->type != ACPI_TYPE_PACKAGE) return -ENODEV; elements = obj->package.elements; mutex_lock(&wmi_priv.mutex); while (elements) { /* sanity checking */ + if (elements[ATTR_NAME].type != ACPI_TYPE_STRING) { + pr_debug("incorrect element type\n"); + goto nextobj; + } if (strlen(elements[ATTR_NAME].string.pointer) == 0) { pr_debug("empty attribute found\n"); goto nextobj; From d8d2d38275c1b2d3936c0d809e0559e88912fbb5 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Mon, 1 Feb 2021 10:00:24 +0900 Subject: [PATCH 114/284] kbuild: remove PYTHON variable Python retired in 2020, and some distributions do not provide the 'python' command any more. As in commit 51839e29cb59 ("scripts: switch explicitly to Python 3"), we need to use more specific 'python3' to invoke scripts even if they are written in a way compatible with both Python 2 and 3. This commit removes the variable 'PYTHON', and switches the existing users to 'PYTHON3'. BTW, PEP 394 (https://www.python.org/dev/peps/pep-0394/) is a helpful material. Signed-off-by: Masahiro Yamada --- Documentation/Makefile | 2 +- Documentation/kbuild/makefiles.rst | 2 +- Makefile | 3 +-- arch/ia64/Makefile | 2 +- arch/ia64/scripts/unwcheck.py | 2 +- scripts/jobserver-exec | 2 +- 6 files changed, 6 insertions(+), 7 deletions(-) diff --git a/Documentation/Makefile b/Documentation/Makefile index 61a7310b49e0..9c42dde97671 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -75,7 +75,7 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4) cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/userspace-api/media $2 && \ PYTHONDONTWRITEBYTECODE=1 \ BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \ - $(PYTHON) $(srctree)/scripts/jobserver-exec \ + $(PYTHON3) $(srctree)/scripts/jobserver-exec \ $(SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \ $(SPHINXBUILD) \ -b $2 \ diff --git a/Documentation/kbuild/makefiles.rst b/Documentation/kbuild/makefiles.rst index 9f6a11881951..300d8edcb994 100644 --- a/Documentation/kbuild/makefiles.rst +++ b/Documentation/kbuild/makefiles.rst @@ -755,7 +755,7 @@ more details, with real examples. bits on the scripts nonetheless. Kbuild provides variables $(CONFIG_SHELL), $(AWK), $(PERL), - $(PYTHON) and $(PYTHON3) to refer to interpreters for the respective + and $(PYTHON3) to refer to interpreters for the respective scripts. Example:: diff --git a/Makefile b/Makefile index b0e4767735dc..89217e4e68c6 100644 --- a/Makefile +++ b/Makefile @@ -452,7 +452,6 @@ AWK = awk INSTALLKERNEL := installkernel DEPMOD = depmod PERL = perl -PYTHON = python PYTHON3 = python3 CHECK = sparse BASH = bash @@ -508,7 +507,7 @@ CLANG_FLAGS := export ARCH SRCARCH CONFIG_SHELL BASH HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE LD CC export CPP AR NM STRIP OBJCOPY OBJDUMP READELF PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL -export PERL PYTHON PYTHON3 CHECK CHECKFLAGS MAKE UTS_MACHINE HOSTCXX +export PERL PYTHON3 CHECK CHECKFLAGS MAKE UTS_MACHINE HOSTCXX export KGZIP KBZIP2 KLZOP LZMA LZ4 XZ ZSTD export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile index 703b1c4f6d12..45d5368d6a99 100644 --- a/arch/ia64/Makefile +++ b/arch/ia64/Makefile @@ -69,7 +69,7 @@ vmlinux.bin: vmlinux FORCE $(call if_changed,objcopy) unwcheck: vmlinux - -$(Q)READELF=$(READELF) $(PYTHON) $(srctree)/arch/ia64/scripts/unwcheck.py $< + -$(Q)READELF=$(READELF) $(PYTHON3) $(srctree)/arch/ia64/scripts/unwcheck.py $< archclean: diff --git a/arch/ia64/scripts/unwcheck.py b/arch/ia64/scripts/unwcheck.py index bfd1b671e35f..9581742f0db2 100644 --- a/arch/ia64/scripts/unwcheck.py +++ b/arch/ia64/scripts/unwcheck.py @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 # SPDX-License-Identifier: GPL-2.0 # # Usage: unwcheck.py FILE diff --git a/scripts/jobserver-exec b/scripts/jobserver-exec index 0fdb31a790a8..48d141e3ec56 100755 --- a/scripts/jobserver-exec +++ b/scripts/jobserver-exec @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 # SPDX-License-Identifier: GPL-2.0+ # # This determines how many parallel tasks "make" is expecting, as it is From f92e04f764b86e55e522988e6f4b6082d19a2721 Mon Sep 17 00:00:00 2001 From: Fengnan Chang Date: Sat, 23 Jan 2021 11:32:31 +0800 Subject: [PATCH 115/284] mmc: core: Limit retries when analyse of SDIO tuples fails When analysing tuples fails we may loop indefinitely to retry. Let's avoid this by using a 10s timeout and bail if not completed earlier. Signed-off-by: Fengnan Chang Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210123033230.36442-1-fengnanchang@gmail.com Signed-off-by: Ulf Hansson --- drivers/mmc/core/sdio_cis.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/mmc/core/sdio_cis.c b/drivers/mmc/core/sdio_cis.c index 44bea5e4aeda..b23773583179 100644 --- a/drivers/mmc/core/sdio_cis.c +++ b/drivers/mmc/core/sdio_cis.c @@ -20,6 +20,8 @@ #include "sdio_cis.h" #include "sdio_ops.h" +#define SDIO_READ_CIS_TIMEOUT_MS (10 * 1000) /* 10s */ + static int cistpl_vers_1(struct mmc_card *card, struct sdio_func *func, const unsigned char *buf, unsigned size) { @@ -274,6 +276,8 @@ static int sdio_read_cis(struct mmc_card *card, struct sdio_func *func) do { unsigned char tpl_code, tpl_link; + unsigned long timeout = jiffies + + msecs_to_jiffies(SDIO_READ_CIS_TIMEOUT_MS); ret = mmc_io_rw_direct(card, 0, 0, ptr++, 0, &tpl_code); if (ret) @@ -326,6 +330,8 @@ static int sdio_read_cis(struct mmc_card *card, struct sdio_func *func) prev = &this->next; if (ret == -ENOENT) { + if (time_after(jiffies, timeout)) + break; /* warn about unknown tuples */ pr_warn_ratelimited("%s: queuing unknown" " CIS tuple 0x%02x (%u bytes)\n", From d7fb9c24209556478e65211d7a1f056f2d43cceb Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Tue, 26 Jan 2021 10:43:13 +0100 Subject: [PATCH 116/284] mmc: sdhci-pltfm: Fix linking err for sdhci-brcmstb The implementation of sdhci_pltfm_suspend() is only available when CONFIG_PM_SLEEP is set, which triggers a linking error: "undefined symbol: sdhci_pltfm_suspend" when building sdhci-brcmstb.c. Fix this by implementing the missing stubs when CONFIG_PM_SLEEP is unset. Reported-by: Arnd Bergmann Suggested-by: Florian Fainelli Fixes: 5b191dcba719 ("mmc: sdhci-brcmstb: Fix mmc timeout errors on S5 suspend") Cc: stable@vger.kernel.org Tested-By: Nicolas Schichan Acked-by: Arnd Bergmann Acked-by: Florian Fainelli Signed-off-by: Ulf Hansson --- drivers/mmc/host/sdhci-pltfm.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/sdhci-pltfm.h b/drivers/mmc/host/sdhci-pltfm.h index 6301b81cf573..9bd717ff784b 100644 --- a/drivers/mmc/host/sdhci-pltfm.h +++ b/drivers/mmc/host/sdhci-pltfm.h @@ -111,8 +111,13 @@ static inline void *sdhci_pltfm_priv(struct sdhci_pltfm_host *host) return host->private; } +extern const struct dev_pm_ops sdhci_pltfm_pmops; +#ifdef CONFIG_PM_SLEEP int sdhci_pltfm_suspend(struct device *dev); int sdhci_pltfm_resume(struct device *dev); -extern const struct dev_pm_ops sdhci_pltfm_pmops; +#else +static inline int sdhci_pltfm_suspend(struct device *dev) { return 0; } +static inline int sdhci_pltfm_resume(struct device *dev) { return 0; } +#endif #endif /* _DRIVERS_MMC_SDHCI_PLTFM_H */ From c07ea8d0b170c0cf6592a53981841c7973e142ea Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Tue, 5 Jan 2021 11:59:14 +0100 Subject: [PATCH 117/284] gpio: gpiolib: remove shadowed variable After refactoring, we had two variables for the same thing. Remove the second declaration, one is enough here. Found by cppcheck. drivers/gpio/gpiolib.c:2551:17: warning: Local variable 'ret' shadows outer variable [shadowVariable] Fixes: d377f56f34f5 ("gpio: gpiolib: Normalize return code variable name") Signed-off-by: Wolfram Sang Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpiolib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 3ba9c981f0b9..97eec8d8dbdc 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -2557,7 +2557,7 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep, struct gpio_chip *gc = desc_array[i]->gdev->chip; unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)]; unsigned long *mask, *bits; - int first, j, ret; + int first, j; if (likely(gc->ngpio <= FASTPATH_NGPIO)) { mask = fastpath; From 9917f0e3cdba7b9f1a23f70e3f70b1a106be54a8 Mon Sep 17 00:00:00 2001 From: Yoshihiro Shimoda Date: Mon, 1 Feb 2021 21:47:20 +0900 Subject: [PATCH 118/284] usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop() Should clear the pipe running flag in usbhs_pkt_pop(). Otherwise, we cannot use this pipe after dequeue was called while the pipe was running. Fixes: 8355b2b3082d ("usb: renesas_usbhs: fix the behavior of some usbhs_pkt_handle") Reported-by: Tho Vu Signed-off-by: Yoshihiro Shimoda Link: https://lore.kernel.org/r/1612183640-8898-1-git-send-email-yoshihiro.shimoda.uh@renesas.com Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/renesas_usbhs/fifo.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/renesas_usbhs/fifo.c b/drivers/usb/renesas_usbhs/fifo.c index ac9a81ae8216..e6fa13701808 100644 --- a/drivers/usb/renesas_usbhs/fifo.c +++ b/drivers/usb/renesas_usbhs/fifo.c @@ -126,6 +126,7 @@ struct usbhs_pkt *usbhs_pkt_pop(struct usbhs_pipe *pipe, struct usbhs_pkt *pkt) } usbhs_pipe_clear_without_sequence(pipe, 0, 0); + usbhs_pipe_running(pipe, 0); __usbhsf_pkt_del(pkt); } From 54f6a8af372213a254af6609758d99f7c0b6b5ad Mon Sep 17 00:00:00 2001 From: Chunfeng Yun Date: Mon, 1 Feb 2021 13:57:44 +0800 Subject: [PATCH 119/284] usb: xhci-mtk: skip dropping bandwidth of unchecked endpoints For those unchecked endpoints, we don't allocate bandwidth for them, so no need free the bandwidth, otherwise will decrease the allocated bandwidth. Meanwhile use xhci_dbg() instead of dev_dbg() to print logs and rename bw_ep_list_new as bw_ep_chk_list. Fixes: 1d69f9d901ef ("usb: xhci-mtk: fix unreleased bandwidth data") Cc: stable Reviewed-and-tested-by: Ikjoon Jang Signed-off-by: Chunfeng Yun Link: https://lore.kernel.org/r/1612159064-28413-1-git-send-email-chunfeng.yun@mediatek.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-mtk-sch.c | 61 ++++++++++++++++++--------------- drivers/usb/host/xhci-mtk.h | 4 ++- 2 files changed, 36 insertions(+), 29 deletions(-) diff --git a/drivers/usb/host/xhci-mtk-sch.c b/drivers/usb/host/xhci-mtk-sch.c index a313e75ff1c6..dee8a329076d 100644 --- a/drivers/usb/host/xhci-mtk-sch.c +++ b/drivers/usb/host/xhci-mtk-sch.c @@ -200,6 +200,7 @@ static struct mu3h_sch_ep_info *create_sch_ep(struct usb_device *udev, sch_ep->sch_tt = tt; sch_ep->ep = ep; + INIT_LIST_HEAD(&sch_ep->endpoint); INIT_LIST_HEAD(&sch_ep->tt_endpoint); return sch_ep; @@ -374,6 +375,7 @@ static void update_bus_bw(struct mu3h_sch_bw_info *sch_bw, sch_ep->bw_budget_table[j]; } } + sch_ep->allocated = used; } static int check_sch_tt(struct usb_device *udev, @@ -542,6 +544,22 @@ static int check_sch_bw(struct usb_device *udev, return 0; } +static void destroy_sch_ep(struct usb_device *udev, + struct mu3h_sch_bw_info *sch_bw, struct mu3h_sch_ep_info *sch_ep) +{ + /* only release ep bw check passed by check_sch_bw() */ + if (sch_ep->allocated) + update_bus_bw(sch_bw, sch_ep, 0); + + list_del(&sch_ep->endpoint); + + if (sch_ep->sch_tt) { + list_del(&sch_ep->tt_endpoint); + drop_tt(udev); + } + kfree(sch_ep); +} + static bool need_bw_sch(struct usb_host_endpoint *ep, enum usb_device_speed speed, int has_tt) { @@ -584,7 +602,7 @@ int xhci_mtk_sch_init(struct xhci_hcd_mtk *mtk) mtk->sch_array = sch_array; - INIT_LIST_HEAD(&mtk->bw_ep_list_new); + INIT_LIST_HEAD(&mtk->bw_ep_chk_list); return 0; } @@ -636,29 +654,12 @@ int xhci_mtk_add_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, setup_sch_info(udev, ep_ctx, sch_ep); - list_add_tail(&sch_ep->endpoint, &mtk->bw_ep_list_new); + list_add_tail(&sch_ep->endpoint, &mtk->bw_ep_chk_list); return 0; } EXPORT_SYMBOL_GPL(xhci_mtk_add_ep_quirk); -static void xhci_mtk_drop_ep(struct xhci_hcd_mtk *mtk, struct usb_device *udev, - struct mu3h_sch_ep_info *sch_ep) -{ - struct xhci_hcd *xhci = hcd_to_xhci(mtk->hcd); - int bw_index = get_bw_index(xhci, udev, sch_ep->ep); - struct mu3h_sch_bw_info *sch_bw = &mtk->sch_array[bw_index]; - - update_bus_bw(sch_bw, sch_ep, 0); - list_del(&sch_ep->endpoint); - - if (sch_ep->sch_tt) { - list_del(&sch_ep->tt_endpoint); - drop_tt(udev); - } - kfree(sch_ep); -} - void xhci_mtk_drop_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, struct usb_host_endpoint *ep) { @@ -688,9 +689,8 @@ void xhci_mtk_drop_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, sch_bw = &sch_array[bw_index]; list_for_each_entry_safe(sch_ep, tmp, &sch_bw->bw_ep_list, endpoint) { - if (sch_ep->ep == ep) { - xhci_mtk_drop_ep(mtk, udev, sch_ep); - } + if (sch_ep->ep == ep) + destroy_sch_ep(udev, sch_bw, sch_ep); } } EXPORT_SYMBOL_GPL(xhci_mtk_drop_ep_quirk); @@ -704,9 +704,9 @@ int xhci_mtk_check_bandwidth(struct usb_hcd *hcd, struct usb_device *udev) struct mu3h_sch_ep_info *sch_ep, *tmp; int bw_index, ret; - dev_dbg(&udev->dev, "%s\n", __func__); + xhci_dbg(xhci, "%s() udev %s\n", __func__, dev_name(&udev->dev)); - list_for_each_entry(sch_ep, &mtk->bw_ep_list_new, endpoint) { + list_for_each_entry(sch_ep, &mtk->bw_ep_chk_list, endpoint) { bw_index = get_bw_index(xhci, udev, sch_ep->ep); sch_bw = &mtk->sch_array[bw_index]; @@ -717,7 +717,7 @@ int xhci_mtk_check_bandwidth(struct usb_hcd *hcd, struct usb_device *udev) } } - list_for_each_entry_safe(sch_ep, tmp, &mtk->bw_ep_list_new, endpoint) { + list_for_each_entry_safe(sch_ep, tmp, &mtk->bw_ep_chk_list, endpoint) { struct xhci_ep_ctx *ep_ctx; struct usb_host_endpoint *ep = sch_ep->ep; unsigned int ep_index = xhci_get_endpoint_index(&ep->desc); @@ -746,12 +746,17 @@ EXPORT_SYMBOL_GPL(xhci_mtk_check_bandwidth); void xhci_mtk_reset_bandwidth(struct usb_hcd *hcd, struct usb_device *udev) { struct xhci_hcd_mtk *mtk = hcd_to_mtk(hcd); + struct xhci_hcd *xhci = hcd_to_xhci(hcd); + struct mu3h_sch_bw_info *sch_bw; struct mu3h_sch_ep_info *sch_ep, *tmp; + int bw_index; - dev_dbg(&udev->dev, "%s\n", __func__); + xhci_dbg(xhci, "%s() udev %s\n", __func__, dev_name(&udev->dev)); - list_for_each_entry_safe(sch_ep, tmp, &mtk->bw_ep_list_new, endpoint) { - xhci_mtk_drop_ep(mtk, udev, sch_ep); + list_for_each_entry_safe(sch_ep, tmp, &mtk->bw_ep_chk_list, endpoint) { + bw_index = get_bw_index(xhci, udev, sch_ep->ep); + sch_bw = &mtk->sch_array[bw_index]; + destroy_sch_ep(udev, sch_bw, sch_ep); } xhci_reset_bandwidth(hcd, udev); diff --git a/drivers/usb/host/xhci-mtk.h b/drivers/usb/host/xhci-mtk.h index 577f431c5c93..cbb09dfea62e 100644 --- a/drivers/usb/host/xhci-mtk.h +++ b/drivers/usb/host/xhci-mtk.h @@ -59,6 +59,7 @@ struct mu3h_sch_bw_info { * @ep_type: endpoint type * @maxpkt: max packet size of endpoint * @ep: address of usb_host_endpoint struct + * @allocated: the bandwidth is aready allocated from bus_bw * @offset: which uframe of the interval that transfer should be * scheduled first time within the interval * @repeat: the time gap between two uframes that transfers are @@ -86,6 +87,7 @@ struct mu3h_sch_ep_info { u32 ep_type; u32 maxpkt; void *ep; + bool allocated; /* * mtk xHCI scheduling information put into reserved DWs * in ep context @@ -130,8 +132,8 @@ struct mu3c_ippc_regs { struct xhci_hcd_mtk { struct device *dev; struct usb_hcd *hcd; - struct list_head bw_ep_list_new; struct mu3h_sch_bw_info *sch_array; + struct list_head bw_ep_chk_list; struct mu3c_ippc_regs __iomem *ippc_regs; bool has_ippc; int num_u2_ports; From 9ad22e165994ccb64d85b68499eaef97342c175b Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 28 Jan 2021 22:16:27 +0100 Subject: [PATCH 120/284] x86/debug: Fix DR6 handling Tom reported that one of the GDB test-cases failed, and Boris bisected it to commit: d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6") The debugging session led us to commit: 6c0aca288e72 ("x86: Ignore trap bits on single step exceptions") It turns out that TF and data breakpoints are both traps and will be merged, while instruction breakpoints are faults and will not be merged. This means 6c0aca288e72 is wrong, only TF and instruction breakpoints need to be excluded while TF and data breakpoints can be merged. [ bp: Massage commit message. ] Fixes: d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6") Fixes: 6c0aca288e72 ("x86: Ignore trap bits on single step exceptions") Reported-by: Tom de Vries Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Borislav Petkov Cc: Link: https://lkml.kernel.org/r/YBMAbQGACujjfz%2Bi@hirez.programming.kicks-ass.net Link: https://lkml.kernel.org/r/20210128211627.GB4348@worktop.programming.kicks-ass.net --- arch/x86/kernel/hw_breakpoint.c | 43 +++++++++++++++------------------ 1 file changed, 20 insertions(+), 23 deletions(-) diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c index 03aa33b58165..6694c0f8e6c1 100644 --- a/arch/x86/kernel/hw_breakpoint.c +++ b/arch/x86/kernel/hw_breakpoint.c @@ -491,15 +491,12 @@ static int hw_breakpoint_handler(struct die_args *args) struct perf_event *bp; unsigned long *dr6_p; unsigned long dr6; + bool bpx; /* The DR6 value is pointed by args->err */ dr6_p = (unsigned long *)ERR_PTR(args->err); dr6 = *dr6_p; - /* If it's a single step, TRAP bits are random */ - if (dr6 & DR_STEP) - return NOTIFY_DONE; - /* Do an early return if no trap bits are set in DR6 */ if ((dr6 & DR_TRAP_BITS) == 0) return NOTIFY_DONE; @@ -509,28 +506,29 @@ static int hw_breakpoint_handler(struct die_args *args) if (likely(!(dr6 & (DR_TRAP0 << i)))) continue; - /* - * The counter may be concurrently released but that can only - * occur from a call_rcu() path. We can then safely fetch - * the breakpoint, use its callback, touch its counter - * while we are in an rcu_read_lock() path. - */ - rcu_read_lock(); - bp = this_cpu_read(bp_per_reg[i]); + if (!bp) + continue; + + bpx = bp->hw.info.type == X86_BREAKPOINT_EXECUTE; + + /* + * TF and data breakpoints are traps and can be merged, however + * instruction breakpoints are faults and will be raised + * separately. + * + * However DR6 can indicate both TF and instruction + * breakpoints. In that case take TF as that has precedence and + * delay the instruction breakpoint for the next exception. + */ + if (bpx && (dr6 & DR_STEP)) + continue; + /* * Reset the 'i'th TRAP bit in dr6 to denote completion of * exception handling */ (*dr6_p) &= ~(DR_TRAP0 << i); - /* - * bp can be NULL due to lazy debug register switching - * or due to concurrent perf counter removing. - */ - if (!bp) { - rcu_read_unlock(); - break; - } perf_bp_event(bp, args->regs); @@ -538,11 +536,10 @@ static int hw_breakpoint_handler(struct die_args *args) * Set up resume flag to avoid breakpoint recursion when * returning back to origin. */ - if (bp->hw.info.type == X86_BREAKPOINT_EXECUTE) + if (bpx) args->regs->flags |= X86_EFLAGS_RF; - - rcu_read_unlock(); } + /* * Further processing in do_debug() is needed for a) user-space * breakpoints (to generate signals) and b) when the system has From bad4c6eb5eaa8300e065bd4426727db5141d687d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 31 Jan 2021 16:16:23 -0500 Subject: [PATCH 121/284] SUNRPC: Fix NFS READs that start at non-page-aligned offsets Anj Duvnjak reports that the Kodi.tv NFS client is not able to read video files from a v5.10.11 Linux NFS server. The new sendpage-based TCP sendto logic was not attentive to non- zero page_base values. nfsd_splice_read() sets that field when a READ payload starts in the middle of a page. The Linux NFS client rarely emits an NFS READ that is not page- aligned. All of my testing so far has been with Linux clients, so I missed this one. Reported-by: A. Duvnjak BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211471 Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again") Signed-off-by: Chuck Lever Tested-by: A. Duvnjak --- net/sunrpc/svcsock.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index c9766d07eb81..5a809c64dc7b 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1113,14 +1113,15 @@ static int svc_tcp_sendmsg(struct socket *sock, struct msghdr *msg, unsigned int offset, len, remaining; struct bio_vec *bvec; - bvec = xdr->bvec; - offset = xdr->page_base; + bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT); + offset = offset_in_page(xdr->page_base); remaining = xdr->page_len; flags = MSG_MORE | MSG_SENDPAGE_NOTLAST; while (remaining > 0) { if (remaining <= PAGE_SIZE && tail->iov_len == 0) flags = 0; - len = min(remaining, bvec->bv_len); + + len = min(remaining, bvec->bv_len - offset); ret = kernel_sendpage(sock, bvec->bv_page, bvec->bv_offset + offset, len, flags); From 7131636e7ea5b50ca910f8953f6365ef2d1f741c Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 28 Jan 2021 11:45:00 -0500 Subject: [PATCH 122/284] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST will generally use the default value for MSR_IA32_ARCH_CAPABILITIES. When this happens and the host has tsx=on, it is possible to end up with virtual machines that have HLE and RTM disabled, but TSX_CTRL available. If the fleet is then switched to tsx=off, kvm_get_arch_capabilities() will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to use the tsx=off hosts as migration destinations, even though the guests do not have TSX enabled. To allow this migration, allow guests to write to their TSX_CTRL MSR, while keeping the host MSR unchanged for the entire life of the guests. This ensures that TSX remains disabled and also saves MSR reads and writes, and it's okay to do because with tsx=off we know that guests will not have the HLE and RTM features in their CPUID. (If userspace sets bogus CPUID data, we do not expect HLE and RTM to work in guests anyway). Cc: stable@vger.kernel.org Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES") Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/vmx.c | 17 +++++++++++++---- arch/x86/kvm/x86.c | 26 +++++++++++++++++--------- 2 files changed, 30 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cc60b1fc3ee7..eb69fef57485 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6860,11 +6860,20 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu) switch (index) { case MSR_IA32_TSX_CTRL: /* - * No need to pass TSX_CTRL_CPUID_CLEAR through, so - * let's avoid changing CPUID bits under the host - * kernel's feet. + * TSX_CTRL_CPUID_CLEAR is handled in the CPUID + * interception. Keep the host value unchanged to avoid + * changing CPUID bits under the host kernel's feet. + * + * hle=0, rtm=0, tsx_ctrl=1 can be found with some + * combinations of new kernel and old userspace. If + * those guests run on a tsx=off host, do allow guests + * to use TSX_CTRL, but do not change the value on the + * host so that TSX remains always disabled. */ - vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR; + if (boot_cpu_has(X86_FEATURE_RTM)) + vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR; + else + vmx->guest_uret_msrs[j].mask = 0; break; default: vmx->guest_uret_msrs[j].mask = -1ull; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 76bce832cade..b05a1fe9dae9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1394,16 +1394,24 @@ static u64 kvm_get_arch_capabilities(void) if (!boot_cpu_has_bug(X86_BUG_MDS)) data |= ARCH_CAP_MDS_NO; - /* - * On TAA affected systems: - * - nothing to do if TSX is disabled on the host. - * - we emulate TSX_CTRL if present on the host. - * This lets the guest use VERW to clear CPU buffers. - */ - if (!boot_cpu_has(X86_FEATURE_RTM)) - data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR); - else if (!boot_cpu_has_bug(X86_BUG_TAA)) + if (!boot_cpu_has(X86_FEATURE_RTM)) { + /* + * If RTM=0 because the kernel has disabled TSX, the host might + * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0 + * and therefore knows that there cannot be TAA) but keep + * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts, + * and we want to allow migrating those guests to tsx=off hosts. + */ + data &= ~ARCH_CAP_TAA_NO; + } else if (!boot_cpu_has_bug(X86_BUG_TAA)) { data |= ARCH_CAP_TAA_NO; + } else { + /* + * Nothing to do here; we emulate TSX_CTRL if present on the + * host so the guest can choose between disabling TSX or + * using VERW to clear CPU buffers. + */ + } return data; } From b66f9bab1279c281c83dea077c5e808527e3ef69 Mon Sep 17 00:00:00 2001 From: Zheng Zhan Liang Date: Mon, 1 Feb 2021 13:53:10 +0800 Subject: [PATCH 123/284] KVM/x86: assign hva with the right value to vm_munmap the pages Cc: Paolo Bonzini Cc: Wanpeng Li Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Zheng Zhan Liang Message-Id: <20210201055310.267029-1-zhengzhanliang@huorong.cn> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b05a1fe9dae9..42b28d0f0311 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10502,7 +10502,7 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, return 0; old_npages = slot->npages; - hva = 0; + hva = slot->userspace_addr; } for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { From 4683d758f48e6ae87d3d3493ffa00aceb955ee16 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Mon, 1 Feb 2021 15:28:43 +0100 Subject: [PATCH 124/284] KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check Commit 7a873e455567 ("KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2") reveals that KVM allows to set X86_CR4_PCIDE even when PCID support is missing: ==== Test Assertion Failure ==== x86_64/set_sregs_test.c:41: rc pid=6956 tid=6956 - Invalid argument 1 0x000000000040177d: test_cr4_feature_bit at set_sregs_test.c:41 2 0x00000000004014fc: main at set_sregs_test.c:119 3 0x00007f2d9346d041: ?? ??:0 4 0x000000000040164d: _start at ??:? KVM allowed unsupported CR4 bit (0x20000) Add X86_FEATURE_PCID feature check to __cr4_reserved_bits() to make kvm_is_valid_cr4() fail. Signed-off-by: Vitaly Kuznetsov Message-Id: <20210201142843.108190-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index c5ee0f5ce0f1..0f727b50bd3d 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -425,6 +425,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type); __reserved_bits |= X86_CR4_UMIP; \ if (!__cpu_has(__c, X86_FEATURE_VMX)) \ __reserved_bits |= X86_CR4_VMXE; \ + if (!__cpu_has(__c, X86_FEATURE_PCID)) \ + __reserved_bits |= X86_CR4_PCIDE; \ __reserved_bits; \ }) From 0f347aa07f15b346a001e557f4a0a45069f7fa3d Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Mon, 1 Feb 2021 17:34:19 +0100 Subject: [PATCH 125/284] ACPI: scan: Fix battery devices sometimes never binding With the new 2 step scanning process, which defers instantiating some ACPI-devices based on their _DEP to the second step, the following may happen: 1. During the first acpi_walk_namespace(acpi_bus_check_add) call acpi_scan_check_dep() gets called on the Battery ACPI dev handle and adds one or more deps for this handle to the acpi_dep_list 2. During the first acpi_bus_attach() call one or more of the suppliers of these deps get their driver attached and acpi_walk_dep_device_list(supplier_handle) gets called. At this point acpi_bus_get_device(dep->consumer) get called, but since the battery has DEPs it has not been instantiated during the first acpi_walk_namespace(acpi_bus_check_add), so the acpi_bus_get_device(dep->consumer) call fails. Before this commit, acpi_walk_dep_device_list() would now continue *without* removing the acpi_dep_data entry for this supplier,consumer pair from the acpi_dep_list. 3. During the second acpi_walk_namespace(acpi_bus_check_add) call an acpi_device gets instantiated for the battery and acpi_scan_dep_init() gets called to initialize its dep_unmet val. Before this commit, the dep_unmet count would include DEPs for suppliers for which acpi_walk_dep_device_list(supplier_handle) has already been called, so it will never become 0 and the ACPI battery driver will never get attached / bind. Fix the ACPI battery driver never binding in this scenario by making acpi_walk_dep_device_list() always remove matching acpi_dep_data entries independent of the acpi_bus_get_device(dep->consumer) call succeeding or not. Fixes: 71da201f38df ("ACPI: scan: Defer enumeration of devices with _DEP lists") Signed-off-by: Hans de Goede Signed-off-by: Rafael J. Wysocki --- drivers/acpi/scan.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 1db063b02f63..22566b4b3150 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -2123,12 +2123,12 @@ void acpi_walk_dep_device_list(acpi_handle handle) list_for_each_entry_safe(dep, tmp, &acpi_dep_list, node) { if (dep->supplier == handle) { acpi_bus_get_device(dep->consumer, &adev); - if (!adev) - continue; - adev->dep_unmet--; - if (!adev->dep_unmet) - acpi_bus_attach(adev, true); + if (adev) { + adev->dep_unmet--; + if (!adev->dep_unmet) + acpi_bus_attach(adev, true); + } list_del(&dep->node); kfree(dep); From 8acf417805a5f5c69e9ff66f14cab022c2755161 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Mon, 1 Feb 2021 19:00:07 +0000 Subject: [PATCH 126/284] x86/split_lock: Enable the split lock feature on another Alder Lake CPU Add Alder Lake mobile processor to CPU list to enumerate and enable the split lock feature. Signed-off-by: Fenghua Yu Signed-off-by: Borislav Petkov Reviewed-by: Tony Luck Link: https://lkml.kernel.org/r/20210201190007.4031869-1-fenghua.yu@intel.com --- arch/x86/kernel/cpu/intel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 59a1e3ce3f14..816fdbec795a 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1159,6 +1159,7 @@ static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE, 1), X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 1), X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, 1), + X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, 1), {} }; From 7018c897c2f243d4b5f1b94bc6b4831a7eab80fb Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 1 Feb 2021 16:20:40 -0800 Subject: [PATCH 127/284] libnvdimm/dimm: Avoid race between probe and available_slots_show() Richard reports that the following test: (while true; do cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null done) & while true; do for i in $(seq 0 4); do echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind done for i in $(seq 0 4); do echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind done done ...fails with a crash signature like: divide error: 0000 [#1] SMP KASAN PTI RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm] [..] Call Trace: available_slots_show+0x4e/0x120 [libnvdimm] dev_attr_show+0x42/0x80 ? memset+0x20/0x40 sysfs_kf_seq_show+0x218/0x410 The root cause is that available_slots_show() consults driver-data, but fails to synchronize against device-unbind setting up a TOCTOU race to access uninitialized memory. Validate driver-data under the device-lock. Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure") Cc: Cc: Vishal Verma Cc: Dave Jiang Cc: Ira Weiny Cc: Coly Li Reported-by: Richard Palethorpe Acked-by: Richard Palethorpe Signed-off-by: Dan Williams --- drivers/nvdimm/dimm_devs.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index b59032e0859b..9d208570d059 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -335,16 +335,16 @@ static ssize_t state_show(struct device *dev, struct device_attribute *attr, } static DEVICE_ATTR_RO(state); -static ssize_t available_slots_show(struct device *dev, - struct device_attribute *attr, char *buf) +static ssize_t __available_slots_show(struct nvdimm_drvdata *ndd, char *buf) { - struct nvdimm_drvdata *ndd = dev_get_drvdata(dev); + struct device *dev; ssize_t rc; u32 nfree; if (!ndd) return -ENXIO; + dev = ndd->dev; nvdimm_bus_lock(dev); nfree = nd_label_nfree(ndd); if (nfree - 1 > nfree) { @@ -356,6 +356,18 @@ static ssize_t available_slots_show(struct device *dev, nvdimm_bus_unlock(dev); return rc; } + +static ssize_t available_slots_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + ssize_t rc; + + nd_device_lock(dev); + rc = __available_slots_show(dev_get_drvdata(dev), buf); + nd_device_unlock(dev); + + return rc; +} static DEVICE_ATTR_RO(available_slots); __weak ssize_t security_show(struct device *dev, From 8d8d1dbefc423d42d626cf5b81aac214870ebaab Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 1 Feb 2021 20:36:54 -0600 Subject: [PATCH 128/284] smb3: Fix out-of-bounds bug in SMB2_negotiate() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While addressing some warnings generated by -Warray-bounds, I found this bug that was introduced back in 2017: CC [M] fs/cifs/smb2pdu.o fs/cifs/smb2pdu.c: In function ‘SMB2_negotiate’: fs/cifs/smb2pdu.c:822:16: warning: array subscript 1 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 822 | req->Dialects[1] = cpu_to_le16(SMB30_PROT_ID); | ~~~~~~~~~~~~~^~~ fs/cifs/smb2pdu.c:823:16: warning: array subscript 2 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 823 | req->Dialects[2] = cpu_to_le16(SMB302_PROT_ID); | ~~~~~~~~~~~~~^~~ fs/cifs/smb2pdu.c:824:16: warning: array subscript 3 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 824 | req->Dialects[3] = cpu_to_le16(SMB311_PROT_ID); | ~~~~~~~~~~~~~^~~ fs/cifs/smb2pdu.c:816:16: warning: array subscript 1 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 816 | req->Dialects[1] = cpu_to_le16(SMB302_PROT_ID); | ~~~~~~~~~~~~~^~~ At the time, the size of array _Dialects_ was changed from 1 to 3 in struct validate_negotiate_info_req, and then in 2019 it was changed from 3 to 4, but those changes were never made in struct smb2_negotiate_req, which has led to a 3 and a half years old out-of-bounds bug in function SMB2_negotiate() (fs/cifs/smb2pdu.c). Fix this by increasing the size of array _Dialects_ in struct smb2_negotiate_req to 4. Fixes: 9764c02fcbad ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)") Fixes: d5c7076b772a ("smb3: add smb3.1.1 to default dialect list") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva Signed-off-by: Steve French --- fs/cifs/smb2pdu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/cifs/smb2pdu.h b/fs/cifs/smb2pdu.h index d85edf5d1429..a5a9e33c0d73 100644 --- a/fs/cifs/smb2pdu.h +++ b/fs/cifs/smb2pdu.h @@ -286,7 +286,7 @@ struct smb2_negotiate_req { __le32 NegotiateContextOffset; /* SMB3.1.1 only. MBZ earlier */ __le16 NegotiateContextCount; /* SMB3.1.1 only. MBZ earlier */ __le16 Reserved2; - __le16 Dialects[1]; /* One dialect (vers=) at a time for now */ + __le16 Dialects[4]; /* BB expand this if autonegotiate > 4 dialects */ } __packed; /* Dialects */ From eaf5bfe37db871031232d2bf2535b6ca92afbad8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Thu, 28 Jan 2021 17:59:44 +0200 Subject: [PATCH 129/284] drm/i915: Skip vswing programming for TBT MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In thunderbolt mode the PHY is owned by the thunderbolt controller. We are not supposed to touch it. So skip the vswing programming as well (we already skipped the other steps not applicable to TBT). Touching this stuff could supposedly interfere with the PHY programming done by the thunderbolt controller. Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20210128155948.13678-1-ville.syrjala@linux.intel.com Reviewed-by: Imre Deak (cherry picked from commit f8c6b615b921d8a1bcd74870f9105e62b0bceff3) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/display/intel_ddi.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c index bf17365857ca..e1e3ac12f979 100644 --- a/drivers/gpu/drm/i915/display/intel_ddi.c +++ b/drivers/gpu/drm/i915/display/intel_ddi.c @@ -2754,6 +2754,9 @@ static void icl_mg_phy_ddi_vswing_sequence(struct intel_encoder *encoder, int n_entries, ln; u32 val; + if (enc_to_dig_port(encoder)->tc_mode == TC_PORT_TBT_ALT) + return; + ddi_translations = icl_get_mg_buf_trans(encoder, crtc_state, &n_entries); if (level >= n_entries) { drm_dbg_kms(&dev_priv->drm, @@ -2890,6 +2893,9 @@ tgl_dkl_phy_ddi_vswing_sequence(struct intel_encoder *encoder, u32 val, dpcnt_mask, dpcnt_val; int n_entries, ln; + if (enc_to_dig_port(encoder)->tc_mode == TC_PORT_TBT_ALT) + return; + ddi_translations = tgl_get_dkl_buf_trans(encoder, crtc_state, &n_entries); if (level >= n_entries) From 425cbd1fce10d4d68188123404d1a302a6939e0a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Thu, 28 Jan 2021 17:59:45 +0200 Subject: [PATCH 130/284] drm/i915: Extract intel_ddi_power_up_lanes() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Reduce the copypasta by pulling the combo PHY lane power up stuff into a helper. We'll have a third user soon. Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20210128155948.13678-2-ville.syrjala@linux.intel.com Reviewed-by: Imre Deak (cherry picked from commit 5cdf706fb91a6e4e6af799bb957c4d598e6a067b) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/display/intel_ddi.c | 35 +++++++++++++----------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c index e1e3ac12f979..e39ef68a480e 100644 --- a/drivers/gpu/drm/i915/display/intel_ddi.c +++ b/drivers/gpu/drm/i915/display/intel_ddi.c @@ -3537,6 +3537,23 @@ static void intel_ddi_disable_fec_state(struct intel_encoder *encoder, intel_de_posting_read(dev_priv, dp_tp_ctl_reg(encoder, crtc_state)); } +static void intel_ddi_power_up_lanes(struct intel_encoder *encoder, + const struct intel_crtc_state *crtc_state) +{ + struct drm_i915_private *i915 = to_i915(encoder->base.dev); + struct intel_digital_port *dig_port = enc_to_dig_port(encoder); + enum phy phy = intel_port_to_phy(i915, encoder->port); + + if (intel_phy_is_combo(i915, phy)) { + bool lane_reversal = + dig_port->saved_port_bits & DDI_BUF_PORT_REVERSAL; + + intel_combo_phy_power_up_lanes(i915, phy, false, + crtc_state->lane_count, + lane_reversal); + } +} + static void tgl_ddi_pre_enable_dp(struct intel_atomic_state *state, struct intel_encoder *encoder, const struct intel_crtc_state *crtc_state, @@ -3626,14 +3643,7 @@ static void tgl_ddi_pre_enable_dp(struct intel_atomic_state *state, * 7.f Combo PHY: Configure PORT_CL_DW10 Static Power Down to power up * the used lanes of the DDI. */ - if (intel_phy_is_combo(dev_priv, phy)) { - bool lane_reversal = - dig_port->saved_port_bits & DDI_BUF_PORT_REVERSAL; - - intel_combo_phy_power_up_lanes(dev_priv, phy, false, - crtc_state->lane_count, - lane_reversal); - } + intel_ddi_power_up_lanes(encoder, crtc_state); /* * 7.g Configure and enable DDI_BUF_CTL @@ -3718,14 +3728,7 @@ static void hsw_ddi_pre_enable_dp(struct intel_atomic_state *state, else intel_prepare_dp_ddi_buffers(encoder, crtc_state); - if (intel_phy_is_combo(dev_priv, phy)) { - bool lane_reversal = - dig_port->saved_port_bits & DDI_BUF_PORT_REVERSAL; - - intel_combo_phy_power_up_lanes(dev_priv, phy, false, - crtc_state->lane_count, - lane_reversal); - } + intel_ddi_power_up_lanes(encoder, crtc_state); intel_ddi_init_dp_buf_reg(encoder, crtc_state); if (!is_mst) From fad9bae9ee5d578afbe6380c82e4715efaddf118 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Thu, 28 Jan 2021 17:59:46 +0200 Subject: [PATCH 131/284] drm/i915: Power up combo PHY lanes for for HDMI as well MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently we only explicitly power up the combo PHY lanes for DP. The spec says we should do it for HDMI as well. Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20210128155948.13678-3-ville.syrjala@linux.intel.com Reviewed-by: Imre Deak (cherry picked from commit 1e0cb7bef35f0d1aed383bf69a209df218b807c9) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/display/intel_ddi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c index e39ef68a480e..dc13d1814d95 100644 --- a/drivers/gpu/drm/i915/display/intel_ddi.c +++ b/drivers/gpu/drm/i915/display/intel_ddi.c @@ -4214,6 +4214,8 @@ static void intel_enable_ddi_hdmi(struct intel_atomic_state *state, intel_de_write(dev_priv, reg, val); } + intel_ddi_power_up_lanes(encoder, crtc_state); + /* In HDMI/DVI mode, the port width, and swing/emphasis values * are ignored so nothing special needs to be done besides * enabling the port. From 538e4a8c571efdf131834431e0c14808bcfb1004 Mon Sep 17 00:00:00 2001 From: Thorsten Leemhuis Date: Fri, 29 Jan 2021 06:24:42 +0100 Subject: [PATCH 132/284] nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs Some Kingston A2000 NVMe SSDs sooner or later get confused and stop working when they use the deepest APST sleep while running Linux. The system then crashes and one has to cold boot it to get the SSD working again. Kingston seems to known about this since at least mid-September 2020: https://bbs.archlinux.org/viewtopic.php?pid=1926994#p1926994 Someone working for a German company representing Kingston to the German press confirmed to me Kingston engineering is aware of the issue and investigating; the person stated that to their current knowledge only the deepest APST sleep state causes trouble. Therefore, make Linux avoid it for now by applying the NVME_QUIRK_NO_DEEPEST_PS to this SSD. I have two such SSDs, but it seems the problem doesn't occur with them. I hence couldn't verify if this patch really fixes the problem, but all the data in front of me suggests it should. This patch can easily be reverted or improved upon if a better solution surfaces. FWIW, there are many reports about the issue scattered around the web; most of the users disabled APST completely to make things work, some just made Linux avoid the deepest sleep state: https://bugzilla.kernel.org/show_bug.cgi?id=195039#c65 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c73 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c74 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c78 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c79 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c80 https://askubuntu.com/questions/1222049/nvmekingston-a2000-sometimes-stops-giving-response-in-ubuntu-18-04dell-inspir https://community.acer.com/en/discussion/604326/m-2-nvme-ssd-aspire-517-51g-issue-compatibility-kingston-a2000-linux-ubuntu For the record, some data from 'nvme id-ctrl /dev/nvme0' NVME Identify Controller: vid : 0x2646 ssvid : 0x2646 mn : KINGSTON SA2000M81000G fr : S5Z42105 [...] ps 0 : mp:9.00W operational enlat:0 exlat:0 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- ps 1 : mp:4.60W operational enlat:0 exlat:0 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:- ps 2 : mp:3.80W operational enlat:0 exlat:0 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:- ps 3 : mp:0.0450W non-operational enlat:2000 exlat:2000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:- ps 4 : mp:0.0040W non-operational enlat:15000 exlat:15000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:- Cc: stable@vger.kernel.org # 4.14+ Signed-off-by: Thorsten Leemhuis Signed-off-by: Christoph Hellwig --- drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 81e6389b2042..7aa33bb3692f 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3259,6 +3259,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, { PCI_DEVICE(0x1d97, 0x2263), /* SPCC */ .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */ + .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001), .driver_data = NVME_QUIRK_SINGLE_VECTOR }, { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) }, From c9e95c39280530200cdd0bbd2670e6334a81970b Mon Sep 17 00:00:00 2001 From: Claus Stovgaard Date: Mon, 1 Feb 2021 22:08:22 +0100 Subject: [PATCH 133/284] nvme-pci: ignore the subsysem NQN on Phison E16 Tested both with Corsairs firmware 11.3 and 13.0 for the Corsairs MP600 and both have the issue as reported by the kernel. nvme nvme0: missing or invalid SUBNQN field. Signed-off-by: Claus Stovgaard Signed-off-by: Christoph Hellwig --- drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 7aa33bb3692f..6bad4d4dcdf0 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3242,6 +3242,8 @@ static const struct pci_device_id nvme_id_table[] = { { PCI_DEVICE(0x144d, 0xa822), /* Samsung PM1725a */ .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY | NVME_QUIRK_IGNORE_DEV_SUBNQN, }, + { PCI_DEVICE(0x1987, 0x5016), /* Phison E16 */ + .driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN, }, { PCI_DEVICE(0x1d1d, 0x1f1f), /* LighNVM qemu device */ .driver_data = NVME_QUIRK_LIGHTNVM, }, { PCI_DEVICE(0x1d1d, 0x2807), /* CNEX WL */ From 46121fa7c2dc55bcbb729b6a2ab323aa1e8986cf Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Mon, 1 Feb 2021 11:48:44 -0800 Subject: [PATCH 134/284] update the email address for Keith Bush Redirect my older email addresses that are in the git logs. Signed-off-by: Keith Busch Signed-off-by: Christoph Hellwig --- .mailmap | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.mailmap b/.mailmap index 632700cee55c..c52c6ca7aa95 100644 --- a/.mailmap +++ b/.mailmap @@ -180,6 +180,8 @@ Kees Cook Kees Cook Kees Cook Kees Cook +Keith Busch +Keith Busch Kenneth W Chen Konstantin Khlebnikov Konstantin Khlebnikov From 00f9a08fbc3c703b71842a5425c1eb82053c8a70 Mon Sep 17 00:00:00 2001 From: Andres Calderon Jaramillo Date: Tue, 2 Feb 2021 10:45:53 +0200 Subject: [PATCH 135/284] drm/i915/display: Prevent double YUV range correction on HDR planes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Prevent the ICL HDR plane pipeline from performing YUV color range correction twice when the input is in limited range. This is done by removing the limited-range code from icl_program_input_csc(). Before this patch the following could happen: user space gives us a YUV buffer in limited range; per the pipeline in [1], the plane would first go through a "YUV Range correct" stage that expands the range; the plane would then go through the "Input CSC" stage which would also expand the range because icl_program_input_csc() would use a matrix and an offset that assume limited-range input; this would ultimately cause dark and light colors to appear darker and lighter than they should respectively. This is an issue because if a buffer switches between being scanned out and being composited with the GPU, the user will see a color difference. If this switching happens quickly and frequently, the user will perceive this as a flickering. [1] https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-icllp-vol12-displayengine_0.pdf#page=281 Cc: stable@vger.kernel.org Signed-off-by: Andres Calderon Jaramillo Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20201215224219.3896256-1-andrescj@google.com (cherry picked from commit fed387572040e84ead53852a7820e30a30e515d0) Signed-off-by: Jani Nikula Link: https://patchwork.freedesktop.org/patch/msgid/20210202084553.30691-1-ville.syrjala@linux.intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 2 + drivers/gpu/drm/i915/display/intel_sprite.c | 65 +++----------------- 2 files changed, 12 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 53a00cf3fa32..39396248f388 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -4807,6 +4807,8 @@ u32 glk_plane_color_ctl(const struct intel_crtc_state *crtc_state, plane_color_ctl |= PLANE_COLOR_YUV_RANGE_CORRECTION_DISABLE; } else if (fb->format->is_yuv) { plane_color_ctl |= PLANE_COLOR_INPUT_CSC_ENABLE; + if (plane_state->hw.color_range == DRM_COLOR_YCBCR_FULL_RANGE) + plane_color_ctl |= PLANE_COLOR_YUV_RANGE_CORRECTION_DISABLE; } return plane_color_ctl; diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c b/drivers/gpu/drm/i915/display/intel_sprite.c index 019a2d6d807a..3da2544fa1c0 100644 --- a/drivers/gpu/drm/i915/display/intel_sprite.c +++ b/drivers/gpu/drm/i915/display/intel_sprite.c @@ -618,13 +618,19 @@ skl_program_scaler(struct intel_plane *plane, /* Preoffset values for YUV to RGB Conversion */ #define PREOFF_YUV_TO_RGB_HI 0x1800 -#define PREOFF_YUV_TO_RGB_ME 0x1F00 +#define PREOFF_YUV_TO_RGB_ME 0x0000 #define PREOFF_YUV_TO_RGB_LO 0x1800 #define ROFF(x) (((x) & 0xffff) << 16) #define GOFF(x) (((x) & 0xffff) << 0) #define BOFF(x) (((x) & 0xffff) << 16) +/* + * Programs the input color space conversion stage for ICL HDR planes. + * Note that it is assumed that this stage always happens after YUV + * range correction. Thus, the input to this stage is assumed to be + * in full-range YCbCr. + */ static void icl_program_input_csc(struct intel_plane *plane, const struct intel_crtc_state *crtc_state, @@ -672,52 +678,7 @@ icl_program_input_csc(struct intel_plane *plane, 0x0, 0x7800, 0x7F10, }, }; - - /* Matrix for Limited Range to Full Range Conversion */ - static const u16 input_csc_matrix_lr[][9] = { - /* - * BT.601 Limted range YCbCr -> full range RGB - * The matrix required is : - * [1.164384, 0.000, 1.596027, - * 1.164384, -0.39175, -0.812813, - * 1.164384, 2.017232, 0.0000] - */ - [DRM_COLOR_YCBCR_BT601] = { - 0x7CC8, 0x7950, 0x0, - 0x8D00, 0x7950, 0x9C88, - 0x0, 0x7950, 0x6810, - }, - /* - * BT.709 Limited range YCbCr -> full range RGB - * The matrix required is : - * [1.164384, 0.000, 1.792741, - * 1.164384, -0.213249, -0.532909, - * 1.164384, 2.112402, 0.0000] - */ - [DRM_COLOR_YCBCR_BT709] = { - 0x7E58, 0x7950, 0x0, - 0x8888, 0x7950, 0xADA8, - 0x0, 0x7950, 0x6870, - }, - /* - * BT.2020 Limited range YCbCr -> full range RGB - * The matrix required is : - * [1.164, 0.000, 1.678, - * 1.164, -0.1873, -0.6504, - * 1.164, 2.1417, 0.0000] - */ - [DRM_COLOR_YCBCR_BT2020] = { - 0x7D70, 0x7950, 0x0, - 0x8A68, 0x7950, 0xAC00, - 0x0, 0x7950, 0x6890, - }, - }; - const u16 *csc; - - if (plane_state->hw.color_range == DRM_COLOR_YCBCR_FULL_RANGE) - csc = input_csc_matrix[plane_state->hw.color_encoding]; - else - csc = input_csc_matrix_lr[plane_state->hw.color_encoding]; + const u16 *csc = input_csc_matrix[plane_state->hw.color_encoding]; intel_de_write_fw(dev_priv, PLANE_INPUT_CSC_COEFF(pipe, plane_id, 0), ROFF(csc[0]) | GOFF(csc[1])); @@ -734,14 +695,8 @@ icl_program_input_csc(struct intel_plane *plane, intel_de_write_fw(dev_priv, PLANE_INPUT_CSC_PREOFF(pipe, plane_id, 0), PREOFF_YUV_TO_RGB_HI); - if (plane_state->hw.color_range == DRM_COLOR_YCBCR_FULL_RANGE) - intel_de_write_fw(dev_priv, - PLANE_INPUT_CSC_PREOFF(pipe, plane_id, 1), - 0); - else - intel_de_write_fw(dev_priv, - PLANE_INPUT_CSC_PREOFF(pipe, plane_id, 1), - PREOFF_YUV_TO_RGB_ME); + intel_de_write_fw(dev_priv, PLANE_INPUT_CSC_PREOFF(pipe, plane_id, 1), + PREOFF_YUV_TO_RGB_ME); intel_de_write_fw(dev_priv, PLANE_INPUT_CSC_PREOFF(pipe, plane_id, 2), PREOFF_YUV_TO_RGB_LO); intel_de_write_fw(dev_priv, From 24321ac668e452a4942598533d267805f291fdc9 Mon Sep 17 00:00:00 2001 From: Raoni Fassina Firmino Date: Mon, 1 Feb 2021 17:05:05 -0300 Subject: [PATCH 136/284] powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64() semantics Commit 0138ba5783ae ("powerpc/64/signal: Balance return predictor stack in signal trampoline") changed __kernel_sigtramp_rt64() VDSO and trampoline code, and introduced a regression in the way glibc's backtrace()[1] detects the signal-handler stack frame. Apart from the practical implications, __kernel_sigtramp_rt64() was a VDSO function with the semantics that it is a function you can call from userspace to end a signal handling. Now this semantics are no longer valid. I believe the aforementioned change affects all releases since 5.9. This patch tries to fix both the semantics and practical aspect of __kernel_sigtramp_rt64() returning it to the previous code, whilst keeping the intended behaviour of 0138ba5783ae by adding a new symbol to serve as the jump target from the kernel to the trampoline. Now the trampoline has two parts, a new entry point and the old return point. [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-January/223194.html Fixes: 0138ba5783ae ("powerpc/64/signal: Balance return predictor stack in signal trampoline") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Raoni Fassina Firmino Acked-by: Nicholas Piggin [mpe: Minor tweaks to change log formatting, add stable tag] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210201200505.iz46ubcizipnkcxe@work-tp --- arch/powerpc/kernel/vdso64/sigtramp.S | 11 ++++++++++- arch/powerpc/kernel/vdso64/vdso64.lds.S | 2 +- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/vdso64/sigtramp.S b/arch/powerpc/kernel/vdso64/sigtramp.S index bbf68cd01088..2d4067561293 100644 --- a/arch/powerpc/kernel/vdso64/sigtramp.S +++ b/arch/powerpc/kernel/vdso64/sigtramp.S @@ -15,11 +15,20 @@ .text +/* + * __kernel_start_sigtramp_rt64 and __kernel_sigtramp_rt64 together + * are one function split in two parts. The kernel jumps to the former + * and the signal handler indirectly (by blr) returns to the latter. + * __kernel_sigtramp_rt64 needs to point to the return address so + * glibc can correctly identify the trampoline stack frame. + */ .balign 8 .balign IFETCH_ALIGN_BYTES -V_FUNCTION_BEGIN(__kernel_sigtramp_rt64) +V_FUNCTION_BEGIN(__kernel_start_sigtramp_rt64) .Lsigrt_start: bctrl /* call the handler */ +V_FUNCTION_END(__kernel_start_sigtramp_rt64) +V_FUNCTION_BEGIN(__kernel_sigtramp_rt64) addi r1, r1, __SIGNAL_FRAMESIZE li r0,__NR_rt_sigreturn sc diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S b/arch/powerpc/kernel/vdso64/vdso64.lds.S index 6164d1a1ba11..2f3c359cacd3 100644 --- a/arch/powerpc/kernel/vdso64/vdso64.lds.S +++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S @@ -131,4 +131,4 @@ VERSION /* * Make the sigreturn code visible to the kernel. */ -VDSO_sigtramp_rt64 = __kernel_sigtramp_rt64; +VDSO_sigtramp_rt64 = __kernel_start_sigtramp_rt64; From 9f5dc9974298aea9690c7a0f7007f1af37198230 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Thu, 26 Nov 2020 14:04:07 +0000 Subject: [PATCH 137/284] drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel If while we are cancelling the breadcrumb signaling, we find that the request is already completed, move it to the irq signaler and let it be signaled. v2: Tweak reference counting so that we only acquire a new reference on adding to a signal list, as opposed to a hidden i915_request_put of the caller's reference. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-5-chris@chris-wilson.co.uk (cherry picked from commit 85cc2917a3965a3a747a6407d6e3028cfeb1534e) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 41 +++++++++++---------- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 0625cbb3b431..484c74d71739 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -187,18 +187,6 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl) intel_engine_add_retire(b->irq_engine, tl); } -static bool __signal_request(struct i915_request *rq) -{ - GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)); - - if (!__dma_fence_signal(&rq->fence)) { - i915_request_put(rq); - return false; - } - - return true; -} - static struct llist_node * slist_add(struct llist_node *node, struct llist_node *head) { @@ -269,9 +257,11 @@ static void signal_irq_work(struct irq_work *work) release = remove_signaling_context(b, ce); spin_unlock(&ce->signal_lock); - if (__signal_request(rq)) + if (__dma_fence_signal(&rq->fence)) /* We own signal_node now, xfer to local list */ signal = slist_add(&rq->signal_node, signal); + else + i915_request_put(rq); if (release) { add_retire(b, ce->timeline); @@ -358,6 +348,17 @@ void intel_breadcrumbs_free(struct intel_breadcrumbs *b) kfree(b); } +static void irq_signal_request(struct i915_request *rq, + struct intel_breadcrumbs *b) +{ + if (!__dma_fence_signal(&rq->fence)) + return; + + i915_request_get(rq); + if (llist_add(&rq->signal_node, &b->signaled_requests)) + irq_work_queue(&b->irq_work); +} + static void insert_breadcrumb(struct i915_request *rq) { struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs; @@ -367,17 +368,13 @@ static void insert_breadcrumb(struct i915_request *rq) if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) return; - i915_request_get(rq); - /* * If the request is already completed, we can transfer it * straight onto a signaled list, and queue the irq worker for * its signal completion. */ if (__i915_request_is_complete(rq)) { - if (__signal_request(rq) && - llist_add(&rq->signal_node, &b->signaled_requests)) - irq_work_queue(&b->irq_work); + irq_signal_request(rq, b); return; } @@ -408,6 +405,8 @@ static void insert_breadcrumb(struct i915_request *rq) break; } } + + i915_request_get(rq); list_add_rcu(&rq->signal_link, pos); GEM_BUG_ON(!check_signal_order(ce, rq)); GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags)); @@ -448,6 +447,7 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq) void i915_request_cancel_breadcrumb(struct i915_request *rq) { + struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs; struct intel_context *ce = rq->context; bool release; @@ -456,11 +456,14 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq) spin_lock(&ce->signal_lock); list_del_rcu(&rq->signal_link); - release = remove_signaling_context(rq->engine->breadcrumbs, ce); + release = remove_signaling_context(b, ce); spin_unlock(&ce->signal_lock); if (release) intel_context_put(ce); + if (__i915_request_is_complete(rq)) + irq_signal_request(rq, b); + i915_request_put(rq); } From e4747cb3ec3c232d65c84cbe77633abd5871fda3 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Tue, 19 Jan 2021 16:20:57 +0000 Subject: [PATCH 138/284] drm/i915/gt: Close race between enable_breadcrumbs and cancel_breadcrumbs If we enable_breadcrumbs for a request while that request is being removed from HW; we may see that the request is active as we take the ce->signal_lock and proceed to attach the request to ce->signals. However, during unsubmission after marking the request as inactive, we see that the request has not yet been added to ce->signals and so skip the removal. Pull the check during cancel_breadcrumbs under the same spinlock as enabling so that we the two tests are consistent in enable/cancel. Otherwise, we may insert a request onto ce->signals that we expect should not be there: intel_context_remove_breadcrumbs:488 GEM_BUG_ON(!__i915_request_is_complete(rq)) While updating, we can note that we are always called with irqs-disabled, due to the engine->active.lock being held at the single caller, and so remove the irqsave/restore making it symmetric to enable_breadcrumbs. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2931 Fixes: c18636f76344 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs") Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Andi Shyti Cc: # v5.10+ Reviewed-by: Tvrtko Ursulin Link: https://patchwork.freedesktop.org/patch/msgid/20210119162057.31097-1-chris@chris-wilson.co.uk (cherry picked from commit e7004ea4f5f528f5a5018f0b70cab36d25315498) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 484c74d71739..1d1757584f49 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -451,10 +451,12 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq) struct intel_context *ce = rq->context; bool release; - if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) - return; - spin_lock(&ce->signal_lock); + if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) { + spin_unlock(&ce->signal_lock); + return; + } + list_del_rcu(&rq->signal_link); release = remove_signaling_context(b, ce); spin_unlock(&ce->signal_lock); From 761c70a52586a9214b29026d384d2c01b73661a8 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Mon, 25 Jan 2021 13:21:58 +0000 Subject: [PATCH 139/284] drm/i915/gem: Drop lru bumping on display unpinning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Simplify the frontbuffer unpin by removing the lock requirement. The LRU bumping was primarily to protect the GTT from being evicted and from frontbuffers being eagerly shrunk. Now we protect frontbuffers from the shrinker, and we avoid accidentally evicting from the GTT, so the benefit from bumping LRU is no more, and we can save more time by not. Reported-and-tested-by: Matti Hämäläinen Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2905 Fixes: c1793ba86a41 ("drm/i915: Add ww locking to pin_to_display_plane, v2.") Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld Link: https://patchwork.freedesktop.org/patch/msgid/20210119214336.1463-6-chris@chris-wilson.co.uk (cherry picked from commit 14ca83eece9565a2d2177291ceb122982dc38420) Cc: Joonas Lahtinen Cc: Jani Nikula Cc: # v5.10+ Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/display/intel_display.c | 7 +-- drivers/gpu/drm/i915/display/intel_overlay.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 45 -------------------- drivers/gpu/drm/i915/gem/i915_gem_object.h | 1 - 4 files changed, 4 insertions(+), 53 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 39396248f388..61be6bed9162 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -2309,7 +2309,7 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, */ ret = i915_vma_pin_fence(vma); if (ret != 0 && INTEL_GEN(dev_priv) < 4) { - i915_gem_object_unpin_from_display_plane(vma); + i915_vma_unpin(vma); vma = ERR_PTR(ret); goto err; } @@ -2327,12 +2327,9 @@ err: void intel_unpin_fb_vma(struct i915_vma *vma, unsigned long flags) { - i915_gem_object_lock(vma->obj, NULL); if (flags & PLANE_HAS_FENCE) i915_vma_unpin_fence(vma); - i915_gem_object_unpin_from_display_plane(vma); - i915_gem_object_unlock(vma->obj); - + i915_vma_unpin(vma); i915_vma_put(vma); } diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c b/drivers/gpu/drm/i915/display/intel_overlay.c index 52b4f6193b4c..0095c8cac9b4 100644 --- a/drivers/gpu/drm/i915/display/intel_overlay.c +++ b/drivers/gpu/drm/i915/display/intel_overlay.c @@ -359,7 +359,7 @@ static void intel_overlay_release_old_vma(struct intel_overlay *overlay) intel_frontbuffer_flip_complete(overlay->i915, INTEL_FRONTBUFFER_OVERLAY(overlay->crtc->pipe)); - i915_gem_object_unpin_from_display_plane(vma); + i915_vma_unpin(vma); i915_vma_put(vma); } @@ -860,7 +860,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay, return 0; out_unpin: - i915_gem_object_unpin_from_display_plane(vma); + i915_vma_unpin(vma); out_pin_section: atomic_dec(&dev_priv->gpu_error.pending_fb_pin); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index fcce6909f201..3d435bfff764 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -387,48 +387,6 @@ err: return vma; } -static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj) -{ - struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct i915_vma *vma; - - if (list_empty(&obj->vma.list)) - return; - - mutex_lock(&i915->ggtt.vm.mutex); - spin_lock(&obj->vma.lock); - for_each_ggtt_vma(vma, obj) { - if (!drm_mm_node_allocated(&vma->node)) - continue; - - GEM_BUG_ON(vma->vm != &i915->ggtt.vm); - list_move_tail(&vma->vm_link, &vma->vm->bound_list); - } - spin_unlock(&obj->vma.lock); - mutex_unlock(&i915->ggtt.vm.mutex); - - if (i915_gem_object_is_shrinkable(obj)) { - unsigned long flags; - - spin_lock_irqsave(&i915->mm.obj_lock, flags); - - if (obj->mm.madv == I915_MADV_WILLNEED && - !atomic_read(&obj->mm.shrink_pin)) - list_move_tail(&obj->mm.link, &i915->mm.shrink_list); - - spin_unlock_irqrestore(&i915->mm.obj_lock, flags); - } -} - -void -i915_gem_object_unpin_from_display_plane(struct i915_vma *vma) -{ - /* Bump the LRU to try and avoid premature eviction whilst flipping */ - i915_gem_object_bump_inactive_ggtt(vma->obj); - - i915_vma_unpin(vma); -} - /** * Moves a single object to the CPU read, and possibly write domain. * @obj: object to act on @@ -569,9 +527,6 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, else err = i915_gem_object_set_to_cpu_domain(obj, write_domain); - /* And bump the LRU for this access */ - i915_gem_object_bump_inactive_ggtt(obj); - i915_gem_object_unlock(obj); if (write_domain) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index be14486f63a7..4556afe18f16 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -486,7 +486,6 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, u32 alignment, const struct i915_ggtt_view *view, unsigned int flags); -void i915_gem_object_unpin_from_display_plane(struct i915_vma *vma); void i915_gem_object_make_unshrinkable(struct drm_i915_gem_object *obj); void i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj); From c8b186a8d54d7e12d28e9f9686cb00ff18fc2ab2 Mon Sep 17 00:00:00 2001 From: Alexey Kardashevskiy Date: Tue, 2 Feb 2021 18:23:26 +1100 Subject: [PATCH 140/284] tracepoint: Fix race between tracing and removing tracepoint When executing a tracepoint, the tracepoint's func is dereferenced twice - in __DO_TRACE() (where the returned pointer is checked) and later on in __traceiter_##_name where the returned pointer is dereferenced without checking which leads to races against tracepoint_removal_sync() and crashes. This adds a check before referencing the pointer in tracepoint_ptr_deref. Link: https://lkml.kernel.org/r/20210202072326.120557-1-aik@ozlabs.ru Cc: stable@vger.kernel.org Fixes: d25e37d89dd2f ("tracepoint: Optimize using static_call()") Acked-by: Peter Zijlstra (Intel) Signed-off-by: Alexey Kardashevskiy Signed-off-by: Steven Rostedt (VMware) --- include/linux/tracepoint.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 0f21617f1a66..966ed8980327 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -307,11 +307,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) \ it_func_ptr = \ rcu_dereference_raw((&__tracepoint_##_name)->funcs); \ - do { \ - it_func = (it_func_ptr)->func; \ - __data = (it_func_ptr)->data; \ - ((void(*)(void *, proto))(it_func))(__data, args); \ - } while ((++it_func_ptr)->func); \ + if (it_func_ptr) { \ + do { \ + it_func = (it_func_ptr)->func; \ + __data = (it_func_ptr)->data; \ + ((void(*)(void *, proto))(it_func))(__data, args); \ + } while ((++it_func_ptr)->func); \ + } \ return 0; \ } \ DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name); From 4c9fb5d9140802db4db9f66c23887f43174e113c Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Tue, 2 Feb 2021 15:54:19 +0100 Subject: [PATCH 141/284] iommu: Check dev->iommu in dev_iommu_priv_get() before dereferencing it The dev_iommu_priv_get() needs a similar check to dev_iommu_fwspec_get() to make sure no NULL-ptr is dereferenced. Fixes: 05a0542b456e1 ("iommu/amd: Store dev_data as device iommu private data") Cc: stable@vger.kernel.org # v5.8+ Link: https://lore.kernel.org/r/20210202145419.29143-1-joro@8bytes.org Reference: https://bugzilla.kernel.org/show_bug.cgi?id=211241 Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b3f0e2018c62..efa96263b81b 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -616,7 +616,10 @@ static inline void dev_iommu_fwspec_set(struct device *dev, static inline void *dev_iommu_priv_get(struct device *dev) { - return dev->iommu->priv; + if (dev->iommu) + return dev->iommu->priv; + else + return NULL; } static inline void dev_iommu_priv_set(struct device *dev, void *priv) From 83404d581471775f37f85e5261ec0d09407d8bed Mon Sep 17 00:00:00 2001 From: Imre Deak Date: Mon, 25 Jan 2021 19:36:35 +0200 Subject: [PATCH 142/284] drm/dp/mst: Export drm_dp_get_vc_payload_bw() This function will be needed by the next patch where the driver calculates the BW based on driver specific parameters, so export it. At the same time sanitize the function params, passing the more natural link rate instead of the encoding of the same rate. v2: - Fix function documentation. (Lyude) Cc: Lyude Paul Cc: Ville Syrjala Cc: Cc: dri-devel@lists.freedesktop.org Signed-off-by: Imre Deak Reviewed-by: Lyude Paul Link: https://patchwork.freedesktop.org/patch/msgid/20210125173636.1733812-1-imre.deak@intel.com (cherry picked from commit a321fc2b4e60fc1b39517d26c8104351636a6062) Signed-off-by: Jani Nikula --- drivers/gpu/drm/drm_dp_mst_topology.c | 24 ++++++++++++++++++------ include/drm/drm_dp_mst_helper.h | 1 + 2 files changed, 19 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 0401b2f47500..8781deefeae3 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -3629,14 +3629,26 @@ static int drm_dp_send_up_ack_reply(struct drm_dp_mst_topology_mgr *mgr, return 0; } -static int drm_dp_get_vc_payload_bw(u8 dp_link_bw, u8 dp_link_count) +/** + * drm_dp_get_vc_payload_bw - get the VC payload BW for an MST link + * @link_rate: link rate in 10kbits/s units + * @link_lane_count: lane count + * + * Calculate the total bandwidth of a MultiStream Transport link. The returned + * value is in units of PBNs/(timeslots/1 MTP). This value can be used to + * convert the number of PBNs required for a given stream to the number of + * timeslots this stream requires in each MTP. + */ +int drm_dp_get_vc_payload_bw(int link_rate, int link_lane_count) { - if (dp_link_bw == 0 || dp_link_count == 0) - DRM_DEBUG_KMS("invalid link bandwidth in DPCD: %x (link count: %d)\n", - dp_link_bw, dp_link_count); + if (link_rate == 0 || link_lane_count == 0) + DRM_DEBUG_KMS("invalid link rate/lane count: (%d / %d)\n", + link_rate, link_lane_count); - return dp_link_bw * dp_link_count / 2; + /* See DP v2.0 2.6.4.2, VCPayload_Bandwidth_for_OneTimeSlotPer_MTP_Allocation */ + return link_rate * link_lane_count / 54000; } +EXPORT_SYMBOL(drm_dp_get_vc_payload_bw); /** * drm_dp_read_mst_cap() - check whether or not a sink supports MST @@ -3692,7 +3704,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms goto out_unlock; } - mgr->pbn_div = drm_dp_get_vc_payload_bw(mgr->dpcd[1], + mgr->pbn_div = drm_dp_get_vc_payload_bw(drm_dp_bw_code_to_link_rate(mgr->dpcd[1]), mgr->dpcd[2] & DP_MAX_LANE_COUNT_MASK); if (mgr->pbn_div == 0) { ret = -EINVAL; diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h index f5e92fe9151c..bd1c39907b92 100644 --- a/include/drm/drm_dp_mst_helper.h +++ b/include/drm/drm_dp_mst_helper.h @@ -783,6 +783,7 @@ drm_dp_mst_detect_port(struct drm_connector *connector, struct edid *drm_dp_mst_get_edid(struct drm_connector *connector, struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port); +int drm_dp_get_vc_payload_bw(int link_rate, int link_lane_count); int drm_dp_calc_pbn_mode(int clock, int bpp, bool dsc); From 882554042d138dbc6fb1a43017d0b9c3b38ee5f5 Mon Sep 17 00:00:00 2001 From: Imre Deak Date: Mon, 25 Jan 2021 19:36:36 +0200 Subject: [PATCH 143/284] drm/i915: Fix the MST PBN divider calculation Atm the driver will calculate a wrong MST timeslots/MTP (aka time unit) value for MST streams if the link parameters (link rate or lane count) are limited in a way independent of the sink capabilities (reported by DPCD). One example of such a limitation is when a MUX between the sink and source connects only a limited number of lanes to the display and connects the rest of the lanes to other peripherals (USB). Another issue is that atm MST core calculates the divider based on the backwards compatible DPCD (at address 0x0000) vs. the extended capability info (at address 0x2200). This can result in leaving some part of the MST BW unused (For instance in case of the WD19TB dock). Fix the above two issues by calculating the PBN divider value based on the rate and lane count link parameters that the driver uses for all other computation. Bugzilla: https://gitlab.freedesktop.org/drm/intel/-/issues/2977 Cc: Lyude Paul Cc: Ville Syrjala Cc: Signed-off-by: Imre Deak Reviewed-by: Ville Syrjala Link: https://patchwork.freedesktop.org/patch/msgid/20210125173636.1733812-2-imre.deak@intel.com (cherry picked from commit b59c27cab257cfbff939615a87b72bce83925710) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/display/intel_dp_mst.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c index 27f04aed8764..3286b232be0b 100644 --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c @@ -69,7 +69,9 @@ static int intel_dp_mst_compute_link_config(struct intel_encoder *encoder, slots = drm_dp_atomic_find_vcpi_slots(state, &intel_dp->mst_mgr, connector->port, - crtc_state->pbn, 0); + crtc_state->pbn, + drm_dp_get_vc_payload_bw(crtc_state->port_clock, + crtc_state->lane_count)); if (slots == -EDEADLK) return slots; if (slots >= 0) From 2051c890caa50f9d8658335cb9d39bfcb5680a7e Mon Sep 17 00:00:00 2001 From: Imre Deak Date: Tue, 29 Dec 2020 19:22:00 +0200 Subject: [PATCH 144/284] drm/i915/dp: Move intel_dp_set_signal_levels() to intel_dp_link_training.c MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit intel_dp_set_signal_levels() is needed for link training, so move it to intel_dp_link_training.c. Signed-off-by: Imre Deak Reviewed-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20201229172201.4155327-1-imre.deak@intel.com (cherry picked from commit 1c6e527d6947ea77bebabe15bbeaa765a87b70ca) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/display/intel_dp.c | 18 ------------------ drivers/gpu/drm/i915/display/intel_dp.h | 3 --- .../drm/i915/display/intel_dp_link_training.c | 18 ++++++++++++++++++ .../drm/i915/display/intel_dp_link_training.h | 2 ++ 4 files changed, 20 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c index 09123e8625c4..f6eec5c206d8 100644 --- a/drivers/gpu/drm/i915/display/intel_dp.c +++ b/drivers/gpu/drm/i915/display/intel_dp.c @@ -4637,24 +4637,6 @@ ivb_cpu_edp_set_signal_levels(struct intel_dp *intel_dp, intel_de_posting_read(dev_priv, intel_dp->output_reg); } -void intel_dp_set_signal_levels(struct intel_dp *intel_dp, - const struct intel_crtc_state *crtc_state) -{ - struct drm_i915_private *dev_priv = dp_to_i915(intel_dp); - u8 train_set = intel_dp->train_set[0]; - - drm_dbg_kms(&dev_priv->drm, "Using vswing level %d%s\n", - train_set & DP_TRAIN_VOLTAGE_SWING_MASK, - train_set & DP_TRAIN_MAX_SWING_REACHED ? " (max)" : ""); - drm_dbg_kms(&dev_priv->drm, "Using pre-emphasis level %d%s\n", - (train_set & DP_TRAIN_PRE_EMPHASIS_MASK) >> - DP_TRAIN_PRE_EMPHASIS_SHIFT, - train_set & DP_TRAIN_MAX_PRE_EMPHASIS_REACHED ? - " (max)" : ""); - - intel_dp->set_signal_levels(intel_dp, crtc_state); -} - void intel_dp_program_link_training_pattern(struct intel_dp *intel_dp, const struct intel_crtc_state *crtc_state, diff --git a/drivers/gpu/drm/i915/display/intel_dp.h b/drivers/gpu/drm/i915/display/intel_dp.h index 05f7ddf7a795..6620f9efdcbb 100644 --- a/drivers/gpu/drm/i915/display/intel_dp.h +++ b/drivers/gpu/drm/i915/display/intel_dp.h @@ -96,9 +96,6 @@ void intel_dp_program_link_training_pattern(struct intel_dp *intel_dp, const struct intel_crtc_state *crtc_state, u8 dp_train_pat); -void -intel_dp_set_signal_levels(struct intel_dp *intel_dp, - const struct intel_crtc_state *crtc_state); void intel_dp_compute_rate(struct intel_dp *intel_dp, int port_clock, u8 *link_bw, u8 *rate_select); bool intel_dp_source_supports_hbr2(struct intel_dp *intel_dp); diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.c b/drivers/gpu/drm/i915/display/intel_dp_link_training.c index 91d3979902d0..7876e781f698 100644 --- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c +++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c @@ -334,6 +334,24 @@ intel_dp_set_link_train(struct intel_dp *intel_dp, return drm_dp_dpcd_write(&intel_dp->aux, reg, buf, len) == len; } +void intel_dp_set_signal_levels(struct intel_dp *intel_dp, + const struct intel_crtc_state *crtc_state) +{ + struct drm_i915_private *dev_priv = dp_to_i915(intel_dp); + u8 train_set = intel_dp->train_set[0]; + + drm_dbg_kms(&dev_priv->drm, "Using vswing level %d%s\n", + train_set & DP_TRAIN_VOLTAGE_SWING_MASK, + train_set & DP_TRAIN_MAX_SWING_REACHED ? " (max)" : ""); + drm_dbg_kms(&dev_priv->drm, "Using pre-emphasis level %d%s\n", + (train_set & DP_TRAIN_PRE_EMPHASIS_MASK) >> + DP_TRAIN_PRE_EMPHASIS_SHIFT, + train_set & DP_TRAIN_MAX_PRE_EMPHASIS_REACHED ? + " (max)" : ""); + + intel_dp->set_signal_levels(intel_dp, crtc_state); +} + static bool intel_dp_reset_link_train(struct intel_dp *intel_dp, const struct intel_crtc_state *crtc_state, diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.h b/drivers/gpu/drm/i915/display/intel_dp_link_training.h index 86905aa24db7..c3110c032bc2 100644 --- a/drivers/gpu/drm/i915/display/intel_dp_link_training.h +++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.h @@ -17,6 +17,8 @@ void intel_dp_get_adjust_train(struct intel_dp *intel_dp, const struct intel_crtc_state *crtc_state, enum drm_dp_phy dp_phy, const u8 link_status[DP_LINK_STATUS_SIZE]); +void intel_dp_set_signal_levels(struct intel_dp *intel_dp, + const struct intel_crtc_state *crtc_state); void intel_dp_start_link_train(struct intel_dp *intel_dp, const struct intel_crtc_state *crtc_state); void intel_dp_stop_link_train(struct intel_dp *intel_dp, From 88ebe1f572e284ecfe088648e0ae93803a75a459 Mon Sep 17 00:00:00 2001 From: Imre Deak Date: Tue, 29 Dec 2020 19:22:01 +0200 Subject: [PATCH 145/284] drm/i915/dp: Fix LTTPR vswing/pre-emp setting in non-transparent mode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The DP PHY vswing/pre-emphasis level programming the driver does is related to the DPTX -> first LTTPR link segment only. Accordingly it should be only programmed when link training the first LTTPR and kept as-is when training subsequent LTTPRs and the DPRX. For these latter PHYs the vs/pe levels will be set in response to writing the DP_TRAINING_LANEx_SET_PHY_REPEATERy DPCD registers (by an upstream LTTPR TX PHY snooping this write access of its downstream LTTPR/DPRX RX PHY). The above is also described in DP Standard v2.0 under 3.6.6.1. While at it simplify and add the LTTPR that is link trained to the debug message in intel_dp_set_signal_levels(). Fixes: b30edfd8d0b4 ("drm/i915: Switch to LTTPR non-transparent mode link training") Cc: Ville Syrjälä Signed-off-by: Imre Deak Reviewed-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20201229172201.4155327-2-imre.deak@intel.com (cherry picked from commit 67fba3f1c73b83569d171ae1fa463a537bbfe0a8) Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/display/intel_dp.c | 2 +- .../drm/i915/display/intel_dp_link_training.c | 19 +++++++++++-------- .../drm/i915/display/intel_dp_link_training.h | 3 ++- 3 files changed, 14 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c index f6eec5c206d8..8a26307c4896 100644 --- a/drivers/gpu/drm/i915/display/intel_dp.c +++ b/drivers/gpu/drm/i915/display/intel_dp.c @@ -5685,7 +5685,7 @@ static void intel_dp_process_phy_request(struct intel_dp *intel_dp, intel_dp_autotest_phy_ddi_disable(intel_dp, crtc_state); - intel_dp_set_signal_levels(intel_dp, crtc_state); + intel_dp_set_signal_levels(intel_dp, crtc_state, DP_PHY_DPRX); intel_dp_phy_pattern_update(intel_dp, crtc_state); diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.c b/drivers/gpu/drm/i915/display/intel_dp_link_training.c index 7876e781f698..d8c6d7054d11 100644 --- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c +++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c @@ -335,21 +335,24 @@ intel_dp_set_link_train(struct intel_dp *intel_dp, } void intel_dp_set_signal_levels(struct intel_dp *intel_dp, - const struct intel_crtc_state *crtc_state) + const struct intel_crtc_state *crtc_state, + enum drm_dp_phy dp_phy) { struct drm_i915_private *dev_priv = dp_to_i915(intel_dp); u8 train_set = intel_dp->train_set[0]; + char phy_name[10]; - drm_dbg_kms(&dev_priv->drm, "Using vswing level %d%s\n", + drm_dbg_kms(&dev_priv->drm, "Using vswing level %d%s, pre-emphasis level %d%s, at %s\n", train_set & DP_TRAIN_VOLTAGE_SWING_MASK, - train_set & DP_TRAIN_MAX_SWING_REACHED ? " (max)" : ""); - drm_dbg_kms(&dev_priv->drm, "Using pre-emphasis level %d%s\n", + train_set & DP_TRAIN_MAX_SWING_REACHED ? " (max)" : "", (train_set & DP_TRAIN_PRE_EMPHASIS_MASK) >> DP_TRAIN_PRE_EMPHASIS_SHIFT, train_set & DP_TRAIN_MAX_PRE_EMPHASIS_REACHED ? - " (max)" : ""); + " (max)" : "", + intel_dp_phy_name(dp_phy, phy_name, sizeof(phy_name))); - intel_dp->set_signal_levels(intel_dp, crtc_state); + if (intel_dp_phy_is_downstream_of_source(intel_dp, dp_phy)) + intel_dp->set_signal_levels(intel_dp, crtc_state); } static bool @@ -359,7 +362,7 @@ intel_dp_reset_link_train(struct intel_dp *intel_dp, u8 dp_train_pat) { memset(intel_dp->train_set, 0, sizeof(intel_dp->train_set)); - intel_dp_set_signal_levels(intel_dp, crtc_state); + intel_dp_set_signal_levels(intel_dp, crtc_state, dp_phy); return intel_dp_set_link_train(intel_dp, crtc_state, dp_phy, dp_train_pat); } @@ -373,7 +376,7 @@ intel_dp_update_link_train(struct intel_dp *intel_dp, DP_TRAINING_LANE0_SET_PHY_REPEATER(dp_phy); int ret; - intel_dp_set_signal_levels(intel_dp, crtc_state); + intel_dp_set_signal_levels(intel_dp, crtc_state, dp_phy); ret = drm_dp_dpcd_write(&intel_dp->aux, reg, intel_dp->train_set, crtc_state->lane_count); diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.h b/drivers/gpu/drm/i915/display/intel_dp_link_training.h index c3110c032bc2..6a1f76bd8c75 100644 --- a/drivers/gpu/drm/i915/display/intel_dp_link_training.h +++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.h @@ -18,7 +18,8 @@ void intel_dp_get_adjust_train(struct intel_dp *intel_dp, enum drm_dp_phy dp_phy, const u8 link_status[DP_LINK_STATUS_SIZE]); void intel_dp_set_signal_levels(struct intel_dp *intel_dp, - const struct intel_crtc_state *crtc_state); + const struct intel_crtc_state *crtc_state, + enum drm_dp_phy dp_phy); void intel_dp_start_link_train(struct intel_dp *intel_dp, const struct intel_crtc_state *crtc_state); void intel_dp_stop_link_train(struct intel_dp *intel_dp, From 943dea8af21bd896e0d6c30ea221203fb3cd3265 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Tue, 2 Feb 2021 08:55:46 -0800 Subject: [PATCH 146/284] KVM: x86: Update emulator context mode if SYSENTER xfers to 64-bit mode Set the emulator context to PROT64 if SYSENTER transitions from 32-bit userspace (compat mode) to a 64-bit kernel, otherwise the RIP update at the end of x86_emulate_insn() will incorrectly truncate the new RIP. Note, this bug is mostly limited to running an Intel virtual CPU model on an AMD physical CPU, as other combinations of virtual and physical CPUs do not trigger full emulation. On Intel CPUs, SYSENTER in compatibility mode is legal, and unconditionally transitions to 64-bit mode. On AMD CPUs, SYSENTER is illegal in compatibility mode and #UDs. If the vCPU is AMD, KVM injects a #UD on SYSENTER in compat mode. If the pCPU is Intel, SYSENTER will execute natively and not trigger #UD->VM-Exit (ignoring guest TLB shenanigans). Fixes: fede8076aab4 ("KVM: x86: handle wrap around 32-bit address space") Cc: stable@vger.kernel.org Signed-off-by: Jonny Barker [sean: wrote changelog] Signed-off-by: Sean Christopherson Message-Id: <20210202165546.2390296-1-seanjc@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/emulate.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 56cae1ff9e3f..66a08322988f 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2879,6 +2879,8 @@ static int em_sysenter(struct x86_emulate_ctxt *ctxt) ops->get_msr(ctxt, MSR_IA32_SYSENTER_ESP, &msr_data); *reg_write(ctxt, VCPU_REGS_RSP) = (efer & EFER_LMA) ? msr_data : (u32)msr_data; + if (efer & EFER_LMA) + ctxt->mode = X86EMUL_MODE_PROT64; return X86EMUL_CONTINUE; } From 91cb2c8b072e00632adf463b78b44f123d46a0fa Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Mon, 1 Feb 2021 19:06:33 +0000 Subject: [PATCH 147/284] arm64: Do not pass tagged addresses to __is_lm_address() Commit 519ea6f1c82f ("arm64: Fix kernel address detection of __is_lm_address()") fixed the incorrect validation of addresses below PAGE_OFFSET. However, it no longer allowed tagged addresses to be passed to virt_addr_valid(). Fix this by explicitly resetting the pointer tag prior to invoking __is_lm_address(). This is consistent with the __lm_to_phys() macro. Fixes: 519ea6f1c82f ("arm64: Fix kernel address detection of __is_lm_address()") Signed-off-by: Catalin Marinas Acked-by: Ard Biesheuvel Cc: # 5.4.x Cc: Will Deacon Cc: Vincenzo Frascino Cc: Mark Rutland Link: https://lore.kernel.org/r/20210201190634.22942-2-catalin.marinas@arm.com --- arch/arm64/include/asm/memory.h | 2 +- arch/arm64/mm/physaddr.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index 99d7e1494aaa..3c1aaa522cbd 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -332,7 +332,7 @@ static inline void *phys_to_virt(phys_addr_t x) #endif /* !CONFIG_SPARSEMEM_VMEMMAP || CONFIG_DEBUG_VIRTUAL */ #define virt_addr_valid(addr) ({ \ - __typeof__(addr) __addr = addr; \ + __typeof__(addr) __addr = __tag_reset(addr); \ __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr)); \ }) diff --git a/arch/arm64/mm/physaddr.c b/arch/arm64/mm/physaddr.c index 67a9ba9eaa96..cde44c13dda1 100644 --- a/arch/arm64/mm/physaddr.c +++ b/arch/arm64/mm/physaddr.c @@ -9,7 +9,7 @@ phys_addr_t __virt_to_phys(unsigned long x) { - WARN(!__is_lm_address(x), + WARN(!__is_lm_address(__tag_reset(x)), "virt_to_phys used for non-linear address: %pK (%pS)\n", (void *)x, (void *)x); From 22cd5edb2d9c6d68b6ac0fc9584104d88710fa57 Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Mon, 1 Feb 2021 19:06:34 +0000 Subject: [PATCH 148/284] arm64: Use simpler arithmetics for the linear map macros Because of the tagged addresses, the __is_lm_address() and __lm_to_phys() macros grew to some harder to understand bitwise operations using PAGE_OFFSET. Since these macros only accept untagged addresses, use a simple subtract operation. Signed-off-by: Catalin Marinas Acked-by: Ard Biesheuvel Cc: Will Deacon Cc: Mark Rutland Link: https://lore.kernel.org/r/20210201190634.22942-3-catalin.marinas@arm.com --- arch/arm64/include/asm/memory.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index 3c1aaa522cbd..ff4732785c32 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -251,9 +251,9 @@ static inline const void *__tag_set(const void *addr, u8 tag) * lives in the [PAGE_OFFSET, PAGE_END) interval at the bottom of the * kernel's TTBR1 address range. */ -#define __is_lm_address(addr) (((u64)(addr) ^ PAGE_OFFSET) < (PAGE_END - PAGE_OFFSET)) +#define __is_lm_address(addr) (((u64)(addr) - PAGE_OFFSET) < (PAGE_END - PAGE_OFFSET)) -#define __lm_to_phys(addr) (((addr) & ~PAGE_OFFSET) + PHYS_OFFSET) +#define __lm_to_phys(addr) (((addr) - PAGE_OFFSET) + PHYS_OFFSET) #define __kimg_to_phys(addr) ((addr) - kimage_voffset) #define __virt_to_phys_nodebug(x) ({ \ From a50ea34d6dd00a12c9cd29cf7b0fa72816bffbcb Mon Sep 17 00:00:00 2001 From: Chunfeng Yun Date: Tue, 2 Feb 2021 16:38:24 +0800 Subject: [PATCH 149/284] usb: xhci-mtk: break loop when find the endpoint to drop No need to check the following endpoints after finding the endpoint wanted to drop. Fixes: 54f6a8af3722 ("usb: xhci-mtk: skip dropping bandwidth of unchecked endpoints") Cc: stable Reported-by: Ikjoon Jang Signed-off-by: Chunfeng Yun Link: https://lore.kernel.org/r/1612255104-5363-1-git-send-email-chunfeng.yun@mediatek.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-mtk-sch.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci-mtk-sch.c b/drivers/usb/host/xhci-mtk-sch.c index dee8a329076d..b45e5bf08997 100644 --- a/drivers/usb/host/xhci-mtk-sch.c +++ b/drivers/usb/host/xhci-mtk-sch.c @@ -689,8 +689,10 @@ void xhci_mtk_drop_ep_quirk(struct usb_hcd *hcd, struct usb_device *udev, sch_bw = &sch_array[bw_index]; list_for_each_entry_safe(sch_ep, tmp, &sch_bw->bw_ep_list, endpoint) { - if (sch_ep->ep == ep) + if (sch_ep->ep == ep) { destroy_sch_ep(udev, sch_bw, sch_ep); + break; + } } } EXPORT_SYMBOL_GPL(xhci_mtk_drop_ep_quirk); From ebb22a05943666155e6da04407cc6e913974c78c Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 1 Feb 2021 20:24:17 +0100 Subject: [PATCH 150/284] rtc: mc146818: Dont test for bit 0-5 in Register D The recent change to validate the RTC turned out to be overly tight. While it cures the problem on the reporters machine it breaks machines with Intel chipsets which use bit 0-5 of the D register. So check only for bit 6 being 0 which is the case on these Intel machines as well. Fixes: 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") Reported-by: Serge Belyshev Reported-by: Dirk Gouders Reported-by: Borislav Petkov Signed-off-by: Thomas Gleixner Tested-by: Dirk Gouders Tested-by: Len Brown Tested-by: Borislav Petkov Acked-by: Alexandre Belloni Link: https://lore.kernel.org/r/87zh0nbnha.fsf@nanos.tec.linutronix.de --- drivers/rtc/rtc-cmos.c | 4 ++-- drivers/rtc/rtc-mc146818-lib.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index 68a9ac6f2fe1..a701dae653c4 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -805,8 +805,8 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq) spin_lock_irq(&rtc_lock); - /* Ensure that the RTC is accessible. Bit 0-6 must be 0! */ - if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) { + /* Ensure that the RTC is accessible. Bit 6 must be 0! */ + if ((CMOS_READ(RTC_VALID) & 0x40) != 0) { spin_unlock_irq(&rtc_lock); dev_warn(dev, "not accessible\n"); retval = -ENXIO; diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index f83c13818af3..dcfaf09946ee 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -21,8 +21,8 @@ unsigned int mc146818_get_time(struct rtc_time *time) again: spin_lock_irqsave(&rtc_lock, flags); - /* Ensure that the RTC is accessible. Bit 0-6 must be 0! */ - if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) { + /* Ensure that the RTC is accessible. Bit 6 must be 0! */ + if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) { spin_unlock_irqrestore(&rtc_lock, flags); memset(time, 0xff, sizeof(*time)); return 0; From 89fa15ecdca7eb46a711476b961f70a74765bbe4 Mon Sep 17 00:00:00 2001 From: Huang Rui Date: Sat, 30 Jan 2021 17:14:30 +0800 Subject: [PATCH 151/284] drm/amdgpu: fix the issue that retry constantly once the buffer is oversize MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We cannot modify initial_domain every time while the retry starts. That will cause the busy waiting that unable to switch to GTT while the vram is not enough. Fixes: f8aab60422c3 ("drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs") Signed-off-by: Huang Rui Reviewed-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index d0a1fee1f5f6..174a73eb23f0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -269,8 +269,8 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data, resv = vm->root.base.bo->tbo.base.resv; } -retry: initial_domain = (u32)(0xffffffff & args->in.domains); +retry: r = amdgpu_gem_object_create(adev, size, args->in.alignment, initial_domain, flags, ttm_bo_type_device, resv, &gobj); From b99a8c8f239d76820bbed33c1a42c381cc1f16db Mon Sep 17 00:00:00 2001 From: Huang Rui Date: Mon, 1 Feb 2021 18:39:16 +0800 Subject: [PATCH 152/284] drm/amdkfd: fix null pointer panic while free buffer in kfd In drm_gem_object_free, it will call funcs of drm buffer obj. So kfd_alloc should use amdgpu_gem_object_create instead of amdgpu_bo_create to initialize the funcs as amdgpu_gem_object_funcs. [ 396.231390] amdgpu: Release VA 0x7f76b4ada000 - 0x7f76b4add000 [ 396.231394] amdgpu: remove VA 0x7f76b4ada000 - 0x7f76b4add000 in entry 0000000085c24a47 [ 396.231408] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 396.231445] #PF: supervisor read access in kernel mode [ 396.231466] #PF: error_code(0x0000) - not-present page [ 396.231484] PGD 0 P4D 0 [ 396.231495] Oops: 0000 [#1] SMP NOPTI [ 396.231509] CPU: 7 PID: 1352 Comm: clinfo Tainted: G OE 5.11.0-rc2-custom #1 [ 396.231537] Hardware name: AMD Celadon-RN/Celadon-RN, BIOS WCD0401N_Weekly_20_04_0 04/01/2020 [ 396.231563] RIP: 0010:drm_gem_object_free+0xc/0x22 [drm] [ 396.231606] Code: eb ec 48 89 c3 eb e7 0f 1f 44 00 00 55 48 89 e5 48 8b bf 00 06 00 00 e8 72 0d 01 00 5d c3 0f 1f 44 00 00 48 8b 87 40 01 00 00 <48> 8b 00 48 85 c0 74 0b 55 48 89 e5 e8 54 37 7c db 5d c3 0f 0b c3 [ 396.231666] RSP: 0018:ffffb4704177fcf8 EFLAGS: 00010246 [ 396.231686] RAX: 0000000000000000 RBX: ffff993a0d0cc400 RCX: 0000000000003113 [ 396.231711] RDX: 0000000000000001 RSI: e9cda7a5d0791c6d RDI: ffff993a333a9058 [ 396.231736] RBP: ffffb4704177fdd0 R08: ffff993a03855858 R09: 0000000000000000 [ 396.231761] R10: ffff993a0d1f7158 R11: 0000000000000001 R12: 0000000000000000 [ 396.231785] R13: ffff993a0d0cc428 R14: 0000000000003000 R15: ffffb4704177fde0 [ 396.231811] FS: 00007f76b5730740(0000) GS:ffff993b275c0000(0000) knlGS:0000000000000000 [ 396.231840] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 396.231860] CR2: 0000000000000000 CR3: 000000016d2e2000 CR4: 0000000000350ee0 [ 396.231885] Call Trace: [ 396.231897] ? amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x24c/0x25f [amdgpu] [ 396.232056] ? __dynamic_dev_dbg+0xcd/0x100 [ 396.232076] kfd_ioctl_free_memory_of_gpu+0x91/0x102 [amdgpu] [ 396.232214] kfd_ioctl+0x211/0x35b [amdgpu] [ 396.232341] ? kfd_ioctl_get_queue_wave_state+0x52/0x52 [amdgpu] Fixes: 246cb7e49a70 ("drm/amdgpu: Introduce GEM object functions") Reviewed-by: Felix Kuehling Tested-by: Changfeng Signed-off-by: Huang Rui Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 2d991da2cead..d1ed4f8df2b7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -26,6 +26,7 @@ #include #include "amdgpu_object.h" +#include "amdgpu_gem.h" #include "amdgpu_vm.h" #include "amdgpu_amdkfd.h" #include "amdgpu_dma_buf.h" @@ -1152,7 +1153,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( struct sg_table *sg = NULL; uint64_t user_addr = 0; struct amdgpu_bo *bo; - struct amdgpu_bo_param bp; + struct drm_gem_object *gobj; u32 domain, alloc_domain; u64 alloc_flags; int ret; @@ -1220,19 +1221,14 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s\n", va, size, domain_string(alloc_domain)); - memset(&bp, 0, sizeof(bp)); - bp.size = size; - bp.byte_align = 1; - bp.domain = alloc_domain; - bp.flags = alloc_flags; - bp.type = bo_type; - bp.resv = NULL; - ret = amdgpu_bo_create(adev, &bp, &bo); + ret = amdgpu_gem_object_create(adev, size, 1, alloc_domain, alloc_flags, + bo_type, NULL, &gobj); if (ret) { pr_debug("Failed to create BO on domain %s. ret %d\n", - domain_string(alloc_domain), ret); + domain_string(alloc_domain), ret); goto err_bo_create; } + bo = gem_to_amdgpu_bo(gobj); if (bo_type == ttm_bo_type_sg) { bo->tbo.sg = sg; bo->tbo.ttm->sg = sg; From ea41bd232f167d6fd6505d54485826148b52e54a Mon Sep 17 00:00:00 2001 From: chen gong Date: Fri, 29 Jan 2021 15:37:45 +0800 Subject: [PATCH 153/284] drm/amdgpu/gfx10: update CGTS_TCC_DISABLE and CGTS_USER_TCC_DISABLE register offsets for VGH For Vangogh: The offset of the CGTS_TCC_DISABLE is 0x5006 by calculation. The offset of the CGTS_USER_TCC_DISABLE is 0x5007 by calculation. Signed-off-by: chen gong Acked-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 346963e3cf73..d86b42a36560 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -99,6 +99,10 @@ #define mmGCR_GENERAL_CNTL_Sienna_Cichlid 0x1580 #define mmGCR_GENERAL_CNTL_Sienna_Cichlid_BASE_IDX 0 +#define mmCGTS_TCC_DISABLE_Vangogh 0x5006 +#define mmCGTS_TCC_DISABLE_Vangogh_BASE_IDX 1 +#define mmCGTS_USER_TCC_DISABLE_Vangogh 0x5007 +#define mmCGTS_USER_TCC_DISABLE_Vangogh_BASE_IDX 1 #define mmGOLDEN_TSC_COUNT_UPPER_Vangogh 0x0025 #define mmGOLDEN_TSC_COUNT_UPPER_Vangogh_BASE_IDX 1 #define mmGOLDEN_TSC_COUNT_LOWER_Vangogh 0x0026 @@ -4936,8 +4940,18 @@ static void gfx_v10_0_tcp_harvest(struct amdgpu_device *adev) static void gfx_v10_0_get_tcc_info(struct amdgpu_device *adev) { /* TCCs are global (not instanced). */ - uint32_t tcc_disable = RREG32_SOC15(GC, 0, mmCGTS_TCC_DISABLE) | - RREG32_SOC15(GC, 0, mmCGTS_USER_TCC_DISABLE); + uint32_t tcc_disable; + + switch (adev->asic_type) { + case CHIP_VANGOGH: + tcc_disable = RREG32_SOC15(GC, 0, mmCGTS_TCC_DISABLE_Vangogh) | + RREG32_SOC15(GC, 0, mmCGTS_USER_TCC_DISABLE_Vangogh); + break; + default: + tcc_disable = RREG32_SOC15(GC, 0, mmCGTS_TCC_DISABLE) | + RREG32_SOC15(GC, 0, mmCGTS_USER_TCC_DISABLE); + break; + } adev->gfx.config.tcc_disabled_mask = REG_GET_FIELD(tcc_disable, CGTS_TCC_DISABLE, TCC_DISABLE) | From 53a5a2729470ac7a7f77a64be4ae87dc4aa80d39 Mon Sep 17 00:00:00 2001 From: Xiaojian Du Date: Mon, 1 Feb 2021 16:20:38 +0800 Subject: [PATCH 154/284] drm/amd/pm: fill in the data member of v2 gpu metrics table for vangogh This patch is to fill in the data member of v2 gpu metrics table for vangogh. Signed-off-by: Xiaojian Du Reviewed-by: Kevin Wang Reviewed-by: Huang Rui Reviewed-by: Evan Quan Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c index 5c1482d4ca43..92ad2cdbae10 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -591,14 +591,17 @@ static ssize_t vangogh_get_gpu_metrics(struct smu_context *smu, gpu_metrics->average_socket_power = metrics.CurrentSocketPower; gpu_metrics->average_cpu_power = metrics.Power[0]; gpu_metrics->average_soc_power = metrics.Power[1]; + gpu_metrics->average_gfx_power = metrics.Power[2]; memcpy(&gpu_metrics->average_core_power[0], &metrics.CorePower[0], sizeof(uint16_t) * 8); gpu_metrics->average_gfxclk_frequency = metrics.GfxclkFrequency; gpu_metrics->average_socclk_frequency = metrics.SocclkFrequency; + gpu_metrics->average_uclk_frequency = metrics.MemclkFrequency; gpu_metrics->average_fclk_frequency = metrics.MemclkFrequency; gpu_metrics->average_vclk_frequency = metrics.VclkFrequency; + gpu_metrics->average_dclk_frequency = metrics.DclkFrequency; memcpy(&gpu_metrics->current_coreclk[0], &metrics.CoreFrequency[0], From cd9b0159beb7787bec38eb339ed7bc167d83b4ff Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Wed, 27 Jan 2021 13:20:40 +0100 Subject: [PATCH 155/284] drm/amdgpu: enable freesync for A+A configs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Some newer APUs can scanout directly from GTT, that saves us from allocating another bounce buffer in VRAM and enables freesync in such configurations. Without this patch creating a framebuffer from the imported BO will fail and userspace will fall back to a copy. Signed-off-by: Christian König Reviewed-by: Shashank Sharma Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 8 ++++++-- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c index f764803c53a4..48cb33e5b382 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c @@ -926,8 +926,10 @@ amdgpu_display_user_framebuffer_create(struct drm_device *dev, struct drm_file *file_priv, const struct drm_mode_fb_cmd2 *mode_cmd) { - struct drm_gem_object *obj; struct amdgpu_framebuffer *amdgpu_fb; + struct drm_gem_object *obj; + struct amdgpu_bo *bo; + uint32_t domains; int ret; obj = drm_gem_object_lookup(file_priv, mode_cmd->handles[0]); @@ -938,7 +940,9 @@ amdgpu_display_user_framebuffer_create(struct drm_device *dev, } /* Handle is imported dma-buf, so cannot be migrated to VRAM for scanout */ - if (obj->import_attach) { + bo = gem_to_amdgpu_bo(obj); + domains = amdgpu_display_supported_domains(drm_to_adev(dev), bo->flags); + if (obj->import_attach && !(domains & AMDGPU_GEM_DOMAIN_GTT)) { drm_dbg_kms(dev, "Cannot create framebuffer from imported dma_buf\n"); return ERR_PTR(-EINVAL); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 25ec4d57333f..b4c8e5d5c763 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -897,7 +897,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, return -EINVAL; /* A shared bo cannot be migrated to VRAM */ - if (bo->prime_shared_count) { + if (bo->prime_shared_count || bo->tbo.base.import_attach) { if (domain & AMDGPU_GEM_DOMAIN_GTT) domain = AMDGPU_GEM_DOMAIN_GTT; else From 2b6b7ab4b1cabfbee1af5d818efcab5d51d62c7e Mon Sep 17 00:00:00 2001 From: George Shen Date: Tue, 22 Dec 2020 14:05:41 -0500 Subject: [PATCH 156/284] drm/amd/display: Fix DPCD translation for LTTPR AUX_RD_INTERVAL [Why] The translation between the DPCD value and the specified AUX_RD_INTERVAL in the DP spec do not match. [How] Update values to match the spec. Signed-off-by: George Shen Reviewed-by: Wenjing Liu Acked-by: Anson Jacob Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c index f95bade59624..1e4794e2825c 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c @@ -892,14 +892,14 @@ static uint32_t translate_training_aux_read_interval(uint32_t dpcd_aux_read_inte switch (dpcd_aux_read_interval) { case 0x01: - aux_rd_interval_us = 400; - break; - case 0x02: aux_rd_interval_us = 4000; break; - case 0x03: + case 0x02: aux_rd_interval_us = 8000; break; + case 0x03: + aux_rd_interval_us = 12000; + break; case 0x04: aux_rd_interval_us = 16000; break; From 8866a67ab86cc0812e65c04f1ef02bcc41e24d68 Mon Sep 17 00:00:00 2001 From: Bhawanpreet Lakha Date: Wed, 6 Jan 2021 11:23:05 -0500 Subject: [PATCH 157/284] drm/amd/display: reuse current context instead of recreating one [Why] Currently we discard the current context and recreate it. The current context is what is applied to the HW so we should be re-using this rather than creating a new context. Recreating the context can lead to mismatch between new context and the current context For example: gsl groups get changed when we create a new context this can cause issues in a multi display config (with flip immediate) because we don't align the existing gsl groups in the new and current context. If we reuse the current context the gsl group assignment stays the same. [How] Instead of discarding the current context, we instead just copy the current state and add/remove planes and streams. Signed-off-by: Bhawanpreet Lakha Reviewed-by: Nicholas Kazlauskas Acked-by: Anson Jacob Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 23 +++++++-------- drivers/gpu/drm/amd/display/dc/core/dc.c | 29 +++++++++++++------ drivers/gpu/drm/amd/display/dc/dc_stream.h | 3 +- 3 files changed, 31 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index c6da89df055d..0757328e3085 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -1934,7 +1934,7 @@ static void dm_gpureset_commit_state(struct dc_state *dc_state, dc_commit_updates_for_stream( dm->dc, bundle->surface_updates, dc_state->stream_status->plane_count, - dc_state->streams[k], &bundle->stream_update, dc_state); + dc_state->streams[k], &bundle->stream_update); } cleanup: @@ -1965,8 +1965,7 @@ static void dm_set_dpms_off(struct dc_link *link) stream_update.stream = stream_state; dc_commit_updates_for_stream(stream_state->ctx->dc, NULL, 0, - stream_state, &stream_update, - stream_state->ctx->dc->current_state); + stream_state, &stream_update); mutex_unlock(&adev->dm.dc_lock); } @@ -7549,7 +7548,7 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, struct drm_crtc *pcrtc, bool wait_for_vblank) { - uint32_t i; + int i; uint64_t timestamp_ns; struct drm_plane *plane; struct drm_plane_state *old_plane_state, *new_plane_state; @@ -7590,7 +7589,7 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, amdgpu_dm_commit_cursors(state); /* update planes when needed */ - for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) { + for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) { struct drm_crtc *crtc = new_plane_state->crtc; struct drm_crtc_state *new_crtc_state; struct drm_framebuffer *fb = new_plane_state->fb; @@ -7813,8 +7812,7 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, bundle->surface_updates, planes_count, acrtc_state->stream, - &bundle->stream_update, - dc_state); + &bundle->stream_update); /** * Enable or disable the interrupts on the backend. @@ -8150,13 +8148,13 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state) struct dm_connector_state *dm_new_con_state = to_dm_connector_state(new_con_state); struct dm_connector_state *dm_old_con_state = to_dm_connector_state(old_con_state); struct amdgpu_crtc *acrtc = to_amdgpu_crtc(dm_new_con_state->base.crtc); - struct dc_surface_update dummy_updates[MAX_SURFACES]; + struct dc_surface_update surface_updates[MAX_SURFACES]; struct dc_stream_update stream_update; struct dc_info_packet hdr_packet; struct dc_stream_status *status = NULL; bool abm_changed, hdr_changed, scaling_changed; - memset(&dummy_updates, 0, sizeof(dummy_updates)); + memset(&surface_updates, 0, sizeof(surface_updates)); memset(&stream_update, 0, sizeof(stream_update)); if (acrtc) { @@ -8213,16 +8211,15 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state) * To fix this, DC should permit updating only stream properties. */ for (j = 0; j < status->plane_count; j++) - dummy_updates[j].surface = status->plane_states[0]; + surface_updates[j].surface = status->plane_states[j]; mutex_lock(&dm->dc_lock); dc_commit_updates_for_stream(dm->dc, - dummy_updates, + surface_updates, status->plane_count, dm_new_crtc_state->stream, - &stream_update, - dc_state); + &stream_update); mutex_unlock(&dm->dc_lock); } diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 58eb0d69873a..6cf1a5a2a5ec 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -2679,8 +2679,7 @@ void dc_commit_updates_for_stream(struct dc *dc, struct dc_surface_update *srf_updates, int surface_count, struct dc_stream_state *stream, - struct dc_stream_update *stream_update, - struct dc_state *state) + struct dc_stream_update *stream_update) { const struct dc_stream_status *stream_status; enum surface_update_type update_type; @@ -2699,6 +2698,12 @@ void dc_commit_updates_for_stream(struct dc *dc, if (update_type >= UPDATE_TYPE_FULL) { + struct dc_plane_state *new_planes[MAX_SURFACES]; + + memset(new_planes, 0, sizeof(new_planes)); + + for (i = 0; i < surface_count; i++) + new_planes[i] = srf_updates[i].surface; /* initialize scratch memory for building context */ context = dc_create_state(dc); @@ -2707,15 +2712,21 @@ void dc_commit_updates_for_stream(struct dc *dc, return; } - dc_resource_state_copy_construct(state, context); + dc_resource_state_copy_construct( + dc->current_state, context); - for (i = 0; i < dc->res_pool->pipe_count; i++) { - struct pipe_ctx *new_pipe = &context->res_ctx.pipe_ctx[i]; - struct pipe_ctx *old_pipe = &dc->current_state->res_ctx.pipe_ctx[i]; - - if (new_pipe->plane_state && new_pipe->plane_state != old_pipe->plane_state) - new_pipe->plane_state->force_full_update = true; + /*remove old surfaces from context */ + if (!dc_rem_all_planes_for_stream(dc, stream, context)) { + DC_ERROR("Failed to remove streams for new validate context!\n"); + return; } + + /* add surface to context */ + if (!dc_add_all_planes_for_stream(dc, stream, new_planes, surface_count, context)) { + DC_ERROR("Failed to add streams for new validate context!\n"); + return; + } + } diff --git a/drivers/gpu/drm/amd/display/dc/dc_stream.h b/drivers/gpu/drm/amd/display/dc/dc_stream.h index b7910976b81a..e243c01b9672 100644 --- a/drivers/gpu/drm/amd/display/dc/dc_stream.h +++ b/drivers/gpu/drm/amd/display/dc/dc_stream.h @@ -283,8 +283,7 @@ void dc_commit_updates_for_stream(struct dc *dc, struct dc_surface_update *srf_updates, int surface_count, struct dc_stream_state *stream, - struct dc_stream_update *stream_update, - struct dc_state *state); + struct dc_stream_update *stream_update); /* * Log the current stream state. */ From 1622711beebe887e4f0f8237fea1f09bb48e9a51 Mon Sep 17 00:00:00 2001 From: Sung Lee Date: Fri, 15 Jan 2021 13:53:15 -0500 Subject: [PATCH 158/284] drm/amd/display: Add more Clock Sources to DCN2.1 [WHY] When enabling HDMI on ComboPHY, there are not enough clock sources to complete display detection. [HOW] Initialize more clock sources. Signed-off-by: Sung Lee Reviewed-by: Tony Cheng Acked-by: Anson Jacob Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c index b000b43a820d..674376428916 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c @@ -906,6 +906,8 @@ enum dcn20_clk_src_array_id { DCN20_CLK_SRC_PLL0, DCN20_CLK_SRC_PLL1, DCN20_CLK_SRC_PLL2, + DCN20_CLK_SRC_PLL3, + DCN20_CLK_SRC_PLL4, DCN20_CLK_SRC_TOTAL_DCN21 }; @@ -2030,6 +2032,14 @@ static bool dcn21_resource_construct( dcn21_clock_source_create(ctx, ctx->dc_bios, CLOCK_SOURCE_COMBO_PHY_PLL2, &clk_src_regs[2], false); + pool->base.clock_sources[DCN20_CLK_SRC_PLL3] = + dcn21_clock_source_create(ctx, ctx->dc_bios, + CLOCK_SOURCE_COMBO_PHY_PLL3, + &clk_src_regs[3], false); + pool->base.clock_sources[DCN20_CLK_SRC_PLL4] = + dcn21_clock_source_create(ctx, ctx->dc_bios, + CLOCK_SOURCE_COMBO_PHY_PLL4, + &clk_src_regs[4], false); pool->base.clk_src_count = DCN20_CLK_SRC_TOTAL_DCN21; From 1a10e5244778169a5a53a527d7830cf0438132a1 Mon Sep 17 00:00:00 2001 From: Stylon Wang Date: Tue, 5 Jan 2021 11:29:34 +0800 Subject: [PATCH 159/284] drm/amd/display: Revert "Fix EDID parsing after resume from suspend" This reverts commit b24bdc37d03a0478189e20a50286092840f414fa. It caused memory leak after S3 on 4K HDMI displays. Signed-off-by: Stylon Wang Reviewed-by: Rodrigo Siqueira Acked-by: Anson Jacob Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 0757328e3085..8d496fe36dd2 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2346,8 +2346,6 @@ void amdgpu_dm_update_connector_after_detect( drm_connector_update_edid_property(connector, aconnector->edid); - drm_add_edid_modes(connector, aconnector->edid); - if (aconnector->dc_link->aux_mode) drm_dp_cec_set_edid(&aconnector->dm_dp_aux.aux, aconnector->edid); From 58180a0cc0c57fe62a799a112f95b60f6935bd96 Mon Sep 17 00:00:00 2001 From: Mikita Lipski Date: Thu, 14 Jan 2021 11:48:57 -0500 Subject: [PATCH 160/284] drm/amd/display: Release DSC before acquiring [why] Need to unassign DSC from pipes that are not using it so other pipes can acquire it. That is needed for asic's that have unmatching number of DSC engines from the number of pipes. [how] Before acquiring dsc to stream resources, first remove it. Signed-off-by: Mikita Lipski Reviewed-by: Eryk Brol Acked-by: Anson Jacob Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c index 8ab0b9060d2b..f2d8cf34be46 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c @@ -833,6 +833,9 @@ bool compute_mst_dsc_configs_for_state(struct drm_atomic_state *state, if (computed_streams[i]) continue; + if (dcn20_remove_stream_from_ctx(stream->ctx->dc, dc_state, stream) != DC_OK) + return false; + mutex_lock(&aconnector->mst_mgr.lock); if (!compute_mst_dsc_configs_for_link(state, dc_state, stream->link)) { mutex_unlock(&aconnector->mst_mgr.lock); @@ -850,7 +853,8 @@ bool compute_mst_dsc_configs_for_state(struct drm_atomic_state *state, stream = dc_state->streams[i]; if (stream->timing.flags.DSC == 1) - dc_stream_add_dsc_to_resource(stream->ctx->dc, dc_state, stream); + if (dc_stream_add_dsc_to_resource(stream->ctx->dc, dc_state, stream) != DC_OK) + return false; } return true; From 3ddc818d9bb877c64f5c649beab97af86c403702 Mon Sep 17 00:00:00 2001 From: Victor Lu Date: Thu, 14 Jan 2021 22:24:14 -0500 Subject: [PATCH 161/284] drm/amd/display: Fix dc_sink kref count in emulated_link_detect [why] prev_sink is not used anywhere else in the function and the reference to it from dc_link is replaced with a new dc_sink. [how] Change dc_sink_retain(prev_sink) to dc_sink_release(prev_sink). Signed-off-by: Victor Lu Reviewed-by: Nicholas Kazlauskas Acked-by: Anson Jacob Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 8d496fe36dd2..ee1aa91bd237 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -1833,8 +1833,8 @@ static void emulated_link_detect(struct dc_link *link) link->type = dc_connection_none; prev_sink = link->local_sink; - if (prev_sink != NULL) - dc_sink_retain(prev_sink); + if (prev_sink) + dc_sink_release(prev_sink); switch (link->connector_signal) { case SIGNAL_TYPE_HDMI_TYPE_A: { From 2abaa323d744011982b20b8f3886184d56d23946 Mon Sep 17 00:00:00 2001 From: Victor Lu Date: Thu, 14 Jan 2021 16:27:07 -0500 Subject: [PATCH 162/284] drm/amd/display: Free atomic state after drm_atomic_commit [why] drm_atomic_commit was changed so that the caller must free their drm_atomic_state reference on successes. [how] Add drm_atomic_commit_put after drm_atomic_commit call in dm_force_atomic_commit. Signed-off-by: Victor Lu Reviewed-by: Roman Li Acked-by: Anson Jacob Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index ee1aa91bd237..6761c612beae 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -8354,14 +8354,14 @@ static int dm_force_atomic_commit(struct drm_connector *connector) ret = PTR_ERR_OR_ZERO(conn_state); if (ret) - goto err; + goto out; /* Attach crtc to drm_atomic_state*/ crtc_state = drm_atomic_get_crtc_state(state, &disconnected_acrtc->base); ret = PTR_ERR_OR_ZERO(crtc_state); if (ret) - goto err; + goto out; /* force a restore */ crtc_state->mode_changed = true; @@ -8371,17 +8371,15 @@ static int dm_force_atomic_commit(struct drm_connector *connector) ret = PTR_ERR_OR_ZERO(plane_state); if (ret) - goto err; - + goto out; /* Call commit internally with the state we just constructed */ ret = drm_atomic_commit(state); - if (!ret) - return 0; -err: - DRM_ERROR("Restoring old state failed with %i\n", ret); +out: drm_atomic_state_put(state); + if (ret) + DRM_ERROR("Restoring old state failed with %i\n", ret); return ret; } From 8e92bb0fa75bca9a57e4aba2e36f67d8016a3053 Mon Sep 17 00:00:00 2001 From: Victor Lu Date: Fri, 15 Jan 2021 11:02:48 -0500 Subject: [PATCH 163/284] drm/amd/display: Decrement refcount of dc_sink before reassignment [why] An old dc_sink state is causing a memory leak because it is missing a dc_sink_release before a new dc_sink is assigned back to aconnector->dc_sink. [how] Decrement the dc_sink refcount before reassigning it to a new dc_sink. Signed-off-by: Victor Lu Reviewed-by: Rodrigo Siqueira Acked-by: Anson Jacob Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 6761c612beae..961abf1cf040 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2329,8 +2329,10 @@ void amdgpu_dm_update_connector_after_detect( * TODO: check if we still need the S3 mode update workaround. * If yes, put it here. */ - if (aconnector->dc_sink) + if (aconnector->dc_sink) { amdgpu_dm_update_freesync_caps(connector, NULL); + dc_sink_release(aconnector->dc_sink); + } aconnector->dc_sink = sink; dc_sink_retain(aconnector->dc_sink); From 074075aea2ff72dade5231b4ee9f2ab9a055f1ec Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Tue, 2 Feb 2021 15:06:04 +0900 Subject: [PATCH 164/284] scripts/clang-tools: switch explicitly to Python 3 For the same reason as commit 51839e29cb59 ("scripts: switch explicitly to Python 3"), switch some more scripts, which I tested and confirmed working on Python 3. Signed-off-by: Masahiro Yamada Acked-by: Nathan Chancellor --- scripts/clang-tools/gen_compile_commands.py | 2 +- scripts/clang-tools/run-clang-tools.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py index 19963708bcf8..8ddb5d099029 100755 --- a/scripts/clang-tools/gen_compile_commands.py +++ b/scripts/clang-tools/gen_compile_commands.py @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 # SPDX-License-Identifier: GPL-2.0 # # Copyright (C) Google LLC, 2018 diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py index fa7655c7cec0..f754415af398 100755 --- a/scripts/clang-tools/run-clang-tools.py +++ b/scripts/clang-tools/run-clang-tools.py @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 # SPDX-License-Identifier: GPL-2.0 # # Copyright (C) Google LLC, 2020 From 2ab543823322b564f205cb15d0f0302803c87d11 Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti Date: Fri, 29 Jan 2021 12:31:05 -0500 Subject: [PATCH 165/284] riscv: virt_addr_valid must check the address belongs to linear mapping virt_addr_valid macro checks that a virtual address is valid, ie that the address belongs to the linear mapping and that the corresponding physical page exists. Add the missing check that ensures the virtual address belongs to the linear mapping, otherwise __virt_to_phys, when compiled with CONFIG_DEBUG_VIRTUAL enabled, raises a WARN that is interpreted as a kernel bug by syzbot. Signed-off-by: Alexandre Ghiti Reviewed-by: Atish Patra Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/page.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h index 2d50f76efe48..64a675c5c30a 100644 --- a/arch/riscv/include/asm/page.h +++ b/arch/riscv/include/asm/page.h @@ -135,7 +135,10 @@ extern phys_addr_t __phys_addr_symbol(unsigned long x); #endif /* __ASSEMBLY__ */ -#define virt_addr_valid(vaddr) (pfn_valid(virt_to_pfn(vaddr))) +#define virt_addr_valid(vaddr) ({ \ + unsigned long _addr = (unsigned long)vaddr; \ + (unsigned long)(_addr) >= PAGE_OFFSET && pfn_valid(virt_to_pfn(_addr)); \ +}) #define VM_DATA_DEFAULT_FLAGS VM_DATA_FLAGS_NON_EXEC From f105ea9890f42137344f8c08548c895dc9294bd8 Mon Sep 17 00:00:00 2001 From: Atish Patra Date: Fri, 29 Jan 2021 11:00:36 -0800 Subject: [PATCH 166/284] RISC-V: Fix .init section permission update .init section permission should only updated to non-execute if STRICT_KERNEL_RWX is enabled. Otherwise, this will lead to a kernel hang. Fixes: 19a00869028f ("RISC-V: Protect all kernel sections including init early") Cc: stable@vger.kernel.org Suggested-by: Geert Uytterhoeven Reported-by: Geert Uytterhoeven Signed-off-by: Atish Patra Reviewed-by: Atish Patra Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/setup.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c index 3fa3f26dde85..c7c0655dd45b 100644 --- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -293,6 +293,8 @@ void free_initmem(void) unsigned long init_begin = (unsigned long)__init_begin; unsigned long init_end = (unsigned long)__init_end; - set_memory_rw_nx(init_begin, (init_end - init_begin) >> PAGE_SHIFT); + if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)) + set_memory_rw_nx(init_begin, (init_end - init_begin) >> PAGE_SHIFT); + free_initmem_default(POISON_FREE_INITMEM); } From eefb5f3ab2e8e0b3ef5eba5c5a9f33457741300d Mon Sep 17 00:00:00 2001 From: Sebastien Van Cauwenberghe Date: Fri, 29 Jan 2021 11:00:37 -0800 Subject: [PATCH 167/284] riscv: Align on L1_CACHE_BYTES when STRICT_KERNEL_RWX Allows the sections to be aligned on smaller boundaries and therefore results in a smaller kernel image size. Signed-off-by: Sebastien Van Cauwenberghe Reviewed-by: Atish Patra Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/set_memory.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/riscv/include/asm/set_memory.h b/arch/riscv/include/asm/set_memory.h index 211eb8244a45..8b80c80c7f1a 100644 --- a/arch/riscv/include/asm/set_memory.h +++ b/arch/riscv/include/asm/set_memory.h @@ -32,14 +32,14 @@ bool kernel_page_present(struct page *page); #endif /* __ASSEMBLY__ */ -#ifdef CONFIG_ARCH_HAS_STRICT_KERNEL_RWX +#ifdef CONFIG_STRICT_KERNEL_RWX #ifdef CONFIG_64BIT #define SECTION_ALIGN (1 << 21) #else #define SECTION_ALIGN (1 << 22) #endif -#else /* !CONFIG_ARCH_HAS_STRICT_KERNEL_RWX */ +#else /* !CONFIG_STRICT_KERNEL_RWX */ #define SECTION_ALIGN L1_CACHE_BYTES -#endif /* CONFIG_ARCH_HAS_STRICT_KERNEL_RWX */ +#endif /* CONFIG_STRICT_KERNEL_RWX */ #endif /* _ASM_RISCV_SET_MEMORY_H */ From de5f4b8f634beacf667e6eff334522601dd03b59 Mon Sep 17 00:00:00 2001 From: Atish Patra Date: Fri, 29 Jan 2021 11:00:38 -0800 Subject: [PATCH 168/284] RISC-V: Define MAXPHYSMEM_1GB only for RV32 MAXPHYSMEM_1GB option was added for RV32 because RV32 only supports 1GB of maximum physical memory. This lead to few compilation errors reported by kernel test robot which created the following configuration combination which are not useful but can be configured. 1. MAXPHYSMEM_1GB & RV64 2, MAXPHYSMEM_2GB & RV32 Fix this by restricting MAXPHYSMEM_1GB for RV32 and MAXPHYSMEM_2GB only for RV64. Fixes: e557793799c5 ("RISC-V: Fix maximum allowed phsyical memory for RV32") Cc: stable@vger.kernel.org Reported-by: Randy Dunlap Acked-by: Randy Dunlap Tested-by: Geert Uytterhoeven Signed-off-by: Atish Patra Signed-off-by: Palmer Dabbelt --- arch/riscv/Kconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index e9e2c1f0a690..e0a34eb5ed3b 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -252,8 +252,10 @@ choice default MAXPHYSMEM_128GB if 64BIT && CMODEL_MEDANY config MAXPHYSMEM_1GB + depends on 32BIT bool "1GiB" config MAXPHYSMEM_2GB + depends on 64BIT && CMODEL_MEDLOW bool "2GiB" config MAXPHYSMEM_128GB depends on 64BIT && CMODEL_MEDANY From 388c705b95f23f317fa43e6abf9ff07b583b721a Mon Sep 17 00:00:00 2001 From: Lin Feng Date: Tue, 2 Feb 2021 07:18:23 -0700 Subject: [PATCH 169/284] bfq-iosched: Revert "bfq: Fix computation of shallow depth" This reverts commit 6d4d273588378c65915acaf7b2ee74e9dd9c130a. bfq.limit_depth passes word_depths[] as shallow_depth down to sbitmap core sbitmap_get_shallow, which uses just the number to limit the scan depth of each bitmap word, formula: scan_percentage_for_each_word = shallow_depth / (1 << sbimap->shift) * 100% That means the comments's percentiles 50%, 75%, 18%, 37% of bfq are correct. But after commit patch 'bfq: Fix computation of shallow depth', we use sbitmap.depth instead, as a example in following case: sbitmap.depth = 256, map_nr = 4, shift = 6; sbitmap_word.depth = 64. The resulsts of computed bfqd->word_depths[] are {128, 192, 48, 96}, and three of the numbers exceed core dirver's 'sbitmap_word.depth=64' limit nothing. Signed-off-by: Lin Feng Reviewed-by: Jan Kara Signed-off-by: Jens Axboe --- block/bfq-iosched.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 9e4eb0fc1c16..9e81d1052091 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -6332,13 +6332,13 @@ static unsigned int bfq_update_depths(struct bfq_data *bfqd, * limit 'something'. */ /* no more than 50% of tags for async I/O */ - bfqd->word_depths[0][0] = max(bt->sb.depth >> 1, 1U); + bfqd->word_depths[0][0] = max((1U << bt->sb.shift) >> 1, 1U); /* * no more than 75% of tags for sync writes (25% extra tags * w.r.t. async I/O, to prevent async I/O from starving sync * writes) */ - bfqd->word_depths[0][1] = max((bt->sb.depth * 3) >> 2, 1U); + bfqd->word_depths[0][1] = max(((1U << bt->sb.shift) * 3) >> 2, 1U); /* * In-word depths in case some bfq_queue is being weight- @@ -6348,9 +6348,9 @@ static unsigned int bfq_update_depths(struct bfq_data *bfqd, * shortage. */ /* no more than ~18% of tags for async I/O */ - bfqd->word_depths[1][0] = max((bt->sb.depth * 3) >> 4, 1U); + bfqd->word_depths[1][0] = max(((1U << bt->sb.shift) * 3) >> 4, 1U); /* no more than ~37% of tags for sync writes (~20% extra tags) */ - bfqd->word_depths[1][1] = max((bt->sb.depth * 6) >> 4, 1U); + bfqd->word_depths[1][1] = max(((1U << bt->sb.shift) * 6) >> 4, 1U); for (i = 0; i < 2; i++) for (j = 0; j < 2; j++) From ccd85d90ce092bdb047a7f6580f3955393833b22 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Tue, 2 Feb 2021 13:20:17 -0800 Subject: [PATCH 170/284] KVM: SVM: Treat SVM as unsupported when running as an SEV guest Don't let KVM load when running as an SEV guest, regardless of what CPUID says. Memory is encrypted with a key that is not accessible to the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll see garbage when reading the VMCB. Technically, KVM could decrypt all memory that needs to be accessible to the L0 and use shadow paging so that L0 does not need to shadow NPT, but exposing such information to L0 largely defeats the purpose of running as an SEV guest. This can always be revisited if someone comes up with a use case for running VMs inside SEV guests. Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM is doomed even if the SEV guest is debuggable and the hypervisor is willing to decrypt the VMCB. This may or may not be fixed on CPUs that have the SVME_ADDR_CHK fix. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Message-Id: <20210202212017.2486595-1-seanjc@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/svm/svm.c | 5 +++++ arch/x86/mm/mem_encrypt.c | 1 + 2 files changed, 6 insertions(+) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index f923e14e87df..3442d44ca53b 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -454,6 +454,11 @@ static int has_svm(void) return 0; } + if (sev_active()) { + pr_info("KVM is unsupported when running as an SEV guest\n"); + return 0; + } + return 1; } diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index c79e5736ab2b..c3d5f0236f35 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -382,6 +382,7 @@ bool sev_active(void) { return sev_status & MSR_AMD64_SEV_ENABLED; } +EXPORT_SYMBOL_GPL(sev_active); /* Needs to be called from non-instrumentable code */ bool noinstr sev_es_active(void) From c1c35cf78bfab31b8cb455259524395c9e4c7cd6 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Fri, 13 Nov 2020 08:30:38 -0500 Subject: [PATCH 171/284] KVM: x86: cleanup CR3 reserved bits checks If not in long mode, the low bits of CR3 are reserved but not enforced to be zero, so remove those checks. If in long mode, however, the MBZ bits extend down to the highest physical address bit of the guest, excluding the encryption bit. Make the checks consistent with the above, and match them between nested_vmcb_checks and KVM_SET_SREGS. Cc: stable@vger.kernel.org Fixes: 761e41693465 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests") Fixes: a780a3ea6282 ("KVM: X86: Fix reserved bits check for MOV to CR3") Reviewed-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- arch/x86/kvm/svm/nested.c | 13 +++---------- arch/x86/kvm/svm/svm.h | 3 --- arch/x86/kvm/x86.c | 2 ++ 3 files changed, 5 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 7a605ad8254d..db30670dd8c4 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -231,6 +231,7 @@ static bool nested_vmcb_check_controls(struct vmcb_control_area *control) static bool nested_vmcb_checks(struct vcpu_svm *svm, struct vmcb *vmcb12) { + struct kvm_vcpu *vcpu = &svm->vcpu; bool vmcb12_lma; if ((vmcb12->save.efer & EFER_SVME) == 0) @@ -244,18 +245,10 @@ static bool nested_vmcb_checks(struct vcpu_svm *svm, struct vmcb *vmcb12) vmcb12_lma = (vmcb12->save.efer & EFER_LME) && (vmcb12->save.cr0 & X86_CR0_PG); - if (!vmcb12_lma) { - if (vmcb12->save.cr4 & X86_CR4_PAE) { - if (vmcb12->save.cr3 & MSR_CR3_LEGACY_PAE_RESERVED_MASK) - return false; - } else { - if (vmcb12->save.cr3 & MSR_CR3_LEGACY_RESERVED_MASK) - return false; - } - } else { + if (vmcb12_lma) { if (!(vmcb12->save.cr4 & X86_CR4_PAE) || !(vmcb12->save.cr0 & X86_CR0_PE) || - (vmcb12->save.cr3 & MSR_CR3_LONG_MBZ_MASK)) + (vmcb12->save.cr3 & vcpu->arch.cr3_lm_rsvd_bits)) return false; } if (!kvm_is_valid_cr4(&svm->vcpu, vmcb12->save.cr4)) diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 0fe874ae5498..6e7d070f8b86 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -403,9 +403,6 @@ static inline bool gif_set(struct vcpu_svm *svm) } /* svm.c */ -#define MSR_CR3_LEGACY_RESERVED_MASK 0xfe7U -#define MSR_CR3_LEGACY_PAE_RESERVED_MASK 0x7U -#define MSR_CR3_LONG_MBZ_MASK 0xfff0000000000000U #define MSR_INVALID 0xffffffffU extern int sev; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 42b28d0f0311..c1650e26715b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9624,6 +9624,8 @@ static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) */ if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA)) return false; + if (sregs->cr3 & vcpu->arch.cr3_lm_rsvd_bits) + return false; } else { /* * Not in 64-bit mode: EFER.LMA is clear and the code From a900cac3750b9f0b8f5ed0503d9c6359532f644d Mon Sep 17 00:00:00 2001 From: Hermann Lauer Date: Thu, 28 Jan 2021 12:18:42 +0100 Subject: [PATCH 172/284] ARM: dts: sun7i: a20: bananapro: Fix ethernet phy-mode BPi Pro needs TX and RX delay for Gbit to work reliable and avoid high packet loss rates. The realtek phy driver overrides the settings of the pull ups for the delays, so fix this for BananaPro. Fix the phy-mode description to correctly reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: 10662a33dcd9 ("ARM: dts: sun7i: Add dts file for Bananapro board") Signed-off-by: Hermann Lauer Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20210128111842.GA11919@lemon.iwr.uni-heidelberg.de --- arch/arm/boot/dts/sun7i-a20-bananapro.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/sun7i-a20-bananapro.dts b/arch/arm/boot/dts/sun7i-a20-bananapro.dts index 01ccff756996..5740f9442705 100644 --- a/arch/arm/boot/dts/sun7i-a20-bananapro.dts +++ b/arch/arm/boot/dts/sun7i-a20-bananapro.dts @@ -110,7 +110,7 @@ pinctrl-names = "default"; pinctrl-0 = <&gmac_rgmii_pins>; phy-handle = <&phy1>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-supply = <®_gmac_3v3>; status = "okay"; }; From 5638159f6d93b99ec9743ac7f65563fca3cf413d Mon Sep 17 00:00:00 2001 From: Alexandre Belloni Date: Wed, 3 Feb 2021 10:03:20 +0100 Subject: [PATCH 173/284] ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL This reverts commit c17e9377aa81664d94b4f2102559fcf2a01ec8e7. The lpc32xx clock driver is not able to actually change the PLL rate as this would require reparenting ARM_CLK, DDRAM_CLK, PERIPH_CLK to SYSCLK, then stop the PLL, update the register, restart the PLL and wait for the PLL to lock and finally reparent ARM_CLK, DDRAM_CLK, PERIPH_CLK to HCLK PLL. Currently, the HCLK driver simply updates the registers but this has no real effect and all the clock rate calculation end up being wrong. This is especially annoying for the peripheral (e.g. UARTs, I2C, SPI). Signed-off-by: Alexandre Belloni Tested-by: Gregory CLEMENT Link: https://lore.kernel.org/r/20210203090320.GA3760268@piout.net' Signed-off-by: Arnd Bergmann --- arch/arm/boot/dts/lpc32xx.dtsi | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/arm/boot/dts/lpc32xx.dtsi b/arch/arm/boot/dts/lpc32xx.dtsi index 3a5cfb0ddb20..c87066d6c995 100644 --- a/arch/arm/boot/dts/lpc32xx.dtsi +++ b/arch/arm/boot/dts/lpc32xx.dtsi @@ -326,9 +326,6 @@ clocks = <&xtal_32k>, <&xtal>; clock-names = "xtal_32k", "xtal"; - - assigned-clocks = <&clk LPC32XX_CLK_HCLK_PLL>; - assigned-clock-rates = <208000000>; }; }; From 3241929b67d28c83945d3191c6816a3271fd6b85 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pali=20Roh=C3=A1r?= Date: Mon, 1 Feb 2021 16:08:03 +0100 Subject: [PATCH 174/284] usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Older ATF does not provide SMC call for USB 3.0 phy power on functionality and therefore initialization of xhci-hcd is failing when older version of ATF is used. In this case phy_power_on() function returns -EOPNOTSUPP. [ 3.108467] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try updating your firmware [ 3.117250] phy phy-d0018300.phy.0: phy poweron failed --> -95 [ 3.123465] xhci-hcd: probe of d0058000.usb failed with error -95 This patch introduces a new plat_setup callback for xhci platform drivers which is called prior calling usb_add_hcd() function. This function at its beginning skips PHY init if hcd->skip_phy_initialization is set. Current init_quirk callback for xhci platform drivers is called from xhci_plat_setup() function which is called after chip reset completes. It happens in the middle of the usb_add_hcd() function and therefore this callback cannot be used for setting if PHY init should be skipped or not. For Armada 3720 this patch introduce a new xhci_mvebu_a3700_plat_setup() function configured as a xhci platform plat_setup callback. This new function calls phy_power_on() and in case it returns -EOPNOTSUPP then XHCI_SKIP_PHY_INIT quirk is set to instruct xhci-plat to skip PHY initialization. This patch fixes above failure by ignoring 'not supported' error in xhci-hcd driver. In this case it is expected that phy is already power on. It fixes initialization of xhci-hcd on Espressobin boards where is older Marvell's Arm Trusted Firmware without SMC call for USB 3.0 phy power. This is regression introduced in commit bd3d25b07342 ("arm64: dts: marvell: armada-37xx: link USB hosts with their PHYs") where USB 3.0 phy was defined and therefore xhci-hcd on Espressobin with older ATF started failing. Fixes: bd3d25b07342 ("arm64: dts: marvell: armada-37xx: link USB hosts with their PHYs") Cc: # 5.1+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno Cc: # 5.1+: f768e718911e: usb: host: xhci-plat: add priv quirk for skip PHY initialization Tested-by: Tomasz Maciej Nowak Tested-by: Yoshihiro Shimoda # On R-Car Reviewed-by: Yoshihiro Shimoda # xhci-plat Acked-by: Mathias Nyman Signed-off-by: Pali Rohár Link: https://lore.kernel.org/r/20210201150803.7305-1-pali@kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-mvebu.c | 42 +++++++++++++++++++++++++++++++++++ drivers/usb/host/xhci-mvebu.h | 6 +++++ drivers/usb/host/xhci-plat.c | 20 ++++++++++++++++- drivers/usb/host/xhci-plat.h | 1 + 4 files changed, 68 insertions(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci-mvebu.c b/drivers/usb/host/xhci-mvebu.c index 60651a50770f..8ca1a235d164 100644 --- a/drivers/usb/host/xhci-mvebu.c +++ b/drivers/usb/host/xhci-mvebu.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -74,6 +75,47 @@ int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd) return 0; } +int xhci_mvebu_a3700_plat_setup(struct usb_hcd *hcd) +{ + struct xhci_hcd *xhci = hcd_to_xhci(hcd); + struct device *dev = hcd->self.controller; + struct phy *phy; + int ret; + + /* Old bindings miss the PHY handle */ + phy = of_phy_get(dev->of_node, "usb3-phy"); + if (IS_ERR(phy) && PTR_ERR(phy) == -EPROBE_DEFER) + return -EPROBE_DEFER; + else if (IS_ERR(phy)) + goto phy_out; + + ret = phy_init(phy); + if (ret) + goto phy_put; + + ret = phy_set_mode(phy, PHY_MODE_USB_HOST_SS); + if (ret) + goto phy_exit; + + ret = phy_power_on(phy); + if (ret == -EOPNOTSUPP) { + /* Skip initializatin of XHCI PHY when it is unsupported by firmware */ + dev_warn(dev, "PHY unsupported by firmware\n"); + xhci->quirks |= XHCI_SKIP_PHY_INIT; + } + if (ret) + goto phy_exit; + + phy_power_off(phy); +phy_exit: + phy_exit(phy); +phy_put: + of_phy_put(phy); +phy_out: + + return 0; +} + int xhci_mvebu_a3700_init_quirk(struct usb_hcd *hcd) { struct xhci_hcd *xhci = hcd_to_xhci(hcd); diff --git a/drivers/usb/host/xhci-mvebu.h b/drivers/usb/host/xhci-mvebu.h index 3be021793cc8..01bf3fcb3eca 100644 --- a/drivers/usb/host/xhci-mvebu.h +++ b/drivers/usb/host/xhci-mvebu.h @@ -12,6 +12,7 @@ struct usb_hcd; #if IS_ENABLED(CONFIG_USB_XHCI_MVEBU) int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd); +int xhci_mvebu_a3700_plat_setup(struct usb_hcd *hcd); int xhci_mvebu_a3700_init_quirk(struct usb_hcd *hcd); #else static inline int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd) @@ -19,6 +20,11 @@ static inline int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd) return 0; } +static inline int xhci_mvebu_a3700_plat_setup(struct usb_hcd *hcd) +{ + return 0; +} + static inline int xhci_mvebu_a3700_init_quirk(struct usb_hcd *hcd) { return 0; diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 4d34f6005381..c1edcc9b13ce 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -44,6 +44,16 @@ static void xhci_priv_plat_start(struct usb_hcd *hcd) priv->plat_start(hcd); } +static int xhci_priv_plat_setup(struct usb_hcd *hcd) +{ + struct xhci_plat_priv *priv = hcd_to_xhci_priv(hcd); + + if (!priv->plat_setup) + return 0; + + return priv->plat_setup(hcd); +} + static int xhci_priv_init_quirk(struct usb_hcd *hcd) { struct xhci_plat_priv *priv = hcd_to_xhci_priv(hcd); @@ -111,6 +121,7 @@ static const struct xhci_plat_priv xhci_plat_marvell_armada = { }; static const struct xhci_plat_priv xhci_plat_marvell_armada3700 = { + .plat_setup = xhci_mvebu_a3700_plat_setup, .init_quirk = xhci_mvebu_a3700_init_quirk, }; @@ -330,7 +341,14 @@ static int xhci_plat_probe(struct platform_device *pdev) hcd->tpl_support = of_usb_host_tpl_support(sysdev->of_node); xhci->shared_hcd->tpl_support = hcd->tpl_support; - if (priv && (priv->quirks & XHCI_SKIP_PHY_INIT)) + + if (priv) { + ret = xhci_priv_plat_setup(hcd); + if (ret) + goto disable_usb_phy; + } + + if ((xhci->quirks & XHCI_SKIP_PHY_INIT) || (priv && (priv->quirks & XHCI_SKIP_PHY_INIT))) hcd->skip_phy_initialization = 1; if (priv && (priv->quirks & XHCI_SG_TRB_CACHE_SIZE_QUIRK)) diff --git a/drivers/usb/host/xhci-plat.h b/drivers/usb/host/xhci-plat.h index 1fb149d1fbce..561d0b7bce09 100644 --- a/drivers/usb/host/xhci-plat.h +++ b/drivers/usb/host/xhci-plat.h @@ -13,6 +13,7 @@ struct xhci_plat_priv { const char *firmware_name; unsigned long long quirks; + int (*plat_setup)(struct usb_hcd *); void (*plat_start)(struct usb_hcd *); int (*init_quirk)(struct usb_hcd *); int (*suspend_quirk)(struct usb_hcd *); From 7f1b11ba3564a391169420d98162987a12d0795d Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Thu, 28 Jan 2021 20:28:56 +0100 Subject: [PATCH 175/284] tools/power/turbostat: Fallback to an MSR read for EPB Commit 6d6501d912a9 ("tools/power/turbostat: Read energy_perf_bias from sysfs") converted turbostat to read the energy_perf_bias value from sysfs. However, older kernels which do not have that file yet, would fail. For those, fall back to the MSR reading. Fixes: 6d6501d912a9 ("tools/power/turbostat: Read energy_perf_bias from sysfs") Reported-by: Artem Bityutskiy Signed-off-by: Borislav Petkov Tested-by: Artem Bityutskiy Link: https://lkml.kernel.org/r/20210127132444.981120-1-dedekind1@gmail.com --- tools/power/x86/turbostat/turbostat.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 389ea5209a83..a7c4f0772e53 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -1834,12 +1834,15 @@ int get_mp(int cpu, struct msr_counter *mp, unsigned long long *counterp) int get_epb(int cpu) { char path[128 + PATH_BYTES]; + unsigned long long msr; int ret, epb = -1; FILE *fp; sprintf(path, "/sys/devices/system/cpu/cpu%d/power/energy_perf_bias", cpu); - fp = fopen_or_die(path, "r"); + fp = fopen(path, "r"); + if (!fp) + goto msr_fallback; ret = fscanf(fp, "%d", &epb); if (ret != 1) @@ -1848,6 +1851,11 @@ int get_epb(int cpu) fclose(fp); return epb; + +msr_fallback: + get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, &msr); + + return msr & 0xf; } void get_apic_id(struct thread_data *t) From 89e3becd8f821e507052e012d2559dcda59f538e Mon Sep 17 00:00:00 2001 From: Dave Jiang Date: Mon, 1 Feb 2021 08:26:14 -0700 Subject: [PATCH 176/284] dmaengine: idxd: check device state before issue command Add device state check before executing command. Without the check the command can be issued while device is in halt state and causes the driver to block while waiting for the completion of the command. Reported-by: Sanjay Kumar Signed-off-by: Dave Jiang Tested-by: Sanjay Kumar Fixes: 0d5c10b4c84d ("dmaengine: idxd: add work queue drain support") Link: https://lore.kernel.org/r/161219313921.2976211.12222625226450097465.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul --- drivers/dma/idxd/device.c | 23 ++++++++++++++++++++++- drivers/dma/idxd/idxd.h | 2 +- drivers/dma/idxd/init.c | 5 ++++- 3 files changed, 27 insertions(+), 3 deletions(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 95f94a3ed6be..84a6ea60ecf0 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -398,17 +398,31 @@ static inline bool idxd_is_enabled(struct idxd_device *idxd) return false; } +static inline bool idxd_device_is_halted(struct idxd_device *idxd) +{ + union gensts_reg gensts; + + gensts.bits = ioread32(idxd->reg_base + IDXD_GENSTATS_OFFSET); + + return (gensts.state == IDXD_DEVICE_STATE_HALT); +} + /* * This is function is only used for reset during probe and will * poll for completion. Once the device is setup with interrupts, * all commands will be done via interrupt completion. */ -void idxd_device_init_reset(struct idxd_device *idxd) +int idxd_device_init_reset(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; union idxd_command_reg cmd; unsigned long flags; + if (idxd_device_is_halted(idxd)) { + dev_warn(&idxd->pdev->dev, "Device is HALTED!\n"); + return -ENXIO; + } + memset(&cmd, 0, sizeof(cmd)); cmd.cmd = IDXD_CMD_RESET_DEVICE; dev_dbg(dev, "%s: sending reset for init.\n", __func__); @@ -419,6 +433,7 @@ void idxd_device_init_reset(struct idxd_device *idxd) IDXD_CMDSTS_ACTIVE) cpu_relax(); spin_unlock_irqrestore(&idxd->dev_lock, flags); + return 0; } static void idxd_cmd_exec(struct idxd_device *idxd, int cmd_code, u32 operand, @@ -428,6 +443,12 @@ static void idxd_cmd_exec(struct idxd_device *idxd, int cmd_code, u32 operand, DECLARE_COMPLETION_ONSTACK(done); unsigned long flags; + if (idxd_device_is_halted(idxd)) { + dev_warn(&idxd->pdev->dev, "Device is HALTED!\n"); + *status = IDXD_CMDSTS_HW_ERR; + return; + } + memset(&cmd, 0, sizeof(cmd)); cmd.cmd = cmd_code; cmd.operand = operand; diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 5a50e91c71bf..81a0e65fd316 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -326,7 +326,7 @@ void idxd_mask_msix_vector(struct idxd_device *idxd, int vec_id); void idxd_unmask_msix_vector(struct idxd_device *idxd, int vec_id); /* device control */ -void idxd_device_init_reset(struct idxd_device *idxd); +int idxd_device_init_reset(struct idxd_device *idxd); int idxd_device_enable(struct idxd_device *idxd); int idxd_device_disable(struct idxd_device *idxd); void idxd_device_reset(struct idxd_device *idxd); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 2c051e07c34c..fa04acd5582a 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -335,7 +335,10 @@ static int idxd_probe(struct idxd_device *idxd) int rc; dev_dbg(dev, "%s entered and resetting device\n", __func__); - idxd_device_init_reset(idxd); + rc = idxd_device_init_reset(idxd); + if (rc < 0) + return rc; + dev_dbg(dev, "IDXD reset complete\n"); if (IS_ENABLED(CONFIG_INTEL_IDXD_SVM)) { From d4a610635400ccc382792f6be69427078541c678 Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Wed, 3 Feb 2021 13:37:02 +0200 Subject: [PATCH 177/284] xhci: fix bounce buffer usage for non-sg list case xhci driver may in some special cases need to copy small amounts of payload data to a bounce buffer in order to meet the boundary and alignment restrictions set by the xHCI specification. In the majority of these cases the data is in a sg list, and driver incorrectly assumed data is always in urb->sg when using the bounce buffer. If data instead is contiguous, and in urb->transfer_buffer, we may still need to bounce buffer a small part if data starts very close (less than packet size) to a 64k boundary. Check if sg list is used before copying data to/from it. Fixes: f9c589e142d0 ("xhci: TD-fragment, align the unsplittable case with a bounce buffer") Cc: stable@vger.kernel.org Reported-by: Andreas Hartmann Tested-by: Andreas Hartmann Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20210203113702.436762-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-ring.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index cf0c93a90200..89c3be9917f6 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -699,11 +699,16 @@ static void xhci_unmap_td_bounce_buffer(struct xhci_hcd *xhci, dma_unmap_single(dev, seg->bounce_dma, ring->bounce_buf_len, DMA_FROM_DEVICE); /* for in tranfers we need to copy the data from bounce to sg */ - len = sg_pcopy_from_buffer(urb->sg, urb->num_sgs, seg->bounce_buf, - seg->bounce_len, seg->bounce_offs); - if (len != seg->bounce_len) - xhci_warn(xhci, "WARN Wrong bounce buffer read length: %zu != %d\n", - len, seg->bounce_len); + if (urb->num_sgs) { + len = sg_pcopy_from_buffer(urb->sg, urb->num_sgs, seg->bounce_buf, + seg->bounce_len, seg->bounce_offs); + if (len != seg->bounce_len) + xhci_warn(xhci, "WARN Wrong bounce buffer read length: %zu != %d\n", + len, seg->bounce_len); + } else { + memcpy(urb->transfer_buffer + seg->bounce_offs, seg->bounce_buf, + seg->bounce_len); + } seg->bounce_len = 0; seg->bounce_offs = 0; } @@ -3277,12 +3282,16 @@ static int xhci_align_td(struct xhci_hcd *xhci, struct urb *urb, u32 enqd_len, /* create a max max_pkt sized bounce buffer pointed to by last trb */ if (usb_urb_dir_out(urb)) { - len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs, - seg->bounce_buf, new_buff_len, enqd_len); - if (len != new_buff_len) - xhci_warn(xhci, - "WARN Wrong bounce buffer write length: %zu != %d\n", - len, new_buff_len); + if (urb->num_sgs) { + len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs, + seg->bounce_buf, new_buff_len, enqd_len); + if (len != new_buff_len) + xhci_warn(xhci, "WARN Wrong bounce buffer write length: %zu != %d\n", + len, new_buff_len); + } else { + memcpy(seg->bounce_buf, urb->transfer_buffer + enqd_len, new_buff_len); + } + seg->bounce_dma = dma_map_single(dev, seg->bounce_buf, max_pkt, DMA_TO_DEVICE); } else { From 548f1191d86ccb9bde2a5305988877b7584c01eb Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Tue, 2 Feb 2021 23:06:36 -0800 Subject: [PATCH 178/284] bpf: Unbreak BPF_PROG_TYPE_KPROBE when kprobe is called via do_int3 The commit 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") converted do_int3 handler to be "NMI-like". That made old if (in_nmi()) check abort execution of bpf programs attached to kprobe when kprobe is firing via int3 (For example when kprobe is placed in the middle of the function). Remove the check to restore user visible behavior. Fixes: 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") Reported-by: Nikolay Borisov Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Tested-by: Nikolay Borisov Reviewed-by: Masami Hiramatsu Link: https://lore.kernel.org/bpf/20210203070636.70926-1-alexei.starovoitov@gmail.com --- kernel/trace/bpf_trace.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 6c0018abe68a..764400260eb6 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -96,9 +96,6 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx) { unsigned int ret; - if (in_nmi()) /* not supported yet */ - return 1; - cant_sleep(); if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) { From cb8563f5c735a042ea2dd7df1ad55ae06d63ffeb Mon Sep 17 00:00:00 2001 From: Sagi Grimberg Date: Wed, 3 Feb 2021 01:20:25 -0800 Subject: [PATCH 179/284] nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs When the host sends multiple h2cdata PDUs, we keep track on the receive progress and calculate the scatterlist index and offsets. The issue is that sg_offset should only be kept for the first iov entry we map in the iovec as this is the difference between our cursor and the sg entry offset itself. In addition, the sg index was calculated wrong because we should not round up when dividing the command byte offset with PAG_SIZE. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Narayan Ayalasomayajula Tested-by: Narayan Ayalasomayajula Signed-off-by: Sagi Grimberg Signed-off-by: Christoph Hellwig --- drivers/nvme/target/tcp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index dc1f0f647189..aacf06f0b431 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -305,7 +305,7 @@ static void nvmet_tcp_map_pdu_iovec(struct nvmet_tcp_cmd *cmd) length = cmd->pdu_len; cmd->nr_mapped = DIV_ROUND_UP(length, PAGE_SIZE); offset = cmd->rbytes_done; - cmd->sg_idx = DIV_ROUND_UP(offset, PAGE_SIZE); + cmd->sg_idx = offset / PAGE_SIZE; sg_offset = offset % PAGE_SIZE; sg = &cmd->req.sg[cmd->sg_idx]; @@ -318,6 +318,7 @@ static void nvmet_tcp_map_pdu_iovec(struct nvmet_tcp_cmd *cmd) length -= iov_len; sg = sg_next(sg); iov++; + sg_offset = 0; } iov_iter_kvec(&cmd->recv_msg.msg_iter, READ, cmd->iov, From 6183f4d3a0a2ad230511987c6c362ca43ec0055f Mon Sep 17 00:00:00 2001 From: Bui Quang Minh Date: Wed, 27 Jan 2021 06:36:53 +0000 Subject: [PATCH 180/284] bpf: Check for integer overflow when using roundup_pow_of_two() On 32-bit architecture, roundup_pow_of_two() can return 0 when the argument has upper most bit set due to resulting 1UL << 32. Add a check for this case. Fixes: d5a3b1f69186 ("bpf: introduce BPF_MAP_TYPE_STACK_TRACE") Signed-off-by: Bui Quang Minh Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20210127063653.3576-1-minhquangbui99@gmail.com --- kernel/bpf/stackmap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index aea96b638473..bfafbf115bf3 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -115,6 +115,8 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr) /* hash table size must be power of 2 */ n_buckets = roundup_pow_of_two(attr->max_entries); + if (!n_buckets) + return ERR_PTR(-E2BIG); cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap); cost += n_buckets * (value_size + sizeof(struct stack_map_bucket)); From f295c8cfec833c2707ff1512da10d65386dde7af Mon Sep 17 00:00:00 2001 From: Dave Airlie Date: Mon, 1 Feb 2021 10:56:32 +1000 Subject: [PATCH 181/284] drm/nouveau: fix dma syncing warning with debugging on. Since I wrote the below patch if you run a debug kernel you can a dma debug warning like: nouveau 0000:1f:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000016e012000] [size=4096 bytes] The old nouveau code wasn't consolidate the pages like the ttm code, but the dma-debug expects the sync code to give it the same base/range pairs as the allocator. Fix the nouveau sync code to consolidate pages before calling the sync code. Fixes: bd549d35b4be0 ("nouveau: use ttm populate mapping functions. (v2)") Reported-by: Lyude Paul Reviewed-by: Ben Skeggs Signed-off-by: Dave Airlie Link: https://patchwork.freedesktop.org/patch/417588/ --- drivers/gpu/drm/nouveau/nouveau_bo.c | 35 +++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index c85b1af06b7b..7ea367a5444d 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -547,7 +547,7 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo) { struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev); struct ttm_tt *ttm_dma = (struct ttm_tt *)nvbo->bo.ttm; - int i; + int i, j; if (!ttm_dma) return; @@ -556,10 +556,21 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo) if (nvbo->force_coherent) return; - for (i = 0; i < ttm_dma->num_pages; i++) + for (i = 0; i < ttm_dma->num_pages; ++i) { + struct page *p = ttm_dma->pages[i]; + size_t num_pages = 1; + + for (j = i + 1; j < ttm_dma->num_pages; ++j) { + if (++p != ttm_dma->pages[j]) + break; + + ++num_pages; + } dma_sync_single_for_device(drm->dev->dev, ttm_dma->dma_address[i], - PAGE_SIZE, DMA_TO_DEVICE); + num_pages * PAGE_SIZE, DMA_TO_DEVICE); + i += num_pages; + } } void @@ -567,7 +578,7 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo) { struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev); struct ttm_tt *ttm_dma = (struct ttm_tt *)nvbo->bo.ttm; - int i; + int i, j; if (!ttm_dma) return; @@ -576,9 +587,21 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo) if (nvbo->force_coherent) return; - for (i = 0; i < ttm_dma->num_pages; i++) + for (i = 0; i < ttm_dma->num_pages; ++i) { + struct page *p = ttm_dma->pages[i]; + size_t num_pages = 1; + + for (j = i + 1; j < ttm_dma->num_pages; ++j) { + if (++p != ttm_dma->pages[j]) + break; + + ++num_pages; + } + dma_sync_single_for_cpu(drm->dev->dev, ttm_dma->dma_address[i], - PAGE_SIZE, DMA_FROM_DEVICE); + num_pages * PAGE_SIZE, DMA_FROM_DEVICE); + i += num_pages; + } } void nouveau_bo_add_io_reserve_lru(struct ttm_buffer_object *bo) From a4dc7eee9106a9d2a6e08b442db19677aa9699c7 Mon Sep 17 00:00:00 2001 From: Christoph Schemmel Date: Tue, 2 Feb 2021 09:45:23 +0100 Subject: [PATCH 182/284] NET: usb: qmi_wwan: Adding support for Cinterion MV31 Adding support for Cinterion MV31 with PID 0x00B7. T: Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 11 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1 P: Vendor=1e2d ProdID=00b7 Rev=04.14 S: Manufacturer=Cinterion S: Product=Cinterion USB Mobile Broadband S: SerialNumber=b3246eed C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=896mA I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option Signed-off-by: Christoph Schemmel Link: https://lore.kernel.org/r/20210202084523.4371-1-christoph.schemmel@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index cc4819282820..5a05add9b4e6 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1309,6 +1309,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1e2d, 0x0082, 5)}, /* Cinterion PHxx,PXxx (2 RmNet) */ {QMI_FIXED_INTF(0x1e2d, 0x0083, 4)}, /* Cinterion PHxx,PXxx (1 RmNet + USB Audio)*/ {QMI_QUIRK_SET_DTR(0x1e2d, 0x00b0, 4)}, /* Cinterion CLS8 */ + {QMI_FIXED_INTF(0x1e2d, 0x00b7, 0)}, /* Cinterion MV31 RmNet */ {QMI_FIXED_INTF(0x413c, 0x81a2, 8)}, /* Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card */ {QMI_FIXED_INTF(0x413c, 0x81a3, 8)}, /* Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card */ {QMI_FIXED_INTF(0x413c, 0x81a4, 8)}, /* Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card */ From b1bdde33b72366da20d10770ab7a49fe87b5e190 Mon Sep 17 00:00:00 2001 From: Jozsef Kadlecsik Date: Fri, 29 Jan 2021 20:57:43 +0100 Subject: [PATCH 183/284] netfilter: xt_recent: Fix attempt to update deleted entry When both --reap and --update flag are specified, there's a code path at which the entry to be updated is reaped beforehand, which then leads to kernel crash. Reap only entries which won't be updated. Fixes kernel bugzilla #207773. Link: https://bugzilla.kernel.org/show_bug.cgi?id=207773 Reported-by: Reindl Harald Fixes: 0079c5aee348 ("netfilter: xt_recent: add an entry reaper") Signed-off-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso --- net/netfilter/xt_recent.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c index 606411869698..0446307516cd 100644 --- a/net/netfilter/xt_recent.c +++ b/net/netfilter/xt_recent.c @@ -152,7 +152,8 @@ static void recent_entry_remove(struct recent_table *t, struct recent_entry *e) /* * Drop entries with timestamps older then 'time'. */ -static void recent_entry_reap(struct recent_table *t, unsigned long time) +static void recent_entry_reap(struct recent_table *t, unsigned long time, + struct recent_entry *working, bool update) { struct recent_entry *e; @@ -161,6 +162,12 @@ static void recent_entry_reap(struct recent_table *t, unsigned long time) */ e = list_entry(t->lru_list.next, struct recent_entry, lru_list); + /* + * Do not reap the entry which are going to be updated. + */ + if (e == working && update) + return; + /* * The last time stamp is the most recent. */ @@ -303,7 +310,8 @@ recent_mt(const struct sk_buff *skb, struct xt_action_param *par) /* info->seconds must be non-zero */ if (info->check_set & XT_RECENT_REAP) - recent_entry_reap(t, time); + recent_entry_reap(t, time, e, + info->check_set & XT_RECENT_UPDATE && ret); } if (info->check_set & XT_RECENT_SET || From a3005b0f83f217c888393c6bf9cd36e3d1616bca Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Sat, 30 Jan 2021 11:14:25 +0100 Subject: [PATCH 184/284] selftests: netfilter: fix current year use date %Y instead of %G to read current year Problem appeared when running lkp-tests on 01/01/2021 Fixes: 48d072c4e8cd ("selftests: netfilter: add time counter check") Reported-by: kernel test robot Signed-off-by: Fabian Frederick Signed-off-by: Pablo Neira Ayuso --- tools/testing/selftests/netfilter/nft_meta.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/netfilter/nft_meta.sh b/tools/testing/selftests/netfilter/nft_meta.sh index 087f0e6e71ce..f33154c04d34 100755 --- a/tools/testing/selftests/netfilter/nft_meta.sh +++ b/tools/testing/selftests/netfilter/nft_meta.sh @@ -23,7 +23,7 @@ ip -net "$ns0" addr add 127.0.0.1 dev lo trap cleanup EXIT -currentyear=$(date +%G) +currentyear=$(date +%Y) lastyear=$((currentyear-1)) ip netns exec "$ns0" nft -f /dev/stdin < Date: Tue, 2 Feb 2021 16:07:37 +0100 Subject: [PATCH 185/284] netfilter: nftables: fix possible UAF over chains from packet path in netns Although hooks are released via call_rcu(), chain and rule objects are immediately released while packets are still walking over these bits. This patch adds the .pre_exit callback which is invoked before synchronize_rcu() in the netns framework to stay safe. Remove a comment which is not valid anymore since the core does not use synchronize_net() anymore since 8c873e219970 ("netfilter: core: free hooks with call_rcu"). Suggested-by: Florian Westphal Fixes: df05ef874b28 ("netfilter: nf_tables: release objects on netns destruction") Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 8d3aa97b52e7..43fe80f10313 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8949,6 +8949,17 @@ int __nft_release_basechain(struct nft_ctx *ctx) } EXPORT_SYMBOL_GPL(__nft_release_basechain); +static void __nft_release_hooks(struct net *net) +{ + struct nft_table *table; + struct nft_chain *chain; + + list_for_each_entry(table, &net->nft.tables, list) { + list_for_each_entry(chain, &table->chains, list) + nf_tables_unregister_hook(net, table, chain); + } +} + static void __nft_release_tables(struct net *net) { struct nft_flowtable *flowtable, *nf; @@ -8964,10 +8975,6 @@ static void __nft_release_tables(struct net *net) list_for_each_entry_safe(table, nt, &net->nft.tables, list) { ctx.family = table->family; - - list_for_each_entry(chain, &table->chains, list) - nf_tables_unregister_hook(net, table, chain); - /* No packets are walking on these chains anymore. */ ctx.table = table; list_for_each_entry(chain, &table->chains, list) { ctx.chain = chain; @@ -9016,6 +9023,11 @@ static int __net_init nf_tables_init_net(struct net *net) return 0; } +static void __net_exit nf_tables_pre_exit_net(struct net *net) +{ + __nft_release_hooks(net); +} + static void __net_exit nf_tables_exit_net(struct net *net) { mutex_lock(&net->nft.commit_mutex); @@ -9029,8 +9041,9 @@ static void __net_exit nf_tables_exit_net(struct net *net) } static struct pernet_operations nf_tables_net_ops = { - .init = nf_tables_init_net, - .exit = nf_tables_exit_net, + .init = nf_tables_init_net, + .pre_exit = nf_tables_pre_exit_net, + .exit = nf_tables_exit_net, }; static int __init nf_tables_module_init(void) From 8d6bca156e47d68551750a384b3ff49384c67be3 Mon Sep 17 00:00:00 2001 From: Sven Auhagen Date: Tue, 2 Feb 2021 18:01:16 +0100 Subject: [PATCH 186/284] netfilter: flowtable: fix tcp and udp header checksum update When updating the tcp or udp header checksum on port nat the function inet_proto_csum_replace2 with the last parameter pseudohdr as true. This leads to an error in the case that GRO is used and packets are split up in GSO. The tcp or udp checksum of all packets is incorrect. The error is probably masked due to the fact the most network driver implement tcp/udp checksum offloading. It also only happens when GRO is applied and not on single packets. The error is most visible when using a pppoe connection which is not triggering the tcp/udp checksum offload. Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure") Signed-off-by: Sven Auhagen Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_flow_table_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 513f78db3cb2..4a4acbba78ff 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -399,7 +399,7 @@ static int nf_flow_nat_port_tcp(struct sk_buff *skb, unsigned int thoff, return -1; tcph = (void *)(skb_network_header(skb) + thoff); - inet_proto_csum_replace2(&tcph->check, skb, port, new_port, true); + inet_proto_csum_replace2(&tcph->check, skb, port, new_port, false); return 0; } @@ -415,7 +415,7 @@ static int nf_flow_nat_port_udp(struct sk_buff *skb, unsigned int thoff, udph = (void *)(skb_network_header(skb) + thoff); if (udph->check || skb->ip_summed == CHECKSUM_PARTIAL) { inet_proto_csum_replace2(&udph->check, skb, port, - new_port, true); + new_port, false); if (!udph->check) udph->check = CSUM_MANGLED_0; } From 2a80c15812372e554474b1dba0b1d8e467af295d Mon Sep 17 00:00:00 2001 From: Sabyrzhan Tasbolatov Date: Tue, 2 Feb 2021 15:20:59 +0600 Subject: [PATCH 187/284] net/qrtr: restrict user-controlled length in qrtr_tun_write_iter() syzbot found WARNING in qrtr_tun_write_iter [1] when write_iter length exceeds KMALLOC_MAX_SIZE causing order >= MAX_ORDER condition. Additionally, there is no check for 0 length write. [1] WARNING: mm/page_alloc.c:5011 [..] Call Trace: alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267 alloc_pages include/linux/gfp.h:547 [inline] kmalloc_order+0x2e/0xb0 mm/slab_common.c:837 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853 kmalloc include/linux/slab.h:557 [inline] kzalloc include/linux/slab.h:682 [inline] qrtr_tun_write_iter+0x8a/0x180 net/qrtr/tun.c:83 call_write_iter include/linux/fs.h:1901 [inline] Reported-by: syzbot+c2a7e5c5211605a90865@syzkaller.appspotmail.com Signed-off-by: Sabyrzhan Tasbolatov Link: https://lore.kernel.org/r/20210202092059.1361381-1-snovitoll@gmail.com Signed-off-by: Jakub Kicinski --- net/qrtr/tun.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/qrtr/tun.c b/net/qrtr/tun.c index 15ce9b642b25..b238c40a9984 100644 --- a/net/qrtr/tun.c +++ b/net/qrtr/tun.c @@ -80,6 +80,12 @@ static ssize_t qrtr_tun_write_iter(struct kiocb *iocb, struct iov_iter *from) ssize_t ret; void *kbuf; + if (!len) + return -EINVAL; + + if (len > KMALLOC_MAX_SIZE) + return -ENOMEM; + kbuf = kzalloc(len, GFP_KERNEL); if (!kbuf) return -ENOMEM; From d795cc02a297df80910cf4ba23147680d15d8a7d Mon Sep 17 00:00:00 2001 From: Vadim Fedorenko Date: Wed, 3 Feb 2021 23:37:14 +0300 Subject: [PATCH 188/284] selftests/tls: fix selftest with CHACHA20-POLY1305 TLS selftests were broken also because of use of structure that was not exported to UAPI. Fix by defining the union in tests. Fixes: 4f336e88a870 (selftests/tls: add CHACHA20-POLY1305 to tls selftests) Reported-by: Rong Chen Signed-off-by: Vadim Fedorenko Link: https://lore.kernel.org/r/1612384634-5377-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/tls.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c index e0088c2d38a5..426d07875a48 100644 --- a/tools/testing/selftests/net/tls.c +++ b/tools/testing/selftests/net/tls.c @@ -133,7 +133,10 @@ FIXTURE_VARIANT_ADD(tls, 13_chacha) FIXTURE_SETUP(tls) { - union tls_crypto_context tls12; + union { + struct tls12_crypto_info_aes_gcm_128 aes128; + struct tls12_crypto_info_chacha20_poly1305 chacha20; + } tls12; struct sockaddr_in addr; socklen_t len; int sfd, ret; @@ -143,14 +146,16 @@ FIXTURE_SETUP(tls) len = sizeof(addr); memset(&tls12, 0, sizeof(tls12)); - tls12.info.version = variant->tls_version; - tls12.info.cipher_type = variant->cipher_type; switch (variant->cipher_type) { case TLS_CIPHER_CHACHA20_POLY1305: - tls12_sz = sizeof(tls12_crypto_info_chacha20_poly1305); + tls12_sz = sizeof(struct tls12_crypto_info_chacha20_poly1305); + tls12.chacha20.info.version = variant->tls_version; + tls12.chacha20.info.cipher_type = variant->cipher_type; break; case TLS_CIPHER_AES_GCM_128: - tls12_sz = sizeof(tls12_crypto_info_aes_gcm_128); + tls12_sz = sizeof(struct tls12_crypto_info_aes_gcm_128); + tls12.aes128.info.version = variant->tls_version; + tls12.aes128.info.cipher_type = variant->cipher_type; break; default: tls12_sz = 0; From 87aa9ec939ec7277b730786e19c161c9194cc8ca Mon Sep 17 00:00:00 2001 From: Ben Gardon Date: Tue, 2 Feb 2021 10:57:16 -0800 Subject: [PATCH 189/284] KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs There is a bug in the TDP MMU function to zap SPTEs which could be replaced with a larger mapping which prevents the function from doing anything. Fix this by correctly zapping the last level SPTEs. Cc: stable@vger.kernel.org Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Ben Gardon Message-Id: <20210202185734.1680553-11-bgardon@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/tdp_mmu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 2ef8615f9dba..b56d604809b8 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1049,8 +1049,8 @@ bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot) } /* - * Clear non-leaf entries (and free associated page tables) which could - * be replaced by large mappings, for GFNs within the slot. + * Clear leaf entries which could be replaced by large mappings, for + * GFNs within the slot. */ static void zap_collapsible_spte_range(struct kvm *kvm, struct kvm_mmu_page *root, @@ -1062,7 +1062,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm, tdp_root_for_each_pte(iter, root, start, end) { if (!is_shadow_present_pte(iter.old_spte) || - is_last_spte(iter.old_spte, iter.level)) + !is_last_spte(iter.old_spte, iter.level)) continue; pfn = spte_to_pfn(iter.old_spte); From d7e10d47691d1702db1cd1edcc689d3031eefc67 Mon Sep 17 00:00:00 2001 From: Xiaoguang Wang Date: Thu, 4 Feb 2021 17:20:56 +0800 Subject: [PATCH 190/284] io_uring: don't modify identity's files uncess identity is cowed Abaci Robot reported following panic: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 800000010ef3f067 P4D 800000010ef3f067 PUD 10d9df067 PMD 0 Oops: 0002 [#1] SMP PTI CPU: 0 PID: 1869 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc3+ #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:put_files_struct+0x1b/0x120 Code: 24 18 c7 00 f4 ff ff ff e9 4d fd ff ff 66 90 0f 1f 44 00 00 41 57 41 56 49 89 fe 41 55 41 54 55 53 48 83 ec 08 e8 b5 6b db ff 41 ff 0e 74 13 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 9c RSP: 0000:ffffc90002147d48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88810d9a5300 RCX: 0000000000000000 RDX: ffff88810d87c280 RSI: ffffffff8144ba6b RDI: 0000000000000000 RBP: 0000000000000080 R08: 0000000000000001 R09: ffffffff81431500 R10: ffff8881001be000 R11: 0000000000000000 R12: ffff88810ac2f800 R13: ffff88810af38a00 R14: 0000000000000000 R15: ffff8881057130c0 FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010dbaa002 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __io_clean_op+0x10c/0x2a0 io_dismantle_req+0x3c7/0x600 __io_free_req+0x34/0x280 io_put_req+0x63/0xb0 io_worker_handle_work+0x60e/0x830 ? io_wqe_worker+0x135/0x520 io_wqe_worker+0x158/0x520 ? __kthread_parkme+0x96/0xc0 ? io_worker_handle_work+0x830/0x830 kthread+0x134/0x180 ? kthread_create_worker_on_cpu+0x90/0x90 ret_from_fork+0x1f/0x30 Modules linked in: CR2: 0000000000000000 ---[ end trace c358ca86af95b1e7 ]--- I guess case below can trigger above panic: there're two threads which operates different io_uring ctxs and share same sqthread identity, and later one thread exits, io_uring_cancel_task_requests() will clear task->io_uring->identity->files to be NULL in sqpoll mode, then another ctx that uses same identity will panic. Indeed we don't need to clear task->io_uring->identity->files here, io_grab_identity() should handle identity->files changes well, if task->io_uring->identity->files is not equal to current->files, io_cow_identity() should handle this changes well. Cc: stable@vger.kernel.org # 5.5+ Reported-by: Abaci Robot Signed-off-by: Xiaoguang Wang Reviewed-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 38c6cbe1ab38..5d3348d66f06 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8982,12 +8982,6 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx, if ((ctx->flags & IORING_SETUP_SQPOLL) && ctx->sq_data) { atomic_dec(&task->io_uring->in_idle); - /* - * If the files that are going away are the ones in the thread - * identity, clear them out. - */ - if (task->io_uring->identity->files == files) - task->io_uring->identity->files = NULL; io_sq_thread_unpark(ctx->sq_data); } } From 031b91a5fe6f1ce61b7617614ddde9ed61e252be Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 3 Feb 2021 16:01:06 -0800 Subject: [PATCH 191/284] KVM: x86: Set so called 'reserved CR3 bits in LM mask' at vCPU reset Set cr3_lm_rsvd_bits, which is effectively an invalid GPA mask, at vCPU reset. The reserved bits check needs to be done even if userspace never configures the guest's CPUID model. Cc: stable@vger.kernel.org Fixes: 0107973a80ad ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch") Signed-off-by: Sean Christopherson Message-Id: <20210204000117.3303214-2-seanjc@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c1650e26715b..1b404e4d7dd8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10003,6 +10003,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) fx_init(vcpu); vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); + vcpu->arch.cr3_lm_rsvd_bits = rsvd_bits(cpuid_maxphyaddr(vcpu), 63); vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT; From 5c279c4cf206e03995e04fd3404fa95ffd243a97 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Thu, 4 Feb 2021 20:12:37 +0200 Subject: [PATCH 192/284] Revert "x86/setup: don't remove E820_TYPE_RAM for pfn 0" This reverts commit bde9cfa3afe4324ec251e4af80ebf9b7afaf7afe. Changing the first memory page type from E820_TYPE_RESERVED to E820_TYPE_RAM makes it a part of "System RAM" resource rather than a reserved resource and this in turn causes devmem_is_allowed() to treat is as area that can be accessed but it is filled with zeroes instead of the actual data as previously. The change in /dev/mem output causes lilo to fail as was reported at slakware users forum, and probably other legacy applications will experience similar problems. Link: https://www.linuxquestions.org/questions/slackware-14/slackware-current-lilo-vesa-warnings-after-recent-updates-4175689617/#post6214439 Signed-off-by: Mike Rapoport Cc: stable@kernel.org Signed-off-by: Linus Torvalds --- arch/x86/kernel/setup.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 3412c4595efd..740f3bdb3f61 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -660,6 +660,17 @@ static void __init trim_platform_memory_ranges(void) static void __init trim_bios_range(void) { + /* + * A special case is the first 4Kb of memory; + * This is a BIOS owned area, not kernel ram, but generally + * not listed as such in the E820 table. + * + * This typically reserves additional memory (64KiB by default) + * since some BIOSes are known to corrupt low memory. See the + * Kconfig help text for X86_RESERVE_LOW. + */ + e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED); + /* * special case: Some BIOSes report the PC BIOS * area (640Kb -> 1Mb) as RAM even though it is not. @@ -717,15 +728,6 @@ early_param("reservelow", parse_reservelow); static void __init trim_low_memory_range(void) { - /* - * A special case is the first 4Kb of memory; - * This is a BIOS owned area, not kernel ram, but generally - * not listed as such in the E820 table. - * - * This typically reserves additional memory (64KiB by default) - * since some BIOSes are known to corrupt low memory. See the - * Kconfig help text for X86_RESERVE_LOW. - */ memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE)); } From 25a068b8e9a4eb193d755d58efcb3c98928636e0 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Thu, 5 Mar 2020 09:47:08 -0800 Subject: [PATCH 193/284] x86/apic: Add extra serialization for non-serializing MSRs Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for MFENCE; LFENCE. Short summary: we have special MSRs that have weaker ordering than all the rest. Add fencing consistent with current SDM recommendations. This is not known to cause any issues in practice, only in theory. Longer story below: The reason the kernel uses a different semantic is that the SDM changed (roughly in late 2017). The SDM changed because folks at Intel were auditing all of the recommended fences in the SDM and realized that the x2apic fences were insufficient. Why was the pain MFENCE judged insufficient? WRMSR itself is normally a serializing instruction. No fences are needed because the instruction itself serializes everything. But, there are explicit exceptions for this serializing behavior written into the WRMSR instruction documentation for two classes of MSRs: IA32_TSC_DEADLINE and the X2APIC MSRs. Back to x2apic: WRMSR is *not* serializing in this specific case. But why is MFENCE insufficient? MFENCE makes writes visible, but only affects load/store instructions. WRMSR is unfortunately not a load/store instruction and is unaffected by MFENCE. This means that a non-serializing WRMSR could be reordered by the CPU to execute before the writes made visible by the MFENCE have even occurred in the first place. This means that an x2apic IPI could theoretically be triggered before there is any (visible) data to process. Does this affect anything in practice? I honestly don't know. It seems quite possible that by the time an interrupt gets to consume the (not yet) MFENCE'd data, it has become visible, mostly by accident. To be safe, add the SDM-recommended fences for all x2apic WRMSRs. This also leaves open the question of the _other_ weakly-ordered WRMSR: MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as the x2APIC MSRs, it seems substantially less likely to be a problem in practice. While writes to the in-memory Local Vector Table (LVT) might theoretically be reordered with respect to a weakly-ordered WRMSR like TSC_DEADLINE, the SDM has this to say: In x2APIC mode, the WRMSR instruction is used to write to the LVT entry. The processor ensures the ordering of this write and any subsequent WRMSR to the deadline; no fencing is required. But, that might still leave xAPIC exposed. The safest thing to do for now is to add the extra, recommended LFENCE. [ bp: Massage commit message, fix typos, drop accidentally added newline to tools/arch/x86/include/asm/barrier.h. ] Reported-by: Jan Kiszka Signed-off-by: Dave Hansen Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Acked-by: Thomas Gleixner Cc: Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com --- arch/x86/include/asm/apic.h | 10 ---------- arch/x86/include/asm/barrier.h | 18 ++++++++++++++++++ arch/x86/kernel/apic/apic.c | 4 ++++ arch/x86/kernel/apic/x2apic_cluster.c | 6 ++++-- arch/x86/kernel/apic/x2apic_phys.c | 9 ++++++--- 5 files changed, 32 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 34cb3c159481..412b51e059c8 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -197,16 +197,6 @@ static inline bool apic_needs_pit(void) { return true; } #endif /* !CONFIG_X86_LOCAL_APIC */ #ifdef CONFIG_X86_X2APIC -/* - * Make previous memory operations globally visible before - * sending the IPI through x2apic wrmsr. We need a serializing instruction or - * mfence for this. - */ -static inline void x2apic_wrmsr_fence(void) -{ - asm volatile("mfence" : : : "memory"); -} - static inline void native_apic_msr_write(u32 reg, u32 v) { if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR || diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h index 7f828fe49797..4819d5e5a335 100644 --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -84,4 +84,22 @@ do { \ #include +/* + * Make previous memory operations globally visible before + * a WRMSR. + * + * MFENCE makes writes visible, but only affects load/store + * instructions. WRMSR is unfortunately not a load/store + * instruction and is unaffected by MFENCE. The LFENCE ensures + * that the WRMSR is not reordered. + * + * Most WRMSRs are full serializing instructions themselves and + * do not require this barrier. This is only required for the + * IA32_TSC_DEADLINE and X2APIC MSRs. + */ +static inline void weak_wrmsr_fence(void) +{ + asm volatile("mfence; lfence" : : : "memory"); +} + #endif /* _ASM_X86_BARRIER_H */ diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 6bd20c0de8bc..7f4c081f59f0 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -477,6 +478,9 @@ static int lapic_next_deadline(unsigned long delta, { u64 tsc; + /* This MSR is special and need a special fence: */ + weak_wrmsr_fence(); + tsc = rdtsc(); wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR)); return 0; diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c index df6adc5674c9..f4da9bb69a88 100644 --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -29,7 +29,8 @@ static void x2apic_send_IPI(int cpu, int vector) { u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu); - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); __x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL); } @@ -41,7 +42,8 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest) unsigned long flags; u32 dest; - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); local_irq_save(flags); tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask); diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c index 0e4e81971567..6bde05a86b4e 100644 --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -43,7 +43,8 @@ static void x2apic_send_IPI(int cpu, int vector) { u32 dest = per_cpu(x86_cpu_to_apicid, cpu); - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); __x2apic_send_IPI_dest(dest, vector, APIC_DEST_PHYSICAL); } @@ -54,7 +55,8 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest) unsigned long this_cpu; unsigned long flags; - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); local_irq_save(flags); @@ -125,7 +127,8 @@ void __x2apic_send_IPI_shorthand(int vector, u32 which) { unsigned long cfg = __prepare_ICR(which, vector, 0); - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); native_x2apic_icr_write(cfg, 0); } From ec7d8e7dd3a59528e305a18e93f1cb98f7faf83b Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Tue, 2 Feb 2021 08:09:38 +0100 Subject: [PATCH 194/284] xen/netback: avoid race in xenvif_rx_ring_slots_available() Since commit 23025393dbeb3b8b3 ("xen/netback: use lateeoi irq binding") xenvif_rx_ring_slots_available() is no longer called only from the rx queue kernel thread, so it needs to access the rx queue with the associated queue held. Reported-by: Igor Druzhinin Fixes: 23025393dbeb3b8b3 ("xen/netback: use lateeoi irq binding") Signed-off-by: Juergen Gross Acked-by: Wei Liu Link: https://lore.kernel.org/r/20210202070938.7863-1-jgross@suse.com Signed-off-by: Jakub Kicinski --- drivers/net/xen-netback/rx.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c index b8febe1d1bfd..accc991d153f 100644 --- a/drivers/net/xen-netback/rx.c +++ b/drivers/net/xen-netback/rx.c @@ -38,10 +38,15 @@ static bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue) RING_IDX prod, cons; struct sk_buff *skb; int needed; + unsigned long flags; + + spin_lock_irqsave(&queue->rx_queue.lock, flags); skb = skb_peek(&queue->rx_queue); - if (!skb) + if (!skb) { + spin_unlock_irqrestore(&queue->rx_queue.lock, flags); return false; + } needed = DIV_ROUND_UP(skb->len, XEN_PAGE_SIZE); if (skb_is_gso(skb)) @@ -49,6 +54,8 @@ static bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue) if (skb->sw_hash) needed++; + spin_unlock_irqrestore(&queue->rx_queue.lock, flags); + do { prod = queue->rx.sring->req_prod; cons = queue->rx.req_cons; From aec18a57edad562d620f7d19016de1fc0cc2208c Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 4 Feb 2021 19:22:46 +0000 Subject: [PATCH 195/284] io_uring: drop mm/files between task_work_submit Since SQPOLL task can be shared and so task_work entries can be a mix of them, we need to drop mm and files before trying to issue next request. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index 5d3348d66f06..1f68105a41ed 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2205,6 +2205,9 @@ static void __io_req_task_submit(struct io_kiocb *req) else __io_req_task_cancel(req, -EFAULT); mutex_unlock(&ctx->uring_lock); + + if (ctx->flags & IORING_SETUP_SQPOLL) + io_sq_thread_drop_mm_files(); } static void io_req_task_submit(struct callback_head *cb) From 3401e4aa43a540881cc97190afead650e709c418 Mon Sep 17 00:00:00 2001 From: Raju Rangoju Date: Tue, 2 Feb 2021 23:55:11 +0530 Subject: [PATCH 196/284] cxgb4: Add new T6 PCI device id 0x6092 Signed-off-by: Raju Rangoju Link: https://lore.kernel.org/r/20210202182511.8109-1-rajur@chelsio.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h index 0c5373462ced..0b1b5f9c67d4 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h @@ -219,6 +219,7 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN CH_PCI_ID_TABLE_FENTRY(0x6089), /* Custom T62100-KR */ CH_PCI_ID_TABLE_FENTRY(0x608a), /* Custom T62100-CR */ CH_PCI_ID_TABLE_FENTRY(0x608b), /* Custom T6225-CR */ + CH_PCI_ID_TABLE_FENTRY(0x6092), /* Custom T62100-CR-LOM */ CH_PCI_DEVICE_ID_TABLE_DEFINE_END; #endif /* __T4_PCI_ID_TBL_H__ */ From 7b5eab57cac45e270a0ad624ba157c5b30b3d44d Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 3 Feb 2021 08:47:56 +0000 Subject: [PATCH 197/284] rxrpc: Fix clearance of Tx/Rx ring when releasing a call At the end of rxrpc_release_call(), rxrpc_cleanup_ring() is called to clear the Rx/Tx skbuff ring, but this doesn't lock the ring whilst it's accessing it. Unfortunately, rxrpc_resend() might be trying to retransmit a packet concurrently with this - and whilst it does lock the ring, this isn't protection against rxrpc_cleanup_call(). Fix this by removing the call to rxrpc_cleanup_ring() from rxrpc_release_call(). rxrpc_cleanup_ring() will be called again anyway from rxrpc_cleanup_call(). The earlier call is just an optimisation to recycle skbuffs more quickly. Alternative solutions include rxrpc_release_call() could try to cancel the work item or wait for it to complete or rxrpc_cleanup_ring() could lock when accessing the ring (which would require a bh lock). This can produce a report like the following: BUG: KASAN: use-after-free in rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372 Read of size 4 at addr ffff888011606e04 by task kworker/0:0/5 ... Workqueue: krxrpcd rxrpc_process_call Call Trace: ... kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413 rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372 rxrpc_resend net/rxrpc/call_event.c:266 [inline] rxrpc_process_call+0x1634/0x1f60 net/rxrpc/call_event.c:412 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275 ... Allocated by task 2318: ... sock_alloc_send_pskb+0x793/0x920 net/core/sock.c:2348 rxrpc_send_data+0xb51/0x2bf0 net/rxrpc/sendmsg.c:358 rxrpc_do_sendmsg+0xc03/0x1350 net/rxrpc/sendmsg.c:744 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:560 ... Freed by task 2318: ... kfree_skb+0x140/0x3f0 net/core/skbuff.c:704 rxrpc_free_skb+0x11d/0x150 net/rxrpc/skbuff.c:78 rxrpc_cleanup_ring net/rxrpc/call_object.c:485 [inline] rxrpc_release_call+0x5dd/0x860 net/rxrpc/call_object.c:552 rxrpc_release_calls_on_socket+0x21c/0x300 net/rxrpc/call_object.c:579 rxrpc_release_sock net/rxrpc/af_rxrpc.c:885 [inline] rxrpc_release+0x263/0x5a0 net/rxrpc/af_rxrpc.c:916 __sock_release+0xcd/0x280 net/socket.c:597 ... The buggy address belongs to the object at ffff888011606dc0 which belongs to the cache skbuff_head_cache of size 232 Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: syzbot+174de899852504e4a74a@syzkaller.appspotmail.com Reported-by: syzbot+3d1c772efafd3c38d007@syzkaller.appspotmail.com Signed-off-by: David Howells cc: Hillf Danton Link: https://lore.kernel.org/r/161234207610.653119.5287360098400436976.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski --- net/rxrpc/call_object.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index c845594b663f..4eb91d958a48 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -548,8 +548,6 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call) rxrpc_disconnect_call(call); if (call->security) call->security->free_call_crypto(call); - - rxrpc_cleanup_ring(call); _leave(""); } From 81b8be68ef8e8915d0cc6cedd2ac425c74a24813 Mon Sep 17 00:00:00 2001 From: Xie He Date: Tue, 2 Feb 2021 23:15:41 -0800 Subject: [PATCH 198/284] net: hdlc_x25: Return meaningful error code in x25_open It's not meaningful to pass on LAPB error codes to HDLC code or other parts of the system, because they will not understand the error codes. Instead, use system-wide recognizable error codes. Fixes: f362e5fe0f1f ("wan/hdlc_x25: make lapb params configurable") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xie He Acked-by: Martin Schiller Link: https://lore.kernel.org/r/20210203071541.86138-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/wan/hdlc_x25.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wan/hdlc_x25.c b/drivers/net/wan/hdlc_x25.c index bb164805804e..4aaa6388b9ee 100644 --- a/drivers/net/wan/hdlc_x25.c +++ b/drivers/net/wan/hdlc_x25.c @@ -169,11 +169,11 @@ static int x25_open(struct net_device *dev) result = lapb_register(dev, &cb); if (result != LAPB_OK) - return result; + return -ENOMEM; result = lapb_getparms(dev, ¶ms); if (result != LAPB_OK) - return result; + return -EINVAL; if (state(hdlc)->settings.dce) params.mode = params.mode | LAPB_DCE; @@ -188,7 +188,7 @@ static int x25_open(struct net_device *dev) result = lapb_setparms(dev, ¶ms); if (result != LAPB_OK) - return result; + return -EINVAL; return 0; } From 1d23a56b0296d29e7047b41fe0a42a001036160d Mon Sep 17 00:00:00 2001 From: Alex Elder Date: Wed, 3 Feb 2021 19:06:55 -0600 Subject: [PATCH 199/284] net: ipa: set error code in gsi_channel_setup() In gsi_channel_setup(), we check to see if the configuration data contains any information about channels that are not supported by the hardware. If one is found, we abort the setup process, but the error code (ret) is not set in this case. Fix this bug. Fixes: 650d1603825d8 ("soc: qcom: ipa: the generic software interface") Reported-by: Dan Carpenter Signed-off-by: Alex Elder Link: https://lore.kernel.org/r/20210204010655.15619-1-elder@linaro.org Signed-off-by: Jakub Kicinski --- drivers/net/ipa/gsi.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ipa/gsi.c b/drivers/net/ipa/gsi.c index 34e5f2155d62..b77f5fef7aec 100644 --- a/drivers/net/ipa/gsi.c +++ b/drivers/net/ipa/gsi.c @@ -1710,6 +1710,7 @@ static int gsi_channel_setup(struct gsi *gsi) if (!channel->gsi) continue; /* Ignore uninitialized channels */ + ret = -EINVAL; dev_err(gsi->dev, "channel %u not supported by hardware\n", channel_id - 1); channel_id = gsi->channel_count; From 52cbd23a119c6ebf40a527e53f3402d2ea38eccb Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Wed, 3 Feb 2021 14:29:52 -0500 Subject: [PATCH 200/284] udp: fix skb_copy_and_csum_datagram with odd segment sizes When iteratively computing a checksum with csum_block_add, track the offset "pos" to correctly rotate in csum_block_add when offset is odd. The open coded implementation of skb_copy_and_csum_datagram did this. With the switch to __skb_datagram_iter calling csum_and_copy_to_iter, pos was reinitialized to 0 on each call. Bring back the pos by passing it along with the csum to the callback. Changes v1->v2 - pass csum value, instead of csump pointer (Alexander Duyck) Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/ Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers") Reported-by: Oliver Graute Signed-off-by: Willem de Bruijn Reviewed-by: Alexander Duyck Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/uio.h | 8 +++++++- lib/iov_iter.c | 24 ++++++++++++++---------- net/core/datagram.c | 12 ++++++++++-- 3 files changed, 31 insertions(+), 13 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 72d88566694e..27ff8eb786dc 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -260,7 +260,13 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count) { i->count = count; } -size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, struct iov_iter *i); + +struct csum_state { + __wsum csum; + size_t off; +}; + +size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csstate, struct iov_iter *i); size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp, diff --git a/lib/iov_iter.c b/lib/iov_iter.c index a21e6a5792c5..f0b2ccb1bb01 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -592,14 +592,15 @@ static __wsum csum_and_memcpy(void *to, const void *from, size_t len, } static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes, - __wsum *csum, struct iov_iter *i) + struct csum_state *csstate, + struct iov_iter *i) { struct pipe_inode_info *pipe = i->pipe; unsigned int p_mask = pipe->ring_size - 1; + __wsum sum = csstate->csum; + size_t off = csstate->off; unsigned int i_head; size_t n, r; - size_t off = 0; - __wsum sum = *csum; if (!sanity(i)) return 0; @@ -621,7 +622,8 @@ static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes, i_head++; } while (n); i->count -= bytes; - *csum = sum; + csstate->csum = sum; + csstate->off = off; return bytes; } @@ -1522,18 +1524,19 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, } EXPORT_SYMBOL(csum_and_copy_from_iter_full); -size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, +size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate, struct iov_iter *i) { + struct csum_state *csstate = _csstate; const char *from = addr; - __wsum *csum = csump; __wsum sum, next; - size_t off = 0; + size_t off; if (unlikely(iov_iter_is_pipe(i))) - return csum_and_copy_to_pipe_iter(addr, bytes, csum, i); + return csum_and_copy_to_pipe_iter(addr, bytes, _csstate, i); - sum = *csum; + sum = csstate->csum; + off = csstate->off; if (unlikely(iov_iter_is_discard(i))) { WARN_ON(1); /* for now */ return 0; @@ -1561,7 +1564,8 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, off += v.iov_len; }) ) - *csum = sum; + csstate->csum = sum; + csstate->off = off; return bytes; } EXPORT_SYMBOL(csum_and_copy_to_iter); diff --git a/net/core/datagram.c b/net/core/datagram.c index 81809fa735a7..15ab9ffb27fe 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -721,8 +721,16 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset, struct iov_iter *to, int len, __wsum *csump) { - return __skb_datagram_iter(skb, offset, to, len, true, - csum_and_copy_to_iter, csump); + struct csum_state csdata = { .csum = *csump }; + int ret; + + ret = __skb_datagram_iter(skb, offset, to, len, true, + csum_and_copy_to_iter, &csdata); + if (ret) + return ret; + + *csump = csdata.csum; + return 0; } /** From 12bc8dfb83b5292fe387b795210018b7632ee08b Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Wed, 3 Feb 2021 12:36:02 +0100 Subject: [PATCH 201/284] hv_netvsc: Reset the RSC count if NVSP_STAT_FAIL in netvsc_receive() Commit 44144185951a0f ("hv_netvsc: Add validation for untrusted Hyper-V values") added validation to rndis_filter_receive_data() (and rndis_filter_receive()) which introduced NVSP_STAT_FAIL-scenarios where the count is not updated/reset. Fix this omission, and prevent similar scenarios from occurring in the future. Reported-by: Juan Vazquez Signed-off-by: Andrea Parri (Microsoft) Fixes: 44144185951a0f ("hv_netvsc: Add validation for untrusted Hyper-V values") Reviewed-by: Jesse Brandeburg Link: https://lore.kernel.org/r/20210203113602.558916-1-parri.andrea@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/hyperv/netvsc.c | 5 ++++- drivers/net/hyperv/rndis_filter.c | 2 -- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 2350342b961f..13bd48a75db7 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -1262,8 +1262,11 @@ static int netvsc_receive(struct net_device *ndev, ret = rndis_filter_receive(ndev, net_device, nvchan, data, buflen); - if (unlikely(ret != NVSP_STAT_SUCCESS)) + if (unlikely(ret != NVSP_STAT_SUCCESS)) { + /* Drop incomplete packet */ + nvchan->rsc.cnt = 0; status = NVSP_STAT_FAIL; + } } enq_receive_complete(ndev, net_device, q_idx, diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c index 598713c0d5a8..3aab2b867fc0 100644 --- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -509,8 +509,6 @@ static int rndis_filter_receive_data(struct net_device *ndev, return ret; drop: - /* Drop incomplete packet */ - nvchan->rsc.cnt = 0; return NVSP_STAT_FAIL; } From 07bf34a50e327975b21a9dee64d220c3dcb72ee9 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 4 Feb 2021 15:45:11 +0200 Subject: [PATCH 202/284] net: enetc: initialize the RFS and RSS memories Michael tried to enable Advanced Error Reporting through the ENETC's Root Complex Event Collector, and the system started spitting out single bit correctable ECC errors coming from the ENETC interfaces: pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0 fsl_enetc 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID) fsl_enetc 0000:00:00.0: device [1957:e100] error status/mask=00004000/00000000 fsl_enetc 0000:00:00.0: [14] CorrIntErr fsl_enetc 0000:00:00.1: PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID) fsl_enetc 0000:00:00.1: device [1957:e100] error status/mask=00004000/00000000 fsl_enetc 0000:00:00.1: [14] CorrIntErr Further investigating the port correctable memory error detect register (PCMEDR) shows that these AER errors have an associated SOURCE_ID of 6 (RFS/RSS): $ devmem 0x1f8010e10 32 0xC0000006 $ devmem 0x1f8050e10 32 0xC0000006 Discussion with the hardware design engineers reveals that on LS1028A, the hardware does not do initialization of that RFS/RSS memory, and that software should clear/initialize the entire table before starting to operate. That comes as a bit of a surprise, since the driver does not do initialization of the RFS memory. Also, the initialization of the Receive Side Scaling is done only partially. Even though the entire ENETC IP has a single shared flow steering memory, the flow steering service should returns matches only for TCAM entries that are within the range of the Station Interface that is doing the search. Therefore, it should be sufficient for a Station Interface to initialize all of its own entries in order to avoid any ECC errors, and only the Station Interfaces in use should need initialization. There are Physical Station Interfaces associated with PCIe PFs and Virtual Station Interfaces associated with PCIe VFs. We let the PF driver initialize the entire port's memory, which includes the RFS entries which are going to be used by the VF. Reported-by: Michael Walle Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Vladimir Oltean Tested-by: Michael Walle Reviewed-by: Jesse Brandeburg Link: https://lore.kernel.org/r/20210204134511.2640309-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- .../net/ethernet/freescale/enetc/enetc_hw.h | 2 + .../net/ethernet/freescale/enetc/enetc_pf.c | 59 +++++++++++++++++++ 2 files changed, 61 insertions(+) diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h index e1e950d48c92..c71fe8d751d5 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h +++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h @@ -196,6 +196,8 @@ enum enetc_bdr_type {TX, RX}; #define ENETC_CBS_BW_MASK GENMASK(6, 0) #define ENETC_PTCCBSR1(n) (0x1114 + (n) * 8) /* n = 0 to 7*/ #define ENETC_RSSHASH_KEY_SIZE 40 +#define ENETC_PRSSCAPR 0x1404 +#define ENETC_PRSSCAPR_GET_NUM_RSS(val) (BIT((val) & 0xf) * 32) #define ENETC_PRSSK(n) (0x1410 + (n) * 4) /* n = [0..9] */ #define ENETC_PSIVLANFMR 0x1700 #define ENETC_PSIVLANFMR_VS BIT(0) diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c index ed8fcb8b486e..3eb5f1375bd4 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c @@ -996,6 +996,51 @@ static void enetc_phylink_destroy(struct enetc_ndev_priv *priv) phylink_destroy(priv->phylink); } +/* Initialize the entire shared memory for the flow steering entries + * of this port (PF + VFs) + */ +static int enetc_init_port_rfs_memory(struct enetc_si *si) +{ + struct enetc_cmd_rfse rfse = {0}; + struct enetc_hw *hw = &si->hw; + int num_rfs, i, err = 0; + u32 val; + + val = enetc_port_rd(hw, ENETC_PRFSCAPR); + num_rfs = ENETC_PRFSCAPR_GET_NUM_RFS(val); + + for (i = 0; i < num_rfs; i++) { + err = enetc_set_fs_entry(si, &rfse, i); + if (err) + break; + } + + return err; +} + +static int enetc_init_port_rss_memory(struct enetc_si *si) +{ + struct enetc_hw *hw = &si->hw; + int num_rss, err; + int *rss_table; + u32 val; + + val = enetc_port_rd(hw, ENETC_PRSSCAPR); + num_rss = ENETC_PRSSCAPR_GET_NUM_RSS(val); + if (!num_rss) + return 0; + + rss_table = kcalloc(num_rss, sizeof(*rss_table), GFP_KERNEL); + if (!rss_table) + return -ENOMEM; + + err = enetc_set_rss_table(si, rss_table, num_rss); + + kfree(rss_table); + + return err; +} + static int enetc_pf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) { @@ -1051,6 +1096,18 @@ static int enetc_pf_probe(struct pci_dev *pdev, goto err_alloc_si_res; } + err = enetc_init_port_rfs_memory(si); + if (err) { + dev_err(&pdev->dev, "Failed to initialize RFS memory\n"); + goto err_init_port_rfs; + } + + err = enetc_init_port_rss_memory(si); + if (err) { + dev_err(&pdev->dev, "Failed to initialize RSS memory\n"); + goto err_init_port_rss; + } + err = enetc_alloc_msix(priv); if (err) { dev_err(&pdev->dev, "MSIX alloc failed\n"); @@ -1079,6 +1136,8 @@ err_phylink_create: enetc_mdiobus_destroy(pf); err_mdiobus_create: enetc_free_msix(priv); +err_init_port_rss: +err_init_port_rfs: err_alloc_msix: enetc_free_si_resources(priv); err_alloc_si_res: From 8fd54a73b7cda11548154451bdb4bde6d8ff74c7 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 4 Feb 2021 18:33:51 +0200 Subject: [PATCH 203/284] net: dsa: call teardown method on probe failure Since teardown is supposed to undo the effects of the setup method, it should be called in the error path for dsa_switch_setup, not just in dsa_switch_teardown. Fixes: 5e3f847a02aa ("net: dsa: Add teardown callback for drivers") Signed-off-by: Vladimir Oltean Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Link: https://lore.kernel.org/r/20210204163351.2929670-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- net/dsa/dsa2.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index a47e0f9b20d0..a04fd637b4cd 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -462,20 +462,23 @@ static int dsa_switch_setup(struct dsa_switch *ds) ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev); if (!ds->slave_mii_bus) { err = -ENOMEM; - goto unregister_notifier; + goto teardown; } dsa_slave_mii_bus_init(ds); err = mdiobus_register(ds->slave_mii_bus); if (err < 0) - goto unregister_notifier; + goto teardown; } ds->setup = true; return 0; +teardown: + if (ds->ops->teardown) + ds->ops->teardown(ds); unregister_notifier: dsa_switch_unregister_notifier(ds); unregister_devlink_ports: From 647b8dd5184665432cc8a2b5bca46a201f690c37 Mon Sep 17 00:00:00 2001 From: Vadim Fedorenko Date: Thu, 4 Feb 2021 20:50:34 +0300 Subject: [PATCH 204/284] selftests: txtimestamp: fix compilation issue PACKET_TX_TIMESTAMP is defined in if_packet.h but it is not included in test. Include it instead of otherwise the error of redefinition arrives. Also fix the compiler warning about ambiguous control flow by adding explicit braces. Fixes: 8fe2f761cae9 ("net-timestamp: expand documentation") Suggested-by: Willem de Bruijn Signed-off-by: Vadim Fedorenko Acked-by: Willem de Bruijn Link: https://lore.kernel.org/r/1612461034-24524-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/txtimestamp.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/txtimestamp.c b/tools/testing/selftests/net/txtimestamp.c index 490a8cca708a..fabb1d555ee5 100644 --- a/tools/testing/selftests/net/txtimestamp.c +++ b/tools/testing/selftests/net/txtimestamp.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -34,7 +35,6 @@ #include #include #include -#include #include #include #include @@ -495,12 +495,12 @@ static void do_test(int family, unsigned int report_opt) total_len = cfg_payload_len; if (cfg_use_pf_packet || cfg_proto == SOCK_RAW) { total_len += sizeof(struct udphdr); - if (cfg_use_pf_packet || cfg_ipproto == IPPROTO_RAW) + if (cfg_use_pf_packet || cfg_ipproto == IPPROTO_RAW) { if (family == PF_INET) total_len += sizeof(struct iphdr); else total_len += sizeof(struct ipv6hdr); - + } /* special case, only rawv6_sendmsg: * pass proto in sin6_port if not connected * also see ANK comment in net/ipv4/raw.c From 315da87c0f99a4741a639782d59dae44878199f5 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Wed, 3 Feb 2021 16:52:39 +0900 Subject: [PATCH 205/284] kbuild: fix duplicated flags in DEBUG_CFLAGS Sedat Dilek noticed duplicated flags in DEBUG_CFLAGS when building deb-pkg with CONFIG_DEBUG_INFO. For example, 'make CC=clang bindeb-pkg' reproduces the issue. Kbuild recurses to the top Makefile for some targets such as package builds. With commit 121c5d08d53c ("kbuild: Only add -fno-var-tracking-assignments for old GCC versions") applied, DEBUG_CFLAGS is now reset only when CONFIG_CC_IS_GCC=y. Fix it to reset DEBUG_CFLAGS all the time. Fixes: 121c5d08d53c ("kbuild: Only add -fno-var-tracking-assignments for old GCC versions") Reported-by: Sedat Dilek Signed-off-by: Masahiro Yamada Tested-by: Sedat Dilek Reviewed-by: Mark Wielaard Reviewed-by: Nathan Chancellor --- Makefile | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 89217e4e68c6..97c781cf042b 100644 --- a/Makefile +++ b/Makefile @@ -811,10 +811,12 @@ KBUILD_CFLAGS += -ftrivial-auto-var-init=zero KBUILD_CFLAGS += -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang endif +DEBUG_CFLAGS := + # Workaround for GCC versions < 5.0 # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801 ifdef CONFIG_CC_IS_GCC -DEBUG_CFLAGS := $(call cc-ifversion, -lt, 0500, $(call cc-option, -fno-var-tracking-assignments)) +DEBUG_CFLAGS += $(call cc-ifversion, -lt, 0500, $(call cc-option, -fno-var-tracking-assignments)) endif ifdef CONFIG_DEBUG_INFO From efe6e3068067212b85c2d0474b5ee3b2d0c7adab Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 4 Feb 2021 16:29:47 +0100 Subject: [PATCH 206/284] kallsyms: fix nonconverging kallsyms table with lld ARM randconfig builds with lld sometimes show a build failure from kallsyms: Inconsistent kallsyms data Try make KALLSYMS_EXTRA_PASS=1 as a workaround The problem is the veneers/thunks getting added by the linker extend the symbol table, which in turn leads to more veneers being needed, so it may take a few extra iterations to converge. This bug has been fixed multiple times before, but comes back every time a new symbol name is used. lld uses a different set of identifiers from ld.bfd, so the additional ones need to be added as well. I looked through the sources and found that arm64 and mips define similar prefixes, so I'm adding those as well, aside from the ones I observed. I'm not sure about powerpc64, which seems to already be handled through a section match, but if it comes back, the "__long_branch_" and "__plt_" prefixes would have to get added as well. Signed-off-by: Arnd Bergmann Signed-off-by: Masahiro Yamada --- scripts/kallsyms.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c index 7ecd2ccba531..54ad86d13784 100644 --- a/scripts/kallsyms.c +++ b/scripts/kallsyms.c @@ -112,6 +112,12 @@ static bool is_ignored_symbol(const char *name, char type) "__crc_", /* modversions */ "__efistub_", /* arm64 EFI stub namespace */ "__kvm_nvhe_", /* arm64 non-VHE KVM namespace */ + "__AArch64ADRPThunk_", /* arm64 lld */ + "__ARMV5PILongThunk_", /* arm lld */ + "__ARMV7PILongThunk_", + "__ThumbV7PILongThunk_", + "__LA25Thunk_", /* mips lld */ + "__microLA25Thunk_", NULL }; From 0e5a3c8284a30f4c43fd81d7285528ece74563b5 Mon Sep 17 00:00:00 2001 From: Gary Bisson Date: Mon, 25 Jan 2021 17:19:34 +0100 Subject: [PATCH 207/284] usb: dwc3: fix clock issue during resume in OTG mode Commit fe8abf332b8f ("usb: dwc3: support clocks and resets for DWC3 core") introduced clock support and a new function named dwc3_core_init_for_resume() which enables the clock before calling dwc3_core_init() during resume as clocks get disabled during suspend. Unfortunately in this commit the DWC3_GCTL_PRTCAP_OTG case was forgotten and therefore during resume, a platform could call dwc3_core_init() without re-enabling the clocks first, preventing to resume properly. So update the resume path to call dwc3_core_init_for_resume() as it should. Fixes: fe8abf332b8f ("usb: dwc3: support clocks and resets for DWC3 core") Cc: stable@vger.kernel.org Signed-off-by: Gary Bisson Link: https://lore.kernel.org/r/20210125161934.527820-1-gary.bisson@boundarydevices.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/dwc3/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 841daec70b6e..3101f0dcf6ae 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -1758,7 +1758,7 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg) if (PMSG_IS_AUTO(msg)) break; - ret = dwc3_core_init(dwc); + ret = dwc3_core_init_for_resume(dwc); if (ret) return ret; From f670e9f9c8cac716c3506c6bac9e997b27ad441a Mon Sep 17 00:00:00 2001 From: Heiko Stuebner Date: Wed, 27 Jan 2021 11:39:19 +0100 Subject: [PATCH 208/284] usb: dwc2: Fix endpoint direction check in ep_from_windex dwc2_hsotg_process_req_status uses ep_from_windex() to retrieve the endpoint for the index provided in the wIndex request param. In a test-case with a rndis gadget running and sending a malformed packet to it like: dev.ctrl_transfer( 0x82, # bmRequestType 0x00, # bRequest 0x0000, # wValue 0x0001, # wIndex 0x00 # wLength ) it is possible to cause a crash: [ 217.533022] dwc2 ff300000.usb: dwc2_hsotg_process_req_status: USB_REQ_GET_STATUS [ 217.559003] Unable to handle kernel read from unreadable memory at virtual address 0000000000000088 ... [ 218.313189] Call trace: [ 218.330217] ep_from_windex+0x3c/0x54 [ 218.348565] usb_gadget_giveback_request+0x10/0x20 [ 218.368056] dwc2_hsotg_complete_request+0x144/0x184 This happens because ep_from_windex wants to compare the endpoint direction even if index_to_ep() didn't return an endpoint due to the direction not matching. The fix is easy insofar that the actual direction check is already happening when calling index_to_ep() which will return NULL if there is no endpoint for the targeted direction, so the offending check can go away completely. Fixes: c6f5c050e2a7 ("usb: dwc2: gadget: add bi-directional endpoint support") Cc: stable@vger.kernel.org Reported-by: Gerhard Klostermeier Signed-off-by: Heiko Stuebner Link: https://lore.kernel.org/r/20210127103919.58215-1-heiko@sntech.de Signed-off-by: Greg Kroah-Hartman --- drivers/usb/dwc2/gadget.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c index 0a0d11151cfb..ad4c94366dad 100644 --- a/drivers/usb/dwc2/gadget.c +++ b/drivers/usb/dwc2/gadget.c @@ -1543,7 +1543,6 @@ static void dwc2_hsotg_complete_oursetup(struct usb_ep *ep, static struct dwc2_hsotg_ep *ep_from_windex(struct dwc2_hsotg *hsotg, u32 windex) { - struct dwc2_hsotg_ep *ep; int dir = (windex & USB_DIR_IN) ? 1 : 0; int idx = windex & 0x7F; @@ -1553,12 +1552,7 @@ static struct dwc2_hsotg_ep *ep_from_windex(struct dwc2_hsotg *hsotg, if (idx > hsotg->num_of_eps) return NULL; - ep = index_to_ep(hsotg, idx, dir); - - if (idx && ep->dir_in != dir) - return NULL; - - return ep; + return index_to_ep(hsotg, idx, dir); } /** From 9c698bff66ab4914bb3d71da7dc6112519bde23e Mon Sep 17 00:00:00 2001 From: Russell King Date: Fri, 29 Jan 2021 10:19:07 +0000 Subject: [PATCH 209/284] ARM: ensure the signal page contains defined contents Ensure that the signal page contains our poison instruction to increase the protection against ROP attacks and also contains well defined contents. Acked-by: Will Deacon Signed-off-by: Russell King --- arch/arm/kernel/signal.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index 9d2e916121be..a3a38d0a4c85 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c @@ -693,18 +693,20 @@ struct page *get_signal_page(void) addr = page_address(page); + /* Poison the entire page */ + memset32(addr, __opcode_to_mem_arm(0xe7fddef1), + PAGE_SIZE / sizeof(u32)); + /* Give the signal return code some randomness */ offset = 0x200 + (get_random_int() & 0x7fc); signal_return_offset = offset; - /* - * Copy signal return handlers into the vector page, and - * set sigreturn to be a pointer to these. - */ + /* Copy signal return handlers into the page */ memcpy(addr + offset, sigreturn_codes, sizeof(sigreturn_codes)); - ptr = (unsigned long)addr + offset; - flush_icache_range(ptr, ptr + sizeof(sigreturn_codes)); + /* Flush out all instructions in this page */ + ptr = (unsigned long)addr; + flush_icache_range(ptr, ptr + PAGE_SIZE); return page; } From 4d62e81b60d4025e2dfcd5ea531cc1394ce9226f Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 1 Feb 2021 19:40:01 +0000 Subject: [PATCH 210/284] ARM: kexec: fix oops after TLB are invalidated Giancarlo Ferrari reports the following oops while trying to use kexec: Unable to handle kernel paging request at virtual address 80112f38 pgd = fd7ef03e [80112f38] *pgd=0001141e(bad) Internal error: Oops: 80d [#1] PREEMPT SMP ARM ... This is caused by machine_kexec() trying to set the kernel text to be read/write, so it can poke values into the relocation code before copying it - and an interrupt occuring which changes the page tables. The subsequent writes then hit read-only sections that trigger a data abort resulting in the above oops. Fix this by copying the relocation code, and then writing the variables into the destination, thereby avoiding the need to make the kernel text read/write. Reported-by: Giancarlo Ferrari Tested-by: Giancarlo Ferrari Signed-off-by: Russell King --- arch/arm/include/asm/kexec-internal.h | 12 +++++++++ arch/arm/kernel/asm-offsets.c | 5 ++++ arch/arm/kernel/machine_kexec.c | 20 ++++++-------- arch/arm/kernel/relocate_kernel.S | 38 ++++++++------------------- 4 files changed, 36 insertions(+), 39 deletions(-) create mode 100644 arch/arm/include/asm/kexec-internal.h diff --git a/arch/arm/include/asm/kexec-internal.h b/arch/arm/include/asm/kexec-internal.h new file mode 100644 index 000000000000..ecc2322db7aa --- /dev/null +++ b/arch/arm/include/asm/kexec-internal.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ARM_KEXEC_INTERNAL_H +#define _ARM_KEXEC_INTERNAL_H + +struct kexec_relocate_data { + unsigned long kexec_start_address; + unsigned long kexec_indirection_page; + unsigned long kexec_mach_type; + unsigned long kexec_r2; +}; + +#endif diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c index a1570c8bab25..be8050b0c3df 100644 --- a/arch/arm/kernel/asm-offsets.c +++ b/arch/arm/kernel/asm-offsets.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -170,5 +171,9 @@ int main(void) DEFINE(MPU_RGN_PRBAR, offsetof(struct mpu_rgn, prbar)); DEFINE(MPU_RGN_PRLAR, offsetof(struct mpu_rgn, prlar)); #endif + DEFINE(KEXEC_START_ADDR, offsetof(struct kexec_relocate_data, kexec_start_address)); + DEFINE(KEXEC_INDIR_PAGE, offsetof(struct kexec_relocate_data, kexec_indirection_page)); + DEFINE(KEXEC_MACH_TYPE, offsetof(struct kexec_relocate_data, kexec_mach_type)); + DEFINE(KEXEC_R2, offsetof(struct kexec_relocate_data, kexec_r2)); return 0; } diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c index 5d84ad333f05..2b09dad7935e 100644 --- a/arch/arm/kernel/machine_kexec.c +++ b/arch/arm/kernel/machine_kexec.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -22,11 +23,6 @@ extern void relocate_new_kernel(void); extern const unsigned int relocate_new_kernel_size; -extern unsigned long kexec_start_address; -extern unsigned long kexec_indirection_page; -extern unsigned long kexec_mach_type; -extern unsigned long kexec_boot_atags; - static atomic_t waiting_for_crash_ipi; /* @@ -159,6 +155,7 @@ void (*kexec_reinit)(void); void machine_kexec(struct kimage *image) { unsigned long page_list, reboot_entry_phys; + struct kexec_relocate_data *data; void (*reboot_entry)(void); void *reboot_code_buffer; @@ -174,18 +171,17 @@ void machine_kexec(struct kimage *image) reboot_code_buffer = page_address(image->control_code_page); - /* Prepare parameters for reboot_code_buffer*/ - set_kernel_text_rw(); - kexec_start_address = image->start; - kexec_indirection_page = page_list; - kexec_mach_type = machine_arch_type; - kexec_boot_atags = image->arch.kernel_r2; - /* copy our kernel relocation code to the control code page */ reboot_entry = fncpy(reboot_code_buffer, &relocate_new_kernel, relocate_new_kernel_size); + data = reboot_code_buffer + relocate_new_kernel_size; + data->kexec_start_address = image->start; + data->kexec_indirection_page = page_list; + data->kexec_mach_type = machine_arch_type; + data->kexec_r2 = image->arch.kernel_r2; + /* get the identity mapping physical address for the reboot code */ reboot_entry_phys = virt_to_idmap(reboot_entry); diff --git a/arch/arm/kernel/relocate_kernel.S b/arch/arm/kernel/relocate_kernel.S index 72a08786e16e..218d524360fc 100644 --- a/arch/arm/kernel/relocate_kernel.S +++ b/arch/arm/kernel/relocate_kernel.S @@ -5,14 +5,16 @@ #include #include +#include #include .align 3 /* not needed for this code, but keeps fncpy() happy */ ENTRY(relocate_new_kernel) - ldr r0,kexec_indirection_page - ldr r1,kexec_start_address + adr r7, relocate_new_kernel_end + ldr r0, [r7, #KEXEC_INDIR_PAGE] + ldr r1, [r7, #KEXEC_START_ADDR] /* * If there is no indirection page (we are doing crashdumps) @@ -57,34 +59,16 @@ ENTRY(relocate_new_kernel) 2: /* Jump to relocated kernel */ - mov lr,r1 - mov r0,#0 - ldr r1,kexec_mach_type - ldr r2,kexec_boot_atags - ARM( ret lr ) - THUMB( bx lr ) - - .align - - .globl kexec_start_address -kexec_start_address: - .long 0x0 - - .globl kexec_indirection_page -kexec_indirection_page: - .long 0x0 - - .globl kexec_mach_type -kexec_mach_type: - .long 0x0 - - /* phy addr of the atags for the new kernel */ - .globl kexec_boot_atags -kexec_boot_atags: - .long 0x0 + mov lr, r1 + mov r0, #0 + ldr r1, [r7, #KEXEC_MACH_TYPE] + ldr r2, [r7, #KEXEC_R2] + ARM( ret lr ) + THUMB( bx lr ) ENDPROC(relocate_new_kernel) + .align 3 relocate_new_kernel_end: .globl relocate_new_kernel_size From 9f5f8ec50165630cfc49897410b30997d4d677b5 Mon Sep 17 00:00:00 2001 From: Barry Song Date: Sat, 6 Feb 2021 00:33:24 +1300 Subject: [PATCH 211/284] dma-mapping: benchmark: use u8 for reserved field in uAPI structure The original code put five u32 before a u64 expansion[10] array. Five is odd, this will cause trouble in the extension of the structure by adding new features. This patch moves to use u8 for reserved field to avoid future alignment risk. Meanwhile, it also clears the memory of struct map_benchmark in tools, otherwise, if users use old version to run on newer kernel, the random expansion value will cause side effect on newer kernel. Signed-off-by: Barry Song Signed-off-by: Christoph Hellwig --- kernel/dma/map_benchmark.c | 2 +- tools/testing/selftests/dma/dma_map_benchmark.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c index 1b1b8ff875cb..da95df381483 100644 --- a/kernel/dma/map_benchmark.c +++ b/kernel/dma/map_benchmark.c @@ -36,7 +36,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ - __u64 expansion[10]; /* For future use */ + __u8 expansion[84]; /* For future use */ }; struct map_benchmark_data { diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c index 7065163a8388..537d65968c48 100644 --- a/tools/testing/selftests/dma/dma_map_benchmark.c +++ b/tools/testing/selftests/dma/dma_map_benchmark.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -35,7 +36,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ - __u64 expansion[10]; /* For future use */ + __u8 expansion[84]; /* For future use */ }; int main(int argc, char **argv) @@ -102,6 +103,7 @@ int main(int argc, char **argv) exit(1); } + memset(&map, 0, sizeof(map)); map.seconds = seconds; map.threads = threads; map.node = node; From 91792bb8089b63b7b780251eb83939348ac58a64 Mon Sep 17 00:00:00 2001 From: Pavel Shilovsky Date: Tue, 2 Feb 2021 22:34:32 -0600 Subject: [PATCH 212/284] smb3: fix crediting for compounding when only one request in flight Currently we try to guess if a compound request is going to succeed waiting for credits or not based on the number of requests in flight. This approach doesn't work correctly all the time because there may be only one request in flight which is going to bring multiple credits satisfying the compound request. Change the behavior to fail a request only if there are no requests in flight at all and proceed waiting for credits otherwise. Cc: # 5.1+ Signed-off-by: Pavel Shilovsky Reviewed-by: Tom Talpey Reviewed-by: Shyam Prasad N Signed-off-by: Steve French --- fs/cifs/transport.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index 95ef26b555b9..4a2b836eb017 100644 --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -666,10 +666,22 @@ wait_for_compound_request(struct TCP_Server_Info *server, int num, if (*credits < num) { /* - * Return immediately if not too many requests in flight since - * we will likely be stuck on waiting for credits. + * If the server is tight on resources or just gives us less + * credits for other reasons (e.g. requests are coming out of + * order and the server delays granting more credits until it + * processes a missing mid) and we exhausted most available + * credits there may be situations when we try to send + * a compound request but we don't have enough credits. At this + * point the client needs to decide if it should wait for + * additional credits or fail the request. If at least one + * request is in flight there is a high probability that the + * server will return enough credits to satisfy this compound + * request. + * + * Return immediately if no requests in flight since we will be + * stuck on waiting for credits. */ - if (server->in_flight < num - *credits) { + if (server->in_flight == 0) { spin_unlock(&server->req_lock); trace_smb3_insufficient_credits(server->CurrentMid, server->hostname, scredits, sin_flight); From b35ccebe3ef76168aa2edaa35809c0232cb3578e Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Thu, 4 Feb 2021 09:36:18 +0200 Subject: [PATCH 213/284] vdpa/mlx5: Restore the hardware used index after change map When a change of memory map occurs, the hardware resources are destroyed and then re-created again with the new memory map. In such case, we need to restore the hardware available and used indices. The driver failed to restore the used index which is added here. Also, since the driver also fails to reset the available and used indices upon device reset, fix this here to avoid regression caused by the fact that used index may not be zero upon device reset. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen Link: https://lore.kernel.org/r/20210204073618.36336-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 88dde3455bfd..b5fe6d2ad22f 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info { u64 device_addr; u64 driver_addr; u16 avail_index; + u16 used_index; bool ready; struct vdpa_callback cb; bool restore; @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue { u32 virtq_id; struct mlx5_vdpa_net *ndev; u16 avail_idx; + u16 used_idx; int fw_state; /* keep last in the struct */ @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context); MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx); + MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx); MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3, get_features_12_3(ndev->mvdev.actual_features)); vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context); @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m struct mlx5_virtq_attr { u8 state; u16 available_index; + u16 used_index; }; static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu memset(attr, 0, sizeof(*attr)); attr->state = MLX5_GET(virtio_net_q_object, obj_context, state); attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index); + attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index); kfree(out); return 0; @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } +static void clear_virtqueues(struct mlx5_vdpa_net *ndev) +{ + int i; + + for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { + ndev->vqs[i].avail_idx = 0; + ndev->vqs[i].used_idx = 0; + } +} + /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu return err; ri->avail_index = attr.available_index; + ri->used_index = attr.used_index; ri->ready = mvq->ready; ri->num_ent = mvq->num_ent; ri->desc_addr = mvq->desc_addr; @@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev) continue; mvq->avail_idx = ri->avail_index; + mvq->used_idx = ri->used_index; mvq->ready = ri->ready; mvq->num_ent = ri->num_ent; mvq->desc_addr = ri->desc_addr; @@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); + clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(&ndev->mvdev); ndev->mvdev.status = 0; ndev->mvdev.mlx_features = 0; From 24c242ec7abb3d21fa0b1da6bb251521dc1717b5 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 25 Jan 2021 15:30:39 +0100 Subject: [PATCH 214/284] ntp: Use freezable workqueue for RTC synchronization The bug fixed by commit e3fab2f3de081e98 ("ntp: Fix RTC synchronization on 32-bit platforms") revealed an underlying issue: RTC synchronization may happen anytime, even while the system is partially suspended. On systems where the RTC is connected to an I2C bus, the I2C bus controller may already or still be suspended, triggering a WARNING during suspend or resume from s2ram: WARNING: CPU: 0 PID: 124 at drivers/i2c/i2c-core.h:54 __i2c_transfer+0x634/0x680 i2c i2c-6: Transfer while suspended [...] Workqueue: events_power_efficient sync_hw_clock [...] (__i2c_transfer) (i2c_transfer) (regmap_i2c_read) ... (da9063_rtc_set_time) (rtc_set_time) (sync_hw_clock) (process_one_work) Fix this race condition by using the freezable instead of the normal power-efficient workqueue. Signed-off-by: Geert Uytterhoeven Signed-off-by: Thomas Gleixner Acked-by: Rafael J. Wysocki Link: https://lore.kernel.org/r/20210125143039.1051912-1-geert+renesas@glider.be --- kernel/time/ntp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 87389b9e21ab..5247afd7f345 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -502,7 +502,7 @@ static struct hrtimer sync_hrtimer; static enum hrtimer_restart sync_timer_callback(struct hrtimer *timer) { - queue_work(system_power_efficient_wq, &sync_work); + queue_work(system_freezable_power_efficient_wq, &sync_work); return HRTIMER_NORESTART; } @@ -668,7 +668,7 @@ void ntp_notify_cmos_timer(void) * just a pointless work scheduled. */ if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer)) - queue_work(system_power_efficient_wq, &sync_work); + queue_work(system_freezable_power_efficient_wq, &sync_work); } static void __init ntp_init_cmos_sync(void) From 585fc0d2871c9318c949fbf45b1f081edd489e96 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 4 Feb 2021 18:32:03 -0800 Subject: [PATCH 215/284] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active, just leave clear_page_huge_active as static. Because there are no external users. Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by: Muchun Song Acked-by: Michal Hocko Reviewed-by: Mike Kravetz Reviewed-by: Oscar Salvador Cc: David Hildenbrand Cc: Yang Shi Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/hugetlbfs/inode.c | 3 ++- include/linux/hugetlb.h | 2 ++ mm/hugetlb.c | 2 +- 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index b5c109703daa..21c20fd5f9ee 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, mutex_unlock(&hugetlb_fault_mutex_table[hash]); + set_page_huge_active(page); /* * unlock_page because locked by add_to_page_cache() - * page_put due to reference from alloc_huge_page() + * put_page() due to reference from alloc_huge_page() */ unlock_page(page); put_page(page); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ebca2ef02212..b5807f23caf8 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, } #endif +void set_page_huge_active(struct page *page); + #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 18f6ee317900..6f0e242d38ca 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1349,7 +1349,7 @@ bool page_huge_active(struct page *page) } /* never called for tail page */ -static void set_page_huge_active(struct page *page) +void set_page_huge_active(struct page *page) { VM_BUG_ON_PAGE(!PageHeadHuge(page), page); SetPagePrivate(&page[1]); From 7ffddd499ba6122b1a07828f023d1d67629aa017 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 4 Feb 2021 18:32:06 -0800 Subject: [PATCH 216/284] mm: hugetlb: fix a race between freeing and dissolving the page There is a race condition between __free_huge_page() and dissolve_free_huge_page(). CPU0: CPU1: // page_count(page) == 1 put_page(page) __free_huge_page(page) dissolve_free_huge_page(page) spin_lock(&hugetlb_lock) // PageHuge(page) && !page_count(page) update_and_free_page(page) // page is freed to the buddy spin_unlock(&hugetlb_lock) spin_lock(&hugetlb_lock) clear_page_huge_active(page) enqueue_huge_page(page) // It is wrong, the page is already freed spin_unlock(&hugetlb_lock) The race window is between put_page() and dissolve_free_huge_page(). We should make sure that the page is already on the free list when it is dissolved. As a result __free_huge_page would corrupt page(s) already in the buddy allocator. Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Reviewed-by: Oscar Salvador Acked-by: Michal Hocko Cc: David Hildenbrand Cc: Yang Shi Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6f0e242d38ca..c6ee3c28a04e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock); static int num_fault_mutexes; struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp; +static inline bool PageHugeFreed(struct page *head) +{ + return page_private(head + 4) == -1UL; +} + +static inline void SetPageHugeFreed(struct page *head) +{ + set_page_private(head + 4, -1UL); +} + +static inline void ClearPageHugeFreed(struct page *head) +{ + set_page_private(head + 4, 0); +} + /* Forward declaration */ static int hugetlb_acct_memory(struct hstate *h, long delta); @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page) list_move(&page->lru, &h->hugepage_freelists[nid]); h->free_huge_pages++; h->free_huge_pages_node[nid]++; + SetPageHugeFreed(page); } static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid) @@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid) list_move(&page->lru, &h->hugepage_activelist); set_page_refcounted(page); + ClearPageHugeFreed(page); h->free_huge_pages--; h->free_huge_pages_node[nid]--; return page; @@ -1505,6 +1522,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) spin_lock(&hugetlb_lock); h->nr_huge_pages++; h->nr_huge_pages_node[nid]++; + ClearPageHugeFreed(page); spin_unlock(&hugetlb_lock); } @@ -1755,6 +1773,7 @@ int dissolve_free_huge_page(struct page *page) { int rc = -EBUSY; +retry: /* Not to disrupt normal path by vainly holding hugetlb_lock */ if (!PageHuge(page)) return 0; @@ -1771,6 +1790,26 @@ int dissolve_free_huge_page(struct page *page) int nid = page_to_nid(head); if (h->free_huge_pages - h->resv_huge_pages == 0) goto out; + + /* + * We should make sure that the page is already on the free list + * when it is dissolved. + */ + if (unlikely(!PageHugeFreed(head))) { + spin_unlock(&hugetlb_lock); + cond_resched(); + + /* + * Theoretically, we should return -EBUSY when we + * encounter this race. In fact, we have a chance + * to successfully dissolve the page if we do a + * retry. Because the race window is quite small. + * If we seize this opportunity, it is an optimization + * for increasing the success rate of dissolving page. + */ + goto retry; + } + /* * Move PageHWPoison flag from head page to the raw error page, * which makes any subpages rather than the error page reusable. From 0eb2df2b5629794020f75e94655e1994af63f0d4 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 4 Feb 2021 18:32:10 -0800 Subject: [PATCH 217/284] mm: hugetlb: fix a race between isolating and freeing page There is a race between isolate_huge_page() and __free_huge_page(). CPU0: CPU1: if (PageHuge(page)) put_page(page) __free_huge_page(page) spin_lock(&hugetlb_lock) update_and_free_page(page) set_compound_page_dtor(page, NULL_COMPOUND_DTOR) spin_unlock(&hugetlb_lock) isolate_huge_page(page) // trigger BUG_ON VM_BUG_ON_PAGE(!PageHead(page), page) spin_lock(&hugetlb_lock) page_huge_active(page) // trigger BUG_ON VM_BUG_ON_PAGE(!PageHuge(page), page) spin_unlock(&hugetlb_lock) When we isolate a HugeTLB page on CPU0. Meanwhile, we free it to the buddy allocator on CPU1. Then, we can trigger a BUG_ON on CPU0, because it is already freed to the buddy allocator. Link: https://lkml.kernel.org/r/20210115124942.46403-5-songmuchun@bytedance.com Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Acked-by: Michal Hocko Reviewed-by: Oscar Salvador Cc: David Hildenbrand Cc: Yang Shi Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c6ee3c28a04e..90e03c3000c5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5594,9 +5594,9 @@ bool isolate_huge_page(struct page *page, struct list_head *list) { bool ret = true; - VM_BUG_ON_PAGE(!PageHead(page), page); spin_lock(&hugetlb_lock); - if (!page_huge_active(page) || !get_page_unless_zero(page)) { + if (!PageHeadHuge(page) || !page_huge_active(page) || + !get_page_unless_zero(page)) { ret = false; goto unlock; } From ecbf4724e6061b4b01be20f6d797d64d462b2bc8 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 4 Feb 2021 18:32:13 -0800 Subject: [PATCH 218/284] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active The page_huge_active() can be called from scan_movable_pages() which do not hold a reference count to the HugeTLB page. So when we call page_huge_active() from scan_movable_pages(), the HugeTLB page can be freed parallel. Then we will trigger a BUG_ON which is in the page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the VM_BUG_ON_PAGE. Link: https://lkml.kernel.org/r/20210115124942.46403-6-songmuchun@bytedance.com Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()") Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Acked-by: Michal Hocko Reviewed-by: Oscar Salvador Cc: David Hildenbrand Cc: Yang Shi Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 90e03c3000c5..d78111f0fa2c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1361,8 +1361,7 @@ struct hstate *size_to_hstate(unsigned long size) */ bool page_huge_active(struct page *page) { - VM_BUG_ON_PAGE(!PageHuge(page), page); - return PageHead(page) && PagePrivate(&page[1]); + return PageHeadHuge(page) && PagePrivate(&page[1]); } /* never called for tail page */ From 71a64f618be9594cd0645105c0989855c0f86d90 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 4 Feb 2021 18:32:17 -0800 Subject: [PATCH 219/284] mm: migrate: do not migrate HugeTLB page whose refcount is one All pages isolated for the migration have an elevated reference count and therefore seeing a reference count equal to 1 means that the last user of the page has dropped the reference and the page has became unused and there doesn't make much sense to migrate it anymore. This has been done for regular pages and this patch does the same for hugetlb pages. Although the likelihood of the race is rather small for hugetlb pages it makes sense the two code paths in sync. Link: https://lkml.kernel.org/r/20210115124942.46403-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Acked-by: Yang Shi Acked-by: Michal Hocko Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/migrate.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/mm/migrate.c b/mm/migrate.c index c0efe921bca5..20ca887ea769 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1280,6 +1280,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, return -ENOSYS; } + if (page_count(hpage) == 1) { + /* page was freed from under us. So we are done. */ + putback_active_hugepage(hpage); + return MIGRATEPAGE_SUCCESS; + } + new_hpage = get_new_page(hpage, private); if (!new_hpage) return -ENOMEM; From 74e21484e40bb8ce0f9828bbfe1c9fc9b04249c6 Mon Sep 17 00:00:00 2001 From: Rokudo Yan Date: Thu, 4 Feb 2021 18:32:20 -0800 Subject: [PATCH 220/284] mm, compaction: move high_pfn to the for loop scope In fast_isolate_freepages, high_pfn will be used if a prefered one (ie PFN >= low_fn) not found. But the high_pfn is not reset before searching an free area, so when it was used as freepage, it may from another free area searched before. As a result move_freelist_head(freelist, freepage) will have unexpected behavior (eg corrupt the MOVABLE freelist) Unable to handle kernel paging request at virtual address dead000000000200 Mem abort info: ESR = 0x96000044 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000044 CM = 0, WnR = 1 [dead000000000200] address between user and kernel address ranges -000|list_cut_before(inline) -000|move_freelist_head(inline) -000|fast_isolate_freepages(inline) -000|isolate_freepages(inline) -000|compaction_alloc(?, ?) -001|unmap_and_move(inline) -001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p -002|__read_once_size(inline) -002|static_key_count(inline) -002|static_key_false(inline) -002|trace_mm_compaction_migratepages(inline) -002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0) -003|kcompactd_do_work(inline) -003|kcompactd([X19] p = 0xFFFFFF93227FBC40) -004|kthread([X20] _create = 0xFFFFFFE1AFB26380) -005|ret_from_fork(asm) The issue was reported on an smart phone product with 6GB ram and 3GB zram as swap device. This patch fixes the issue by reset high_pfn before searching each free area, which ensure freepage and freelist match when call move_freelist_head in fast_isolate_freepages(). Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net Link: https://lkml.kernel.org/r/20210112094720.1238444-1-wu-yan@tcl.com Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target") Signed-off-by: Rokudo Yan Acked-by: Mel Gorman Acked-by: Vlastimil Babka Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/compaction.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/compaction.c b/mm/compaction.c index e5acb9714436..190ccdaa6c19 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1342,7 +1342,7 @@ fast_isolate_freepages(struct compact_control *cc) { unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1); unsigned int nr_scanned = 0; - unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0; + unsigned long low_pfn, min_pfn, highest = 0; unsigned long nr_isolated = 0; unsigned long distance; struct page *page = NULL; @@ -1387,6 +1387,7 @@ fast_isolate_freepages(struct compact_control *cc) struct page *freepage; unsigned long flags; unsigned int order_scanned = 0; + unsigned long high_pfn = 0; if (!area->nr_free) continue; From 4f6ec8602341e97b364e4e0d41a1ed08148f5e98 Mon Sep 17 00:00:00 2001 From: Rick Edgecombe Date: Thu, 4 Feb 2021 18:32:24 -0800 Subject: [PATCH 221/284] mm/vmalloc: separate put pages and flush VM flags When VM_MAP_PUT_PAGES was added, it was defined with the same value as VM_FLUSH_RESET_PERMS. This doesn't seem like it will cause any big functional problems other than some excess flushing for VM_MAP_PUT_PAGES allocations. Redefine VM_MAP_PUT_PAGES to have its own value. Also, rearrange things so flags are less likely to be missed in the future. Link: https://lkml.kernel.org/r/20210122233706.9304-1-rick.p.edgecombe@intel.com Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap") Signed-off-by: Rick Edgecombe Suggested-by: Matthew Wilcox Cc: Miaohe Lin Cc: Christoph Hellwig Cc: Daniel Axtens Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/vmalloc.h | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 80c0181c411d..cedcda6593f6 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -24,7 +24,8 @@ struct notifier_block; /* in notifier.h */ #define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */ #define VM_NO_GUARD 0x00000040 /* don't add guard page */ #define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */ -#define VM_MAP_PUT_PAGES 0x00000100 /* put pages and free array in vfree */ +#define VM_FLUSH_RESET_PERMS 0x00000100 /* reset direct map and flush TLB on unmap, can't be freed in atomic context */ +#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */ /* * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC. @@ -37,12 +38,6 @@ struct notifier_block; /* in notifier.h */ * determine which allocations need the module shadow freed. */ -/* - * Memory with VM_FLUSH_RESET_PERMS cannot be freed in an interrupt or with - * vfree_atomic(). - */ -#define VM_FLUSH_RESET_PERMS 0x00000100 /* Reset direct map and flush TLB on unmap */ - /* bits [20..32] reserved for arch specific ioremap internals */ /* From 55b6f763d8bcb5546997933105d66d3e6b080e6a Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 4 Feb 2021 18:32:28 -0800 Subject: [PATCH 222/284] init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov On ARCH=um, loading a module doesn't result in its constructors getting called, which breaks module gcov since the debugfs files are never registered. On the other hand, in-kernel constructors have already been called by the dynamic linker, so we can't call them again. Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be selected, but avoiding the in-kernel constructor calls. Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we really do want CONSTRUCTORS, just not kernel binary ones. Link: https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeid Signed-off-by: Johannes Berg Reviewed-by: Peter Oberparleiter Cc: Arnd Bergmann Cc: Jessica Yu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/Kconfig | 1 - init/main.c | 8 +++++++- kernel/gcov/Kconfig | 2 +- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..29ad68325028 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -76,7 +76,6 @@ config CC_HAS_ASM_INLINE config CONSTRUCTORS bool - depends on !UML config IRQ_WORK bool diff --git a/init/main.c b/init/main.c index c68d784376ca..a626e78dbf06 100644 --- a/init/main.c +++ b/init/main.c @@ -1066,7 +1066,13 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void) /* Call all constructor functions linked into the kernel. */ static void __init do_ctors(void) { -#ifdef CONFIG_CONSTRUCTORS +/* + * For UML, the constructors have already been called by the + * normal setup code as it's just a normal ELF binary, so we + * cannot do it again - but we do need CONFIG_CONSTRUCTORS + * even on UML for modules. + */ +#if defined(CONFIG_CONSTRUCTORS) && !defined(CONFIG_UML) ctor_fn_t *fn = (ctor_fn_t *) __ctors_start; for (; fn < (ctor_fn_t *) __ctors_end; fn++) diff --git a/kernel/gcov/Kconfig b/kernel/gcov/Kconfig index 3110c77230c7..f62de2dea8a3 100644 --- a/kernel/gcov/Kconfig +++ b/kernel/gcov/Kconfig @@ -4,7 +4,7 @@ menu "GCOV-based kernel profiling" config GCOV_KERNEL bool "Enable gcov-based kernel profiling" depends on DEBUG_FS - select CONSTRUCTORS if !UML + select CONSTRUCTORS default n help This option enables gcov-based code profiling (e.g. for code coverage From 1c2f67308af4c102b4e1e6cd6f69819ae59408e0 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 4 Feb 2021 18:32:31 -0800 Subject: [PATCH 223/284] mm: thp: fix MADV_REMOVE deadlock on shmem THP Sergey reported deadlock between kswapd correctly doing its usual lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page). This happened when shmem_fallocate(punch hole)'s unmap_mapping_range() reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock could occur when partially truncating a mapped huge tmpfs file, or using fallocate(FALLOC_FL_PUNCH_HOLE) on it. __split_huge_pmd()'s page lock was added in 5.8, to make sure that any concurrent use of reuse_swap_page() (holding page lock) could not catch the anon THP's mapcounts and swapcounts while they were being split. Fortunately, reuse_swap_page() is never applied to a shmem or file THP (not even by khugepaged, which checks PageSwapCache before calling), and anonymous THPs are never created in shmem or file areas: so that __split_huge_pmd()'s page lock can only be necessary for anonymous THPs, on which there is no risk of deadlock with i_mmap_rwsem. Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils Fixes: c444eb564fb1 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()") Signed-off-by: Hugh Dickins Reported-by: Sergey Senozhatsky Reviewed-by: Andrea Arcangeli Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/huge_memory.c | 37 +++++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 14 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9237976abe72..91ca9b103ee5 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2202,7 +2202,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, { spinlock_t *ptl; struct mmu_notifier_range range; - bool was_locked = false; + bool do_unlock_page = false; pmd_t _pmd; mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, @@ -2218,7 +2218,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(freeze && !page); if (page) { VM_WARN_ON_ONCE(!PageLocked(page)); - was_locked = true; if (page != pmd_page(*pmd)) goto out; } @@ -2227,19 +2226,29 @@ repeat: if (pmd_trans_huge(*pmd)) { if (!page) { page = pmd_page(*pmd); - if (unlikely(!trylock_page(page))) { - get_page(page); - _pmd = *pmd; - spin_unlock(ptl); - lock_page(page); - spin_lock(ptl); - if (unlikely(!pmd_same(*pmd, _pmd))) { - unlock_page(page); + /* + * An anonymous page must be locked, to ensure that a + * concurrent reuse_swap_page() sees stable mapcount; + * but reuse_swap_page() is not used on shmem or file, + * and page lock must not be taken when zap_pmd_range() + * calls __split_huge_pmd() while i_mmap_lock is held. + */ + if (PageAnon(page)) { + if (unlikely(!trylock_page(page))) { + get_page(page); + _pmd = *pmd; + spin_unlock(ptl); + lock_page(page); + spin_lock(ptl); + if (unlikely(!pmd_same(*pmd, _pmd))) { + unlock_page(page); + put_page(page); + page = NULL; + goto repeat; + } put_page(page); - page = NULL; - goto repeat; } - put_page(page); + do_unlock_page = true; } } if (PageMlocked(page)) @@ -2249,7 +2258,7 @@ repeat: __split_huge_pmd_locked(vma, pmd, range.start, freeze); out: spin_unlock(ptl); - if (!was_locked && page) + if (do_unlock_page) unlock_page(page); /* * No need to double call mmu_notifier->invalidate_range() callback. From 2dcb3964544177c51853a210b6ad400de78ef17d Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Thu, 4 Feb 2021 18:32:36 -0800 Subject: [PATCH 224/284] memblock: do not start bottom-up allocations with kernel_end With kaslr the kernel image is placed at a random place, so starting the bottom-up allocation with the kernel_end can result in an allocation failure and a warning like this one: hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node ------------[ cut here ]------------ memblock: bottom-up allocation failed, memory hotremove may be affected WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:memblock_find_in_range_node+0x178/0x25a Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046 RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000 R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000 FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0 Call Trace: memblock_alloc_range_nid+0x8d/0x11e cma_declare_contiguous_nid+0x2c4/0x38c hugetlb_cma_reserve+0xdc/0x128 flush_tlb_one_kernel+0xc/0x20 native_set_fixmap+0x82/0xd0 flat_get_apic_id+0x5/0x10 register_lapic_address+0x8e/0x97 setup_arch+0x8a5/0xc3f start_kernel+0x66/0x547 load_ucode_bsp+0x4c/0xcd secondary_startup_64_no_verify+0xb0/0xbb random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0 ---[ end trace f151227d0b39be70 ]--- At the same time, the kernel image is protected with memblock_reserve(), so we can just start searching at PAGE_SIZE. In this case the bottom-up allocation has the same chances to success as a top-down allocation, so there is no reason to fallback in the case of a failure. All together it simplifies the logic. Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory") Signed-off-by: Roman Gushchin Reviewed-by: Mike Rapoport Cc: Joonsoo Kim Cc: Michal Hocko Cc: Rik van Riel Cc: Wonhyuk Yang Cc: Thiago Jung Bauermann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memblock.c | 49 ++++++------------------------------------------- 1 file changed, 6 insertions(+), 43 deletions(-) diff --git a/mm/memblock.c b/mm/memblock.c index 1eaaec1e7687..8d9b5f1e7040 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end, * * Find @size free area aligned to @align in the specified range and node. * - * When allocation direction is bottom-up, the @start should be greater - * than the end of the kernel image. Otherwise, it will be trimmed. The - * reason is that we want the bottom-up allocation just near the kernel - * image so it is highly likely that the allocated memory and the kernel - * will reside in the same node. - * - * If bottom-up allocation failed, will try to allocate memory top-down. - * * Return: * Found address on success, 0 on failure. */ @@ -291,8 +283,6 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size, phys_addr_t end, int nid, enum memblock_flags flags) { - phys_addr_t kernel_end, ret; - /* pump up @end */ if (end == MEMBLOCK_ALLOC_ACCESSIBLE || end == MEMBLOCK_ALLOC_KASAN) @@ -301,40 +291,13 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size, /* avoid allocating the first page */ start = max_t(phys_addr_t, start, PAGE_SIZE); end = max(start, end); - kernel_end = __pa_symbol(_end); - /* - * try bottom-up allocation only when bottom-up mode - * is set and @end is above the kernel image. - */ - if (memblock_bottom_up() && end > kernel_end) { - phys_addr_t bottom_up_start; - - /* make sure we will allocate above the kernel */ - bottom_up_start = max(start, kernel_end); - - /* ok, try bottom-up allocation first */ - ret = __memblock_find_range_bottom_up(bottom_up_start, end, - size, align, nid, flags); - if (ret) - return ret; - - /* - * we always limit bottom-up allocation above the kernel, - * but top-down allocation doesn't have the limit, so - * retrying top-down allocation may succeed when bottom-up - * allocation failed. - * - * bottom-up allocation is expected to be fail very rarely, - * so we use WARN_ONCE() here to see the stack trace if - * fail happens. - */ - WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE), - "memblock: bottom-up allocation failed, memory hotremove may be affected\n"); - } - - return __memblock_find_range_top_down(start, end, size, align, nid, - flags); + if (memblock_bottom_up()) + return __memblock_find_range_bottom_up(start, end, size, align, + nid, flags); + else + return __memblock_find_range_top_down(start, end, size, align, + nid, flags); } /** From 4c415b9a710b6ebce6517f6d4cdc5c4c31cfd7d9 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 4 Feb 2021 18:32:39 -0800 Subject: [PATCH 225/284] mailmap: fix name/email for Viresh Kumar For some of the patches the email id was misspelled to linaro.com instead of linaro.org and for others Viresh Kumar was written as "viresh kumar" (all small). Fix both with help of mailmap entries. Link: https://lkml.kernel.org/r/d6b80b210d7fe0ddc1d4d0b22eff9708c72ef8b3.1612178938.git.viresh.kumar@linaro.org Signed-off-by: Viresh Kumar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .mailmap | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.mailmap b/.mailmap index cc4e91d3075e..d6580d267a1b 100644 --- a/.mailmap +++ b/.mailmap @@ -334,6 +334,8 @@ Vinod Koul Viresh Kumar Viresh Kumar Viresh Kumar +Viresh Kumar +Viresh Kumar Vivien Didelot Vlad Dogaru Vladimir Davydov From 9c41e526a56f2cf25816e58284f4a5f9c12ccef7 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Thu, 4 Feb 2021 18:32:42 -0800 Subject: [PATCH 226/284] mailmap: add entries for Manivannan Sadhasivam Map my personal and work addresses to korg mail address. Link: https://lkml.kernel.org/r/20210201104640.108556-1-manivannan.sadhasivam@linaro.org Signed-off-by: Manivannan Sadhasivam Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .mailmap | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.mailmap b/.mailmap index d6580d267a1b..4105b5f80f4c 100644 --- a/.mailmap +++ b/.mailmap @@ -199,6 +199,8 @@ Li Yang Li Yang Lukasz Luba Maciej W. Rozycki +Manivannan Sadhasivam +Manivannan Sadhasivam Marcin Nowakowski Marc Zyngier Mark Brown From da74240eb3fcd806edb1643874363e954d9e948b Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Thu, 4 Feb 2021 18:32:45 -0800 Subject: [PATCH 227/284] mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked() Commit 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API") introduced a bug in __add_to_page_cache_locked() causing the following splat: page dumped because: VM_BUG_ON_PAGE(page_memcg(page)) pages's memcg:ffff8889a4116000 ------------[ cut here ]------------ kernel BUG at mm/memcontrol.c:2924! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 35 PID: 12345 Comm: cat Tainted: G S W I 5.11.0-rc4-debug+ #1 Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017 RIP: commit_charge+0xf4/0x130 Call Trace: mem_cgroup_charge+0x175/0x770 __add_to_page_cache_locked+0x712/0xad0 add_to_page_cache_lru+0xc5/0x1f0 cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles] __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache] __nfs_readpages_from_fscache+0x16d/0x630 [nfs] nfs_readpages+0x24e/0x540 [nfs] read_pages+0x5b1/0xc40 page_cache_ra_unbounded+0x460/0x750 generic_file_buffered_read_get_pages+0x290/0x1710 generic_file_buffered_read+0x2a9/0xc30 nfs_file_read+0x13f/0x230 [nfs] new_sync_read+0x3af/0x610 vfs_read+0x339/0x4b0 ksys_read+0xf1/0x1c0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Before that commit, there was a try_charge() and commit_charge() in __add_to_page_cache_locked(). These two separated charge functions were replaced by a single mem_cgroup_charge(). However, it forgot to add a matching mem_cgroup_uncharge() when the xarray insertion failed with the page released back to the pool. Fix this by adding a mem_cgroup_uncharge() call when insertion error happens. Link: https://lkml.kernel.org/r/20210125042441.20030-1-longman@redhat.com Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API") Signed-off-by: Waiman Long Reviewed-by: Alex Shi Acked-by: Johannes Weiner Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Muchun Song Cc: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/filemap.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 5c9d564317a5..aa0e0fb04670 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page, XA_STATE(xas, &mapping->i_pages, offset); int huge = PageHuge(page); int error; + bool charged = false; VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageSwapBacked(page), page); @@ -848,6 +849,7 @@ noinline int __add_to_page_cache_locked(struct page *page, error = mem_cgroup_charge(page, current->mm, gfp); if (error) goto error; + charged = true; } gfp &= GFP_RECLAIM_MASK; @@ -896,6 +898,8 @@ unlock: if (xas_error(&xas)) { error = xas_error(&xas); + if (charged) + mem_cgroup_uncharge(page); goto error; } From 49c6631d3b4f61a7b5bb0453a885a12bfa06ffd8 Mon Sep 17 00:00:00 2001 From: Vincenzo Frascino Date: Thu, 4 Feb 2021 18:32:49 -0800 Subject: [PATCH 228/284] kasan: add explicit preconditions to kasan_report() Patch series "kasan: Fix metadata detection for KASAN_HW_TAGS", v5. With the introduction of KASAN_HW_TAGS, kasan_report() currently assumes that every location in memory has valid metadata associated. This is due to the fact that addr_has_metadata() returns always true. As a consequence of this, an invalid address (e.g. NULL pointer address) passed to kasan_report() when KASAN_HW_TAGS is enabled, leads to a kernel panic. Example below, based on arm64: BUG: KASAN: invalid-access in 0x0 Read at addr 0000000000000000 by task swapper/0/1 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 ... Call trace: mte_get_mem_tag+0x24/0x40 kasan_report+0x1a4/0x410 alsa_sound_last_init+0x8c/0xa4 do_one_initcall+0x50/0x1b0 kernel_init_freeable+0x1d4/0x23c kernel_init+0x14/0x118 ret_from_fork+0x10/0x34 Code: d65f03c0 9000f021 f9428021 b6cfff61 (d9600000) ---[ end trace 377c8bb45bdd3a1a ]--- hrtimer: interrupt took 48694256 ns note: swapper/0[1] exited with preempt_count 1 Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b SMP: stopping secondary CPUs Kernel Offset: 0x35abaf140000 from 0xffff800010000000 PHYS_OFFSET: 0x40000000 CPU features: 0x0a7e0152,61c0a030 Memory Limit: none ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- This series fixes the behavior of addr_has_metadata() that now returns true only when the address is valid. This patch (of 2): With the introduction of KASAN_HW_TAGS, kasan_report() accesses the metadata only when addr_has_metadata() succeeds. Add a comment to make sure that the preconditions to the function are explicitly clarified. Link: https://lkml.kernel.org/r/20210126134409.47894-1-vincenzo.frascino@arm.com Link: https://lkml.kernel.org/r/20210126134409.47894-2-vincenzo.frascino@arm.com Signed-off-by: Vincenzo Frascino Reviewed-by: Andrey Konovalov Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Leon Romanovsky Cc: Andrey Konovalov Cc: Catalin Marinas Cc: Will Deacon Cc: Mark Rutland Cc: "Paul E . McKenney" Cc: Naresh Kamboju Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/include/linux/kasan.h b/include/linux/kasan.h index fe1ae73ff8b5..0aea9e2a2a01 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -333,6 +333,13 @@ static inline void *kasan_reset_tag(const void *addr) return (void *)arch_kasan_reset_tag(addr); } +/** + * kasan_report - print a report about a bad memory access detected by KASAN + * @addr: address of the bad access + * @size: size of the bad access + * @is_write: whether the bad access is a write or a read + * @ip: instruction pointer for the accessibility check or the bad access itself + */ bool kasan_report(unsigned long addr, size_t size, bool is_write, unsigned long ip); From b99acdcbfe3c8394ddd8b8d89d9bae2bbba4a459 Mon Sep 17 00:00:00 2001 From: Vincenzo Frascino Date: Thu, 4 Feb 2021 18:32:53 -0800 Subject: [PATCH 229/284] kasan: make addr_has_metadata() return true for valid addresses Currently, addr_has_metadata() returns true for every address. An invalid address (e.g. NULL) passed to the function when, KASAN_HW_TAGS is enabled, leads to a kernel panic. Make addr_has_metadata() return true for valid addresses only. Note: KASAN_HW_TAGS support for vmalloc will be added with a future patch. Link: https://lkml.kernel.org/r/20210126134409.47894-3-vincenzo.frascino@arm.com Fixes: 2e903b91479782b7 ("kasan, arm64: implement HW_TAGS runtime") Signed-off-by: Vincenzo Frascino Reviewed-by: Andrey Konovalov Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Leon Romanovsky Cc: Catalin Marinas Cc: Mark Rutland Cc: Naresh Kamboju Cc: "Paul E . McKenney" Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/kasan.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index cc4d9e1d49b1..8c706e7652f2 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -209,7 +209,7 @@ bool check_memory_region(unsigned long addr, size_t size, bool write, static inline bool addr_has_metadata(const void *addr) { - return true; + return (is_vmalloc_addr(addr) || virt_addr_valid(addr)); } #endif /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */ From 28abcc963149e06d956d95a18a85f4ba26af746f Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Thu, 4 Feb 2021 18:32:57 -0800 Subject: [PATCH 230/284] ubsan: implement __ubsan_handle_alignment_assumption When building ARCH=mips 32r2el_defconfig with CONFIG_UBSAN_ALIGNMENT: ld.lld: error: undefined symbol: __ubsan_handle_alignment_assumption referenced by slab.h:557 (include/linux/slab.h:557) main.o:(do_initcalls) in archive init/built-in.a referenced by slab.h:448 (include/linux/slab.h:448) do_mounts_rd.o:(rd_load_image) in archive init/built-in.a referenced by slab.h:448 (include/linux/slab.h:448) do_mounts_rd.o:(identify_ramdisk_image) in archive init/built-in.a referenced 1579 more times Implement this for the kernel based on LLVM's handleAlignmentAssumptionImpl because the kernel is not linked against the compiler runtime. Link: https://github.com/ClangBuiltLinux/linux/issues/1245 Link: https://github.com/llvm/llvm-project/blob/llvmorg-11.0.1/compiler-rt/lib/ubsan/ubsan_handlers.cpp#L151-L190 Link: https://lkml.kernel.org/r/20210127224451.2587372-1-nathan@kernel.org Signed-off-by: Nathan Chancellor Acked-by: Kees Cook Reviewed-by: Nick Desaulniers Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/ubsan.c | 31 +++++++++++++++++++++++++++++++ lib/ubsan.h | 6 ++++++ 2 files changed, 37 insertions(+) diff --git a/lib/ubsan.c b/lib/ubsan.c index 3e3352f3d0da..bec38c64d6a6 100644 --- a/lib/ubsan.c +++ b/lib/ubsan.c @@ -427,3 +427,34 @@ void __ubsan_handle_load_invalid_value(void *_data, void *val) ubsan_epilogue(); } EXPORT_SYMBOL(__ubsan_handle_load_invalid_value); + +void __ubsan_handle_alignment_assumption(void *_data, unsigned long ptr, + unsigned long align, + unsigned long offset); +void __ubsan_handle_alignment_assumption(void *_data, unsigned long ptr, + unsigned long align, + unsigned long offset) +{ + struct alignment_assumption_data *data = _data; + unsigned long real_ptr; + + if (suppress_report(&data->location)) + return; + + ubsan_prologue(&data->location, "alignment-assumption"); + + if (offset) + pr_err("assumption of %lu byte alignment (with offset of %lu byte) for pointer of type %s failed", + align, offset, data->type->type_name); + else + pr_err("assumption of %lu byte alignment for pointer of type %s failed", + align, data->type->type_name); + + real_ptr = ptr - offset; + pr_err("%saddress is %lu aligned, misalignment offset is %lu bytes", + offset ? "offset " : "", BIT(real_ptr ? __ffs(real_ptr) : 0), + real_ptr & (align - 1)); + + ubsan_epilogue(); +} +EXPORT_SYMBOL(__ubsan_handle_alignment_assumption); diff --git a/lib/ubsan.h b/lib/ubsan.h index 7b56c09473a9..9a0b71c5ff9f 100644 --- a/lib/ubsan.h +++ b/lib/ubsan.h @@ -78,6 +78,12 @@ struct invalid_value_data { struct type_descriptor *type; }; +struct alignment_assumption_data { + struct source_location location; + struct source_location assumption_location; + struct type_descriptor *type; +}; + #if defined(CONFIG_ARCH_SUPPORTS_INT128) typedef __int128 s_max; typedef unsigned __int128 u_max; From e558464be982af2546229dcbef746d24e942abaa Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 4 Feb 2021 18:33:00 -0800 Subject: [PATCH 231/284] mm: hugetlb: fix missing put_page in gather_surplus_pages() The VM_BUG_ON_PAGE avoids the generation of any code, even if that expression has side-effects when !CONFIG_DEBUG_VM. Link: https://lkml.kernel.org/r/20210126031009.96266-1-songmuchun@bytedance.com Fixes: e5dfacebe4a4 ("mm/hugetlb.c: just use put_page_testzero() instead of page_count()") Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Reviewed-by: Miaohe Lin Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d78111f0fa2c..4bdb58ab14cb 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2047,13 +2047,16 @@ retry: /* Free the needed pages to the hugetlb pool */ list_for_each_entry_safe(page, tmp, &surplus_list, lru) { + int zeroed; + if ((--needed) < 0) break; /* * This page is now managed by the hugetlb allocator and has * no users -- drop the buddy allocator's reference. */ - VM_BUG_ON_PAGE(!put_page_testzero(page), page); + zeroed = put_page_testzero(page); + VM_BUG_ON_PAGE(!zeroed, page); enqueue_huge_page(h, page); } free: From 654eb3f2a009af1fc64b10442e559e0d1e50904a Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Thu, 4 Feb 2021 18:33:03 -0800 Subject: [PATCH 232/284] MAINTAINERS/.mailmap: use my @kernel.org address Use my @kernel.org for all points of contact so that I am always accessible. Link: https://lkml.kernel.org/r/20210126212730.2097108-1-nathan@kernel.org Signed-off-by: Nathan Chancellor Acked-by: Nick Desaulniers Acked-by: Miguel Ojeda Cc: Sedat Dilek Cc: Lukas Bulwahn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .mailmap | 1 + MAINTAINERS | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/.mailmap b/.mailmap index 4105b5f80f4c..a90952727bc8 100644 --- a/.mailmap +++ b/.mailmap @@ -246,6 +246,7 @@ Morten Welinder Morten Welinder Morten Welinder Mythri P K +Nathan Chancellor Nguyen Anh Quynh Nicolas Ferre Nicolas Pitre diff --git a/MAINTAINERS b/MAINTAINERS index 2a0737dfca89..667d03852191 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4304,7 +4304,7 @@ S: Maintained F: .clang-format CLANG/LLVM BUILD SUPPORT -M: Nathan Chancellor +M: Nathan Chancellor M: Nick Desaulniers L: clang-built-linux@googlegroups.com S: Supported From c4bed4b96918ff1d062ee81fdae4d207da4fa9b0 Mon Sep 17 00:00:00 2001 From: Lai Jiangshan Date: Thu, 4 Feb 2021 23:27:06 +0800 Subject: [PATCH 233/284] x86/debug: Prevent data breakpoints on __per_cpu_offset When FSGSBASE is enabled, paranoid_entry() fetches the per-CPU GSBASE value via __per_cpu_offset or pcpu_unit_offsets. When a data breakpoint is set on __per_cpu_offset[cpu] (read-write operation), the specific CPU will be stuck in an infinite #DB loop. RCU will try to send an NMI to the specific CPU, but it is not working either since NMI also relies on paranoid_entry(). Which means it's undebuggable. Fixes: eaad981291ee3("x86/entry/64: Introduce the FIND_PERCPU_BASE macro") Signed-off-by: Lai Jiangshan Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210204152708.21308-1-jiangshanlai@gmail.com --- arch/x86/kernel/hw_breakpoint.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c index 6694c0f8e6c1..012ed82e5bd3 100644 --- a/arch/x86/kernel/hw_breakpoint.c +++ b/arch/x86/kernel/hw_breakpoint.c @@ -269,6 +269,20 @@ static inline bool within_cpu_entry(unsigned long addr, unsigned long end) CPU_ENTRY_AREA_TOTAL_SIZE)) return true; + /* + * When FSGSBASE is enabled, paranoid_entry() fetches the per-CPU + * GSBASE value via __per_cpu_offset or pcpu_unit_offsets. + */ +#ifdef CONFIG_SMP + if (within_area(addr, end, (unsigned long)__per_cpu_offset, + sizeof(unsigned long) * nr_cpu_ids)) + return true; +#else + if (within_area(addr, end, (unsigned long)&pcpu_unit_offsets, + sizeof(pcpu_unit_offsets))) + return true; +#endif + for_each_possible_cpu(cpu) { /* The original rw GDT is being used after load_direct_gdt() */ if (within_area(addr, end, (unsigned long)get_cpu_gdt_rw(cpu), From 3943abf2dbfae9ea4d2da05c1db569a0603f76da Mon Sep 17 00:00:00 2001 From: Lai Jiangshan Date: Thu, 4 Feb 2021 23:27:07 +0800 Subject: [PATCH 234/284] x86/debug: Prevent data breakpoints on cpu_dr7 local_db_save() is called at the start of exc_debug_kernel(), reads DR7 and disables breakpoints to prevent recursion. When running in a guest (X86_FEATURE_HYPERVISOR), local_db_save() reads the per-cpu variable cpu_dr7 to check whether a breakpoint is active or not before it accesses DR7. A data breakpoint on cpu_dr7 therefore results in infinite #DB recursion. Disallow data breakpoints on cpu_dr7 to prevent that. Fixes: 84b6a3491567a("x86/entry: Optimize local_db_save() for virt") Signed-off-by: Lai Jiangshan Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210204152708.21308-2-jiangshanlai@gmail.com --- arch/x86/kernel/hw_breakpoint.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c index 012ed82e5bd3..668a4a6533d9 100644 --- a/arch/x86/kernel/hw_breakpoint.c +++ b/arch/x86/kernel/hw_breakpoint.c @@ -307,6 +307,14 @@ static inline bool within_cpu_entry(unsigned long addr, unsigned long end) (unsigned long)&per_cpu(cpu_tlbstate, cpu), sizeof(struct tlb_state))) return true; + + /* + * When in guest (X86_FEATURE_HYPERVISOR), local_db_save() + * will read per-cpu cpu_dr7 before clear dr7 register. + */ + if (within_area(addr, end, (unsigned long)&per_cpu(cpu_dr7, cpu), + sizeof(cpu_dr7))) + return true; } return false; From 21b200d091826a83aafc95d847139b2b0582f6d1 Mon Sep 17 00:00:00 2001 From: Aurelien Aptel Date: Fri, 5 Feb 2021 15:42:48 +0100 Subject: [PATCH 235/284] cifs: report error instead of invalid when revalidating a dentry fails Assuming - //HOST/a is mounted on /mnt - //HOST/b is mounted on /mnt/b On a slow connection, running 'df' and killing it while it's processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS. This triggers the following chain of events: => the dentry revalidation fail => dentry is put and released => superblock associated with the dentry is put => /mnt/b is unmounted This patch makes cifs_d_revalidate() return the error instead of 0 (invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file deleted) and ESTALE (file recreated). Signed-off-by: Aurelien Aptel Suggested-by: Shyam Prasad N Reviewed-by: Shyam Prasad N CC: stable@vger.kernel.org Signed-off-by: Steve French --- fs/cifs/dir.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c index 68900f1629bf..97ac363b5df1 100644 --- a/fs/cifs/dir.c +++ b/fs/cifs/dir.c @@ -737,6 +737,7 @@ static int cifs_d_revalidate(struct dentry *direntry, unsigned int flags) { struct inode *inode; + int rc; if (flags & LOOKUP_RCU) return -ECHILD; @@ -746,8 +747,25 @@ cifs_d_revalidate(struct dentry *direntry, unsigned int flags) if ((flags & LOOKUP_REVAL) && !CIFS_CACHE_READ(CIFS_I(inode))) CIFS_I(inode)->time = 0; /* force reval */ - if (cifs_revalidate_dentry(direntry)) - return 0; + rc = cifs_revalidate_dentry(direntry); + if (rc) { + cifs_dbg(FYI, "cifs_revalidate_dentry failed with rc=%d", rc); + switch (rc) { + case -ENOENT: + case -ESTALE: + /* + * Those errors mean the dentry is invalid + * (file was deleted or recreated) + */ + return 0; + default: + /* + * Otherwise some unexpected error happened + * report it as-is to VFS layer + */ + return rc; + } + } else { /* * If the inode wasn't known to be a dfs entry when From 4c7bcb51ae25f79e3733982e5d0cd8ce8640ddfc Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Mon, 21 Dec 2020 19:56:47 +0100 Subject: [PATCH 236/284] genirq: Prevent [devm_]irq_alloc_desc from returning irq 0 Since commit a85a6c86c25b ("driver core: platform: Clarify that IRQ 0 is invalid"), having a linux-irq with number 0 will trigger a WARN() when calling platform_get_irq*() to retrieve that linux-irq. Since [devm_]irq_alloc_desc allocs a single irq and since irq 0 is not used on some systems, it can return 0, triggering that WARN(). This happens e.g. on Intel Bay Trail and Cherry Trail devices using the LPE audio engine for HDMI audio: 0 is an invalid IRQ number WARNING: CPU: 3 PID: 472 at drivers/base/platform.c:238 platform_get_irq_optional+0x108/0x180 Modules linked in: snd_hdmi_lpe_audio(+) ... Call Trace: platform_get_irq+0x17/0x30 hdmi_lpe_audio_probe+0x4a/0x6c0 [snd_hdmi_lpe_audio] ---[ end trace ceece38854223a0b ]--- Change the 'from' parameter passed to __[devm_]irq_alloc_descs() by the [devm_]irq_alloc_desc macros from 0 to 1, so that these macros will no longer return 0. Fixes: a85a6c86c25b ("driver core: platform: Clarify that IRQ 0 is invalid") Signed-off-by: Hans de Goede Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201221185647.226146-1-hdegoede@redhat.com --- include/linux/irq.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/irq.h b/include/linux/irq.h index 4aeb1c4c7e07..2efde6a79b7e 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -928,7 +928,7 @@ int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from, __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE, NULL) #define irq_alloc_desc(node) \ - irq_alloc_descs(-1, 0, 1, node) + irq_alloc_descs(-1, 1, 1, node) #define irq_alloc_desc_at(at, node) \ irq_alloc_descs(at, at, 1, node) @@ -943,7 +943,7 @@ int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from, __devm_irq_alloc_descs(dev, irq, from, cnt, node, THIS_MODULE, NULL) #define devm_irq_alloc_desc(dev, node) \ - devm_irq_alloc_descs(dev, -1, 0, 1, node) + devm_irq_alloc_descs(dev, -1, 1, 1, node) #define devm_irq_alloc_desc_at(dev, at, node) \ devm_irq_alloc_descs(dev, at, at, 1, node) From 256cfdd6fdf70c6fcf0f7c8ddb0ebd73ce8f3bc9 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Fri, 5 Feb 2021 15:40:04 -0500 Subject: [PATCH 237/284] tracing: Do not count ftrace events in top level enable output The file /sys/kernel/tracing/events/enable is used to enable all events by echoing in "1", or disabling all events when echoing in "0". To know if all events are enabled, disabled, or some are enabled but not all of them, cating the file should show either "1" (all enabled), "0" (all disabled), or "X" (some enabled but not all of them). This works the same as the "enable" files in the individule system directories (like tracing/events/sched/enable). But when all events are enabled, the top level "enable" file shows "X". The reason is that its checking the "ftrace" events, which are special events that only exist for their format files. These include the format for the function tracer events, that are enabled when the function tracer is enabled, but not by the "enable" file. The check includes these events, which will always be disabled, and even though all true events are enabled, the top level "enable" file will show "X" instead of "1". To fix this, have the check test the event's flags to see if it has the "IGNORE_ENABLE" flag set, and if so, not test it. Cc: stable@vger.kernel.org Fixes: 553552ce1796c ("tracing: Combine event filter_active and enable into single flags field") Reported-by: "Yordan Karadzhov (VMware)" Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_events.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index e9d28eeccb7e..d387b774ceeb 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -1212,7 +1212,8 @@ system_enable_read(struct file *filp, char __user *ubuf, size_t cnt, mutex_lock(&event_mutex); list_for_each_entry(file, &tr->events, list) { call = file->event_call; - if (!trace_event_name(call) || !call->class || !call->class->reg) + if ((call->flags & TRACE_EVENT_FL_IGNORE_ENABLE) || + !trace_event_name(call) || !call->class || !call->class->reg) continue; if (system && strcmp(call->class->system, system->name) != 0) From 2452483d9546de1c540f330469dc4042ff089731 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 5 Feb 2021 23:28:29 +0100 Subject: [PATCH 238/284] Revert "lib: Restrict cpumask_local_spread to houskeeping CPUs" This reverts commit 1abdfe706a579a702799fce465bceb9fb01d407c. This change is broken and not solving any problem it claims to solve. Robin reported that cpumask_local_spread() now returns any cpu out of cpu_possible_mask in case that NOHZ_FULL is disabled (runtime or compile time). It can also return any offline or not-present CPU in the housekeeping mask. Before that it was returning a CPU out of online_cpu_mask. While the function is racy against CPU hotplug if the caller does not protect against it, the actual use cases are not caring much about it as they use it mostly as hint for: - the user space affinity hint which is unused by the kernel - memory node selection which is just suboptimal - network queue affinity which might fail but is handled gracefully But the occasional fail vs. hotplug is very different from returning anything from possible_cpu_mask which can have a large amount of offline CPUs obviously. The changelog of the commit claims: "The current implementation of cpumask_local_spread() does not respect the isolated CPUs, i.e., even if a CPU has been isolated for Real-Time task, it will return it to the caller for pinning of its IRQ threads. Having these unwanted IRQ threads on an isolated CPU adds up to a latency overhead." The only correct part of this changelog is: "The current implementation of cpumask_local_spread() does not respect the isolated CPUs." Everything else is just disjunct from reality. Reported-by: Robin Murphy Signed-off-by: Thomas Gleixner Cc: Nitesh Narayan Lal Cc: Marcelo Tosatti Cc: abelits@marvell.com Cc: davem@davemloft.net Link: https://lore.kernel.org/r/87y2g26tnt.fsf@nanos.tec.linutronix.de --- lib/cpumask.c | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/lib/cpumask.c b/lib/cpumask.c index 35924025097b..c3c76b833384 100644 --- a/lib/cpumask.c +++ b/lib/cpumask.c @@ -6,7 +6,6 @@ #include #include #include -#include /** * cpumask_next - get the next cpu in a cpumask @@ -206,27 +205,22 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask) */ unsigned int cpumask_local_spread(unsigned int i, int node) { - int cpu, hk_flags; - const struct cpumask *mask; + int cpu; - hk_flags = HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ; - mask = housekeeping_cpumask(hk_flags); /* Wrap: we always want a cpu. */ - i %= cpumask_weight(mask); + i %= num_online_cpus(); if (node == NUMA_NO_NODE) { - for_each_cpu(cpu, mask) { + for_each_cpu(cpu, cpu_online_mask) if (i-- == 0) return cpu; - } } else { /* NUMA first. */ - for_each_cpu_and(cpu, cpumask_of_node(node), mask) { + for_each_cpu_and(cpu, cpumask_of_node(node), cpu_online_mask) if (i-- == 0) return cpu; - } - for_each_cpu(cpu, mask) { + for_each_cpu(cpu, cpu_online_mask) { /* Skip NUMA nodes, done above. */ if (cpumask_test_cpu(cpu, cpumask_of_node(node))) continue; From 6342adcaa683c2b705c24ed201dc11b35854c88d Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Wed, 3 Feb 2021 13:00:48 -0500 Subject: [PATCH 239/284] entry: Ensure trap after single-step on system call return Commit 299155244770 ("entry: Drop usage of TIF flags in the generic syscall code") introduced a bug on architectures using the generic syscall entry code, in which processes stopped by PTRACE_SYSCALL do not trap on syscall return after receiving a TIF_SINGLESTEP. The reason is that the meaning of TIF_SINGLESTEP flag is overloaded to cause the trap after a system call is executed, but since the above commit, the syscall call handler only checks for the SYSCALL_WORK flags on the exit work. Split the meaning of TIF_SINGLESTEP such that it only means single-step mode, and create a new type of SYSCALL_WORK to request a trap immediately after a syscall in single-step mode. In the current implementation, the SYSCALL_WORK flag shadows the TIF_SINGLESTEP flag for simplicity. Update x86 to flip this bit when a tracer enables single stepping. Fixes: 299155244770 ("entry: Drop usage of TIF flags in the generic syscall code") Suggested-by: Linus Torvalds Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Thomas Gleixner Tested-by: Kyle Huey Link: https://lore.kernel.org/r/87h7mtc9pr.fsf_-_@collabora.com --- arch/x86/include/asm/entry-common.h | 2 -- arch/x86/kernel/step.c | 10 ++++++++-- include/linux/entry-common.h | 1 + include/linux/thread_info.h | 2 ++ kernel/entry/common.c | 12 ++---------- 5 files changed, 13 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h index 6fe54b2813c1..2b87b191b3b8 100644 --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -43,8 +43,6 @@ static __always_inline void arch_check_user_regs(struct pt_regs *regs) } #define arch_check_user_regs arch_check_user_regs -#define ARCH_SYSCALL_EXIT_WORK (_TIF_SINGLESTEP) - static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, unsigned long ti_work) { diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c index 60d2c3798ba2..0f3c307b37b3 100644 --- a/arch/x86/kernel/step.c +++ b/arch/x86/kernel/step.c @@ -127,12 +127,17 @@ static int enable_single_step(struct task_struct *child) regs->flags |= X86_EFLAGS_TF; /* - * Always set TIF_SINGLESTEP - this guarantees that - * we single-step system calls etc.. This will also + * Always set TIF_SINGLESTEP. This will also * cause us to set TF when returning to user mode. */ set_tsk_thread_flag(child, TIF_SINGLESTEP); + /* + * Ensure that a trap is triggered once stepping out of a system + * call prior to executing any user instruction. + */ + set_task_syscall_work(child, SYSCALL_EXIT_TRAP); + oflags = regs->flags; /* Set TF on the kernel stack.. */ @@ -230,6 +235,7 @@ void user_disable_single_step(struct task_struct *child) /* Always clear TIF_SINGLESTEP... */ clear_tsk_thread_flag(child, TIF_SINGLESTEP); + clear_task_syscall_work(child, SYSCALL_EXIT_TRAP); /* But touch TF only if it was set by us.. */ if (test_and_clear_tsk_thread_flag(child, TIF_FORCED_TF)) diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index ca86a00abe86..a104b298019a 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -46,6 +46,7 @@ SYSCALL_WORK_SYSCALL_TRACE | \ SYSCALL_WORK_SYSCALL_AUDIT | \ SYSCALL_WORK_SYSCALL_USER_DISPATCH | \ + SYSCALL_WORK_SYSCALL_EXIT_TRAP | \ ARCH_SYSCALL_WORK_EXIT) /* diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index c8a974cead73..9b2158c69275 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -43,6 +43,7 @@ enum syscall_work_bit { SYSCALL_WORK_BIT_SYSCALL_EMU, SYSCALL_WORK_BIT_SYSCALL_AUDIT, SYSCALL_WORK_BIT_SYSCALL_USER_DISPATCH, + SYSCALL_WORK_BIT_SYSCALL_EXIT_TRAP, }; #define SYSCALL_WORK_SECCOMP BIT(SYSCALL_WORK_BIT_SECCOMP) @@ -51,6 +52,7 @@ enum syscall_work_bit { #define SYSCALL_WORK_SYSCALL_EMU BIT(SYSCALL_WORK_BIT_SYSCALL_EMU) #define SYSCALL_WORK_SYSCALL_AUDIT BIT(SYSCALL_WORK_BIT_SYSCALL_AUDIT) #define SYSCALL_WORK_SYSCALL_USER_DISPATCH BIT(SYSCALL_WORK_BIT_SYSCALL_USER_DISPATCH) +#define SYSCALL_WORK_SYSCALL_EXIT_TRAP BIT(SYSCALL_WORK_BIT_SYSCALL_EXIT_TRAP) #endif #include diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 6dd82be60df8..f9d491b17b78 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -209,15 +209,9 @@ static void exit_to_user_mode_prepare(struct pt_regs *regs) lockdep_sys_exit(); } -#ifndef _TIF_SINGLESTEP -static inline bool report_single_step(unsigned long work) -{ - return false; -} -#else /* * If SYSCALL_EMU is set, then the only reason to report is when - * TIF_SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP). This syscall + * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP). This syscall * instruction has been already reported in syscall_enter_from_user_mode(). */ static inline bool report_single_step(unsigned long work) @@ -225,10 +219,8 @@ static inline bool report_single_step(unsigned long work) if (work & SYSCALL_WORK_SYSCALL_EMU) return false; - return !!(current_thread_info()->flags & _TIF_SINGLESTEP); + return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP; } -#endif - static void syscall_exit_work(struct pt_regs *regs, unsigned long work) { From 36a6c843fd0d8e02506681577e96dabd203dd8e8 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Fri, 5 Feb 2021 13:43:21 -0500 Subject: [PATCH 240/284] entry: Use different define for selector variable in SUD Michael Kerrisk suggested that, from an API perspective, it is a bad idea to share the PR_SYS_DISPATCH_ defines between the prctl operation and the selector variable. Therefore, define two new constants to be used by SUD's selector variable and update the corresponding documentation and test cases. While this changes the API syscall user dispatch has never been part of a Linux release, it will show up for the first time in 5.11. Suggested-by: Michael Kerrisk (man-pages) Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20210205184321.2062251-1-krisman@collabora.com --- .../admin-guide/syscall-user-dispatch.rst | 4 ++-- include/uapi/linux/prctl.h | 3 +++ kernel/entry/syscall_user_dispatch.c | 4 ++-- .../syscall_user_dispatch/sud_benchmark.c | 8 +++++--- .../selftests/syscall_user_dispatch/sud_test.c | 14 ++++++++------ 5 files changed, 20 insertions(+), 13 deletions(-) diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst index a380d6515774..60314953c728 100644 --- a/Documentation/admin-guide/syscall-user-dispatch.rst +++ b/Documentation/admin-guide/syscall-user-dispatch.rst @@ -70,8 +70,8 @@ trampoline code on the vDSO, that trampoline is never intercepted. [selector] is a pointer to a char-sized region in the process memory region, that provides a quick way to enable disable syscall redirection thread-wide, without the need to invoke the kernel directly. selector -can be set to PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF. Any other -value should terminate the program with a SIGSYS. +can be set to SYSCALL_DISPATCH_FILTER_ALLOW or SYSCALL_DISPATCH_FILTER_BLOCK. +Any other value should terminate the program with a SIGSYS. Security Notes -------------- diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 90deb41c8a34..667f1aed091c 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -251,5 +251,8 @@ struct prctl_mm_map { #define PR_SET_SYSCALL_USER_DISPATCH 59 # define PR_SYS_DISPATCH_OFF 0 # define PR_SYS_DISPATCH_ON 1 +/* The control values for the user space selector when dispatch is enabled */ +# define SYSCALL_DISPATCH_FILTER_ALLOW 0 +# define SYSCALL_DISPATCH_FILTER_BLOCK 1 #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c index b0338a5625d9..c240302f56e2 100644 --- a/kernel/entry/syscall_user_dispatch.c +++ b/kernel/entry/syscall_user_dispatch.c @@ -50,10 +50,10 @@ bool syscall_user_dispatch(struct pt_regs *regs) if (unlikely(__get_user(state, sd->selector))) do_exit(SIGSEGV); - if (likely(state == PR_SYS_DISPATCH_OFF)) + if (likely(state == SYSCALL_DISPATCH_FILTER_ALLOW)) return false; - if (state != PR_SYS_DISPATCH_ON) + if (state != SYSCALL_DISPATCH_FILTER_BLOCK) do_exit(SIGSYS); } diff --git a/tools/testing/selftests/syscall_user_dispatch/sud_benchmark.c b/tools/testing/selftests/syscall_user_dispatch/sud_benchmark.c index 6689f1183dbf..073a03702ff5 100644 --- a/tools/testing/selftests/syscall_user_dispatch/sud_benchmark.c +++ b/tools/testing/selftests/syscall_user_dispatch/sud_benchmark.c @@ -22,6 +22,8 @@ # define PR_SET_SYSCALL_USER_DISPATCH 59 # define PR_SYS_DISPATCH_OFF 0 # define PR_SYS_DISPATCH_ON 1 +# define SYSCALL_DISPATCH_FILTER_ALLOW 0 +# define SYSCALL_DISPATCH_FILTER_BLOCK 1 #endif #ifdef __NR_syscalls @@ -55,8 +57,8 @@ unsigned long trapped_call_count = 0; unsigned long native_call_count = 0; char selector; -#define SYSCALL_BLOCK (selector = PR_SYS_DISPATCH_ON) -#define SYSCALL_UNBLOCK (selector = PR_SYS_DISPATCH_OFF) +#define SYSCALL_BLOCK (selector = SYSCALL_DISPATCH_FILTER_BLOCK) +#define SYSCALL_UNBLOCK (selector = SYSCALL_DISPATCH_FILTER_ALLOW) #define CALIBRATION_STEP 100000 #define CALIBRATE_TO_SECS 5 @@ -170,7 +172,7 @@ int main(void) syscall(MAGIC_SYSCALL_1); #ifdef TEST_BLOCKED_RETURN - if (selector == PR_SYS_DISPATCH_OFF) { + if (selector == SYSCALL_DISPATCH_FILTER_ALLOW) { fprintf(stderr, "Failed to return with selector blocked.\n"); exit(-1); } diff --git a/tools/testing/selftests/syscall_user_dispatch/sud_test.c b/tools/testing/selftests/syscall_user_dispatch/sud_test.c index 6498b050ef89..b5d592d4099e 100644 --- a/tools/testing/selftests/syscall_user_dispatch/sud_test.c +++ b/tools/testing/selftests/syscall_user_dispatch/sud_test.c @@ -18,6 +18,8 @@ # define PR_SET_SYSCALL_USER_DISPATCH 59 # define PR_SYS_DISPATCH_OFF 0 # define PR_SYS_DISPATCH_ON 1 +# define SYSCALL_DISPATCH_FILTER_ALLOW 0 +# define SYSCALL_DISPATCH_FILTER_BLOCK 1 #endif #ifndef SYS_USER_DISPATCH @@ -30,8 +32,8 @@ # define MAGIC_SYSCALL_1 (0xff00) /* Bad Linux syscall number */ #endif -#define SYSCALL_DISPATCH_ON(x) ((x) = 1) -#define SYSCALL_DISPATCH_OFF(x) ((x) = 0) +#define SYSCALL_DISPATCH_ON(x) ((x) = SYSCALL_DISPATCH_FILTER_BLOCK) +#define SYSCALL_DISPATCH_OFF(x) ((x) = SYSCALL_DISPATCH_FILTER_ALLOW) /* Test Summary: * @@ -56,7 +58,7 @@ TEST_SIGNAL(dispatch_trigger_sigsys, SIGSYS) { - char sel = 0; + char sel = SYSCALL_DISPATCH_FILTER_ALLOW; struct sysinfo info; int ret; @@ -79,7 +81,7 @@ TEST_SIGNAL(dispatch_trigger_sigsys, SIGSYS) TEST(bad_prctl_param) { - char sel = 0; + char sel = SYSCALL_DISPATCH_FILTER_ALLOW; int op; /* Invalid op */ @@ -220,7 +222,7 @@ TEST_SIGNAL(bad_selector, SIGSYS) sigset_t mask; struct sysinfo info; - glob_sel = 0; + glob_sel = SYSCALL_DISPATCH_FILTER_ALLOW; nr_syscalls_emulated = 0; si_code = 0; si_errno = 0; @@ -288,7 +290,7 @@ TEST(direct_dispatch_range) { int ret = 0; struct sysinfo info; - char sel = 0; + char sel = SYSCALL_DISPATCH_FILTER_ALLOW; /* * Instead of calculating libc addresses; allow the entire From 8dc1c444df193701910f5e80b5d4caaf705a8fb0 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 4 Feb 2021 13:31:46 -0800 Subject: [PATCH 241/284] net: gro: do not keep too many GRO packets in napi->rx_list Commit c80794323e82 ("net: Fix packet reordering caused by GRO and listified RX cooperation") had the unfortunate effect of adding latencies in common workloads. Before the patch, GRO packets were immediately passed to upper stacks. After the patch, we can accumulate quite a lot of GRO packets (depdending on NAPI budget). My fix is counting in napi->rx_count number of segments instead of number of logical packets. Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and listified RX cooperation") Signed-off-by: Eric Dumazet Bisected-by: John Sperbeck Tested-by: Jian Yang Cc: Maxim Mikityanskiy Reviewed-by: Saeed Mahameed Reviewed-by: Edward Cree Reviewed-by: Alexander Lobakin Link: https://lore.kernel.org/r/20210204213146.4192368-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski --- net/core/dev.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index a979b86dbacd..449b45b843d4 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5735,10 +5735,11 @@ static void gro_normal_list(struct napi_struct *napi) /* Queue one GRO_NORMAL SKB up for list processing. If batch size exceeded, * pass the whole batch up to the stack. */ -static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb) +static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb, int segs) { list_add_tail(&skb->list, &napi->rx_list); - if (++napi->rx_count >= gro_normal_batch) + napi->rx_count += segs; + if (napi->rx_count >= gro_normal_batch) gro_normal_list(napi); } @@ -5777,7 +5778,7 @@ static int napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb) } out: - gro_normal_one(napi, skb); + gro_normal_one(napi, skb, NAPI_GRO_CB(skb)->count); return NET_RX_SUCCESS; } @@ -6067,7 +6068,7 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi, { switch (ret) { case GRO_NORMAL: - gro_normal_one(napi, skb); + gro_normal_one(napi, skb, 1); break; case GRO_DROP: @@ -6155,7 +6156,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi, __skb_push(skb, ETH_HLEN); skb->protocol = eth_type_trans(skb, skb->dev); if (ret == GRO_NORMAL) - gro_normal_one(napi, skb); + gro_normal_one(napi, skb, 1); break; case GRO_DROP: From 275a9c72b420e5051b0e92e49b26bef06c196f29 Mon Sep 17 00:00:00 2001 From: Camelia Groza Date: Thu, 4 Feb 2021 18:49:26 +0200 Subject: [PATCH 242/284] dpaa_eth: reserve space for the xdp_frame under the A050385 erratum When the erratum workaround is triggered, the newly created xdp_frame structure is stored at the start of the newly allocated buffer. Avoid the structure from being overwritten by explicitly reserving enough space in the buffer for storing it. Account for the fact that the structure's size might increase in time by aligning the headroom to DPAA_FD_DATA_ALIGNMENT bytes, thus guaranteeing the data's alignment. Fixes: ae680bcbd06a ("dpaa_eth: implement the A050385 erratum workaround for XDP") Signed-off-by: Camelia Groza Acked-by: Maciej Fijalkowski Acked-by: Madalin Bucur Signed-off-by: Jakub Kicinski --- .../net/ethernet/freescale/dpaa/dpaa_eth.c | 20 +++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 4360ce4d3fb6..f3a879937d8d 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -2182,6 +2182,7 @@ static int dpaa_a050385_wa_xdpf(struct dpaa_priv *priv, struct xdp_frame *new_xdpf, *xdpf = *init_xdpf; void *new_buff; struct page *p; + int headroom; /* Check the data alignment and make sure the headroom is large * enough to store the xdpf backpointer. Use an aligned headroom @@ -2197,19 +2198,34 @@ static int dpaa_a050385_wa_xdpf(struct dpaa_priv *priv, return 0; } + /* The new xdp_frame is stored in the new buffer. Reserve enough space + * in the headroom for storing it along with the driver's private + * info. The headroom needs to be aligned to DPAA_FD_DATA_ALIGNMENT to + * guarantee the data's alignment in the buffer. + */ + headroom = ALIGN(sizeof(*new_xdpf) + priv->tx_headroom, + DPAA_FD_DATA_ALIGNMENT); + + /* Assure the extended headroom and data don't overflow the buffer, + * while maintaining the mandatory tailroom. + */ + if (headroom + xdpf->len > DPAA_BP_RAW_SIZE - + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) + return -ENOMEM; + p = dev_alloc_pages(0); if (unlikely(!p)) return -ENOMEM; /* Copy the data to the new buffer at a properly aligned offset */ new_buff = page_address(p); - memcpy(new_buff + priv->tx_headroom, xdpf->data, xdpf->len); + memcpy(new_buff + headroom, xdpf->data, xdpf->len); /* Create an XDP frame around the new buffer in a similar fashion * to xdp_convert_buff_to_frame. */ new_xdpf = new_buff; - new_xdpf->data = new_buff + priv->tx_headroom; + new_xdpf->data = new_buff + headroom; new_xdpf->len = xdpf->len; new_xdpf->headroom = priv->tx_headroom; new_xdpf->frame_sz = DPAA_BP_RAW_SIZE; From c2b0e8455eb76135f505dda81a8869e60f37a861 Mon Sep 17 00:00:00 2001 From: Camelia Groza Date: Thu, 4 Feb 2021 18:49:27 +0200 Subject: [PATCH 243/284] dpaa_eth: reduce data alignment requirements for the A050385 erratum The 256 byte data alignment is required for preventing DMA transaction splits when crossing 4K page boundaries. Since XDP deals only with page sized buffers or less, this restriction isn't needed. Instead, the data only needs to be aligned to 64 bytes to prevent DMA transaction splits. These lessened restrictions can increase performance by widening the pool of permitted data alignments and preventing unnecessary realignments. Fixes: ae680bcbd06a ("dpaa_eth: implement the A050385 erratum workaround for XDP") Signed-off-by: Camelia Groza Acked-by: Maciej Fijalkowski Acked-by: Madalin Bucur Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index f3a879937d8d..2a2c7db23407 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -2192,7 +2192,7 @@ static int dpaa_a050385_wa_xdpf(struct dpaa_priv *priv, * byte frame headroom. If the XDP program uses all of it, copy the * data to a new buffer and make room for storing the backpointer. */ - if (PTR_IS_ALIGNED(xdpf->data, DPAA_A050385_ALIGN) && + if (PTR_IS_ALIGNED(xdpf->data, DPAA_FD_DATA_ALIGNMENT) && xdpf->headroom >= priv->tx_headroom) { xdpf->headroom = priv->tx_headroom; return 0; From 0a9946cca1a30b7236a86757da9df2222eb73ee0 Mon Sep 17 00:00:00 2001 From: Camelia Groza Date: Thu, 4 Feb 2021 18:49:28 +0200 Subject: [PATCH 244/284] dpaa_eth: try to move the data in place for the A050385 erratum The XDP frame's headroom might be large enough to accommodate the xdpf backpointer as well as shifting the data to an aligned address. Try this first before resorting to allocating a new buffer and copying the data. Suggested-by: Maciej Fijalkowski Signed-off-by: Camelia Groza Acked-by: Maciej Fijalkowski Acked-by: Madalin Bucur Signed-off-by: Jakub Kicinski --- .../net/ethernet/freescale/dpaa/dpaa_eth.c | 20 ++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 2a2c7db23407..6faa20bed488 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -2180,8 +2180,9 @@ static int dpaa_a050385_wa_xdpf(struct dpaa_priv *priv, struct xdp_frame **init_xdpf) { struct xdp_frame *new_xdpf, *xdpf = *init_xdpf; - void *new_buff; + void *new_buff, *aligned_data; struct page *p; + u32 data_shift; int headroom; /* Check the data alignment and make sure the headroom is large @@ -2198,6 +2199,23 @@ static int dpaa_a050385_wa_xdpf(struct dpaa_priv *priv, return 0; } + /* Try to move the data inside the buffer just enough to align it and + * store the xdpf backpointer. If the available headroom isn't large + * enough, resort to allocating a new buffer and copying the data. + */ + aligned_data = PTR_ALIGN_DOWN(xdpf->data, DPAA_FD_DATA_ALIGNMENT); + data_shift = xdpf->data - aligned_data; + + /* The XDP frame's headroom needs to be large enough to accommodate + * shifting the data as well as storing the xdpf backpointer. + */ + if (xdpf->headroom >= data_shift + priv->tx_headroom) { + memmove(aligned_data, xdpf->data, xdpf->len); + xdpf->data = aligned_data; + xdpf->headroom = priv->tx_headroom; + return 0; + } + /* The new xdp_frame is stored in the new buffer. Reserve enough space * in the headroom for storing it along with the driver's private * info. The headroom needs to be aligned to DPAA_FD_DATA_ALIGNMENT to From f317e2ea8c88737aa36228167b2292baef3f0430 Mon Sep 17 00:00:00 2001 From: Mohammad Athari Bin Ismail Date: Thu, 4 Feb 2021 22:03:16 +0800 Subject: [PATCH 245/284] net: stmmac: set TxQ mode back to DCB after disabling CBS When disable CBS, mode_to_use parameter is not updated even the operation mode of Tx Queue is changed to Data Centre Bridging (DCB). Therefore, when tc_setup_cbs() function is called to re-enable CBS, the operation mode of Tx Queue remains at DCB, which causing CBS fails to work. This patch updates the value of mode_to_use parameter to MTL_QUEUE_DCB after operation mode of Tx Queue is changed to DCB in stmmac_dma_qmode() callback function. Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC") Suggested-by: Vinicius Costa Gomes Signed-off-by: Mohammad Athari Bin Ismail Signed-off-by: Song, Yoong Siang Reviewed-by: Jesse Brandeburg Acked-by: Vinicius Costa Gomes Link: https://lore.kernel.org/r/1612447396-20351-1-git-send-email-yoong.siang.song@intel.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c index 8ed3b2c834a0..56985542e202 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c @@ -324,7 +324,12 @@ static int tc_setup_cbs(struct stmmac_priv *priv, priv->plat->tx_queues_cfg[queue].mode_to_use = MTL_QUEUE_AVB; } else if (!qopt->enable) { - return stmmac_dma_qmode(priv, priv->ioaddr, queue, MTL_QUEUE_DCB); + ret = stmmac_dma_qmode(priv, priv->ioaddr, queue, + MTL_QUEUE_DCB); + if (ret) + return ret; + + priv->plat->tx_queues_cfg[queue].mode_to_use = MTL_QUEUE_DCB; } /* Port Transmit Rate and Speed Divider */ From 816ef8d7a2c4182e19bc06ab65751cb9e3951e94 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 5 Feb 2021 11:31:31 +0100 Subject: [PATCH 246/284] x86/efi: Remove EFI PGD build time checks With CONFIG_X86_5LEVEL, CONFIG_UBSAN and CONFIG_UBSAN_UNSIGNED_OVERFLOW enabled, clang fails the build with x86_64-linux-ld: arch/x86/platform/efi/efi_64.o: in function `efi_sync_low_kernel_mappings': efi_64.c:(.text+0x22c): undefined reference to `__compiletime_assert_354' which happens due to -fsanitize=unsigned-integer-overflow being enabled: -fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where the result of an unsigned integer computation cannot be represented in its type. Unlike signed integer overflow, this is not undefined behavior, but it is often unintentional. This sanitizer does not check for lossy implicit conversions performed before such a computation (see -fsanitize=implicit-conversion). and that fires when the (intentional) EFI_VA_START/END defines overflow an unsigned long, leading to the assertion expressions not getting optimized away (on GCC they do)... However, those checks are superfluous: the runtime services mapping code already makes sure the ranges don't overshoot EFI_VA_END as the EFI mapping range is hardcoded. On each runtime services call, it is switched to the EFI-specific PGD and even if mappings manage to escape that last PGD, this won't remain unnoticed for long. So rip them out. See https://github.com/ClangBuiltLinux/linux/issues/256 for more info. Reported-by: Arnd Bergmann Signed-off-by: Borislav Petkov Reviewed-by: Nathan Chancellor Acked-by: Ard Biesheuvel Tested-by: Nick Desaulniers Tested-by: Nathan Chancellor Link: http://lkml.kernel.org/r/20210107223424.4135538-1-arnd@kernel.org --- arch/x86/platform/efi/efi_64.c | 19 ------------------- 1 file changed, 19 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index e1e8d4e3a213..8efd003540ca 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -115,31 +115,12 @@ void efi_sync_low_kernel_mappings(void) pud_t *pud_k, *pud_efi; pgd_t *efi_pgd = efi_mm.pgd; - /* - * We can share all PGD entries apart from the one entry that - * covers the EFI runtime mapping space. - * - * Make sure the EFI runtime region mappings are guaranteed to - * only span a single PGD entry and that the entry also maps - * other important kernel regions. - */ - MAYBE_BUILD_BUG_ON(pgd_index(EFI_VA_END) != pgd_index(MODULES_END)); - MAYBE_BUILD_BUG_ON((EFI_VA_START & PGDIR_MASK) != - (EFI_VA_END & PGDIR_MASK)); - pgd_efi = efi_pgd + pgd_index(PAGE_OFFSET); pgd_k = pgd_offset_k(PAGE_OFFSET); num_entries = pgd_index(EFI_VA_END) - pgd_index(PAGE_OFFSET); memcpy(pgd_efi, pgd_k, sizeof(pgd_t) * num_entries); - /* - * As with PGDs, we share all P4D entries apart from the one entry - * that covers the EFI runtime mapping space. - */ - BUILD_BUG_ON(p4d_index(EFI_VA_END) != p4d_index(MODULES_END)); - BUILD_BUG_ON((EFI_VA_START & P4D_MASK) != (EFI_VA_END & P4D_MASK)); - pgd_efi = efi_pgd + pgd_index(EFI_VA_END); pgd_k = pgd_offset_k(EFI_VA_END); p4d_efi = p4d_offset(pgd_efi, 0); From ef66a1eace968ff22a35f45e6e8ec36b668b6116 Mon Sep 17 00:00:00 2001 From: Sukadev Bhattiprolu Date: Tue, 2 Feb 2021 21:08:02 -0800 Subject: [PATCH 247/284] ibmvnic: Clear failover_pending if unable to schedule Normally we clear the failover_pending flag when processing the reset. But if we are unable to schedule a failover reset we must clear the flag ourselves. We could fail to schedule the reset if we are in PROBING state (eg: when booting via kexec) or because we could not allocate memory. Thanks to Cris Forno for helping isolate the problem and for testing. Fixes: 1d8504937478 ("powerpc/vnic: Extend "failover pending" window") Signed-off-by: Sukadev Bhattiprolu Tested-by: Cristobal Forno Link: https://lore.kernel.org/r/20210203050802.680772-1-sukadev@linux.ibm.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/ibm/ibmvnic.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index f79034c786c8..a536fdbf05e1 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -4918,7 +4918,22 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq, complete(&adapter->init_done); adapter->init_done_rc = -EIO; } - ibmvnic_reset(adapter, VNIC_RESET_FAILOVER); + rc = ibmvnic_reset(adapter, VNIC_RESET_FAILOVER); + if (rc && rc != -EBUSY) { + /* We were unable to schedule the failover + * reset either because the adapter was still + * probing (eg: during kexec) or we could not + * allocate memory. Clear the failover_pending + * flag since no one else will. We ignore + * EBUSY because it means either FAILOVER reset + * is already scheduled or the adapter is + * being removed. + */ + netdev_err(netdev, + "Error %ld scheduling failover reset\n", + rc); + adapter->failover_pending = false; + } break; case IBMVNIC_CRQ_INIT_COMPLETE: dev_info(dev, "Partner initialization complete\n"); From 5d1cbcc990f18edaddddef26677073c4e6fad7b7 Mon Sep 17 00:00:00 2001 From: Norbert Slusarek Date: Fri, 5 Feb 2021 13:12:06 +0100 Subject: [PATCH 248/284] net/vmw_vsock: fix NULL pointer dereference In vsock_stream_connect(), a thread will enter schedule_timeout(). While being scheduled out, another thread can enter vsock_stream_connect() as well and set vsk->transport to NULL. In case a signal was sent, the first thread can leave schedule_timeout() and vsock_transport_cancel_pkt() will be called right after. Inside vsock_transport_cancel_pkt(), a null dereference will happen on transport->cancel_pkt. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Signed-off-by: Norbert Slusarek Reviewed-by: Stefano Garzarella Link: https://lore.kernel.org/r/trinity-c2d6cede-bfb1-44e2-85af-1fbc7f541715-1612535117028@3c-app-gmx-bap12 Signed-off-by: Jakub Kicinski --- net/vmw_vsock/af_vsock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 6894f21dc147..cb81cfb47a78 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1233,7 +1233,7 @@ static int vsock_transport_cancel_pkt(struct vsock_sock *vsk) { const struct vsock_transport *transport = vsk->transport; - if (!transport->cancel_pkt) + if (!transport || !transport->cancel_pkt) return -EOPNOTSUPP; return transport->cancel_pkt(vsk); From 3d0bc44d39bca615b72637e340317b7899b7f911 Mon Sep 17 00:00:00 2001 From: Norbert Slusarek Date: Fri, 5 Feb 2021 13:14:05 +0100 Subject: [PATCH 249/284] net/vmw_vsock: improve locking in vsock_connect_timeout() A possible locking issue in vsock_connect_timeout() was recognized by Eric Dumazet which might cause a null pointer dereference in vsock_transport_cancel_pkt(). This patch assures that vsock_transport_cancel_pkt() will be called within the lock, so a race condition won't occur which could result in vsk->transport to be set to NULL. Fixes: 380feae0def7 ("vsock: cancel packets when failing to connect") Reported-by: Eric Dumazet Signed-off-by: Norbert Slusarek Reviewed-by: Stefano Garzarella Link: https://lore.kernel.org/r/trinity-f8e0937a-cf0e-4d80-a76e-d9a958ba3ef1-1612535522360@3c-app-gmx-bap12 Signed-off-by: Jakub Kicinski --- net/vmw_vsock/af_vsock.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index cb81cfb47a78..4ea301fc2bf0 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1243,7 +1243,6 @@ static void vsock_connect_timeout(struct work_struct *work) { struct sock *sk; struct vsock_sock *vsk; - int cancel = 0; vsk = container_of(work, struct vsock_sock, connect_work.work); sk = sk_vsock(vsk); @@ -1254,11 +1253,9 @@ static void vsock_connect_timeout(struct work_struct *work) sk->sk_state = TCP_CLOSE; sk->sk_err = ETIMEDOUT; sk->sk_error_report(sk); - cancel = 1; + vsock_transport_cancel_pkt(vsk); } release_sock(sk); - if (cancel) - vsock_transport_cancel_pkt(vsk); sock_put(sk); } From 225353c070fda18a23785e34e1eec2be508a3a3c Mon Sep 17 00:00:00 2001 From: Shay Agroskin Date: Fri, 5 Feb 2021 21:51:14 +0200 Subject: [PATCH 250/284] net: ena: Update XDP verdict upon failure The verdict returned from ena_xdp_execute() is used to determine the fate of the RX buffer's page. In case of XDP Redirect/TX error the verdict should be set to XDP_ABORTED, otherwise the page won't be freed. Fixes: a318c70ad152 ("net: ena: introduce XDP redirect implementation") Signed-off-by: Shay Agroskin Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 06596fa1f9fe..a0596c073ddd 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -404,6 +404,7 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) if (unlikely(!xdpf)) { trace_xdp_exception(rx_ring->netdev, xdp_prog, verdict); xdp_stat = &rx_ring->rx_stats.xdp_aborted; + verdict = XDP_ABORTED; break; } @@ -424,7 +425,10 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) xdp_stat = &rx_ring->rx_stats.xdp_redirect; break; } - fallthrough; + trace_xdp_exception(rx_ring->netdev, xdp_prog, verdict); + xdp_stat = &rx_ring->rx_stats.xdp_aborted; + verdict = XDP_ABORTED; + break; case XDP_ABORTED: trace_xdp_exception(rx_ring->netdev, xdp_prog, verdict); xdp_stat = &rx_ring->rx_stats.xdp_aborted; From 92bf22614b21a2706f4993b278017e437f7785b3 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 7 Feb 2021 13:57:38 -0800 Subject: [PATCH 251/284] Linux 5.11-rc7 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 70035c522bd3..ade44ac4cc2f 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 5 PATCHLEVEL = 11 SUBLEVEL = 0 -EXTRAVERSION = -rc6 +EXTRAVERSION = -rc7 NAME = Kleptomaniac Octopus # *DOCUMENTATION* From b6c14d7a83802046f7098e9bae78fbde23affa74 Mon Sep 17 00:00:00 2001 From: Cezary Rojewski Date: Wed, 3 Feb 2021 20:19:24 +0100 Subject: [PATCH 252/284] dmaengine dw: Revert "dmaengine: dw: Enable runtime PM" This reverts commit 842067940a3e3fc008a60fee388e000219b32632. For some solutions e.g. sound/soc/intel/catpt, DW DMA is part of a compound device (in that very example, domains: ADSP, SSP0, SSP1, DMA0 and DMA1 are part of a single entity) rather than being a standalone one. Driver for said device may enlist DMA to transfer data during suspend or resume sequences. Manipulating RPM explicitly in dw's DMA request and release channel functions causes suspend() to also invoke resume() for the exact same device. Similar situation occurs for resume() sequence. Effectively renders device dysfunctional after first suspend() attempt. Revert the change to address the problem. Fixes: 842067940a3e ("dmaengine: dw: Enable runtime PM") Cc: Andy Shevchenko Signed-off-by: Cezary Rojewski Acked-by: Andy Shevchenko Link: https://lore.kernel.org/r/20210203191924.15706-1-cezary.rojewski@intel.com Signed-off-by: Vinod Koul --- drivers/dma/dw/core.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c index 19a23767533a..7ab83fe601ed 100644 --- a/drivers/dma/dw/core.c +++ b/drivers/dma/dw/core.c @@ -982,11 +982,8 @@ static int dwc_alloc_chan_resources(struct dma_chan *chan) dev_vdbg(chan2dev(chan), "%s\n", __func__); - pm_runtime_get_sync(dw->dma.dev); - /* ASSERT: channel is idle */ if (dma_readl(dw, CH_EN) & dwc->mask) { - pm_runtime_put_sync_suspend(dw->dma.dev); dev_dbg(chan2dev(chan), "DMA channel not idle?\n"); return -EIO; } @@ -1003,7 +1000,6 @@ static int dwc_alloc_chan_resources(struct dma_chan *chan) * We need controller-specific data to set up slave transfers. */ if (chan->private && !dw_dma_filter(chan, chan->private)) { - pm_runtime_put_sync_suspend(dw->dma.dev); dev_warn(chan2dev(chan), "Wrong controller-specific data\n"); return -EINVAL; } @@ -1047,8 +1043,6 @@ static void dwc_free_chan_resources(struct dma_chan *chan) if (!dw->in_use) do_dw_dma_off(dw); - pm_runtime_put_sync_suspend(dw->dma.dev); - dev_vdbg(chan2dev(chan), "%s: done\n", __func__); } From 3c55e94c0adea4a5389c4b80f6ae9927dd6a4501 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 4 Feb 2021 18:25:37 +0100 Subject: [PATCH 253/284] cpufreq: ACPI: Extend frequency tables to cover boost frequencies A severe performance regression on AMD EPYC processors when using the schedutil scaling governor was discovered by Phoronix.com and attributed to the following commits: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") The source of the problem is that the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale- invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver. This effectively causes the scale-invariant utilization to fall below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so the schedutil governor selects a frequency below cpuinfo.max_freq then. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies. However, if the cpuinfo.max_freq value coming from acpi_cpufreq was higher, the schedutil governor would select higher frequencies which in turn would allow acpi_cpufreq to set more adequate performance levels and to get to the "boost" range of CPU frequencies more often. This issue affects any systems where acpi_cpufreq is used and the "boost" (or "turbo") frequencies are enabled, not just AMD EPYC. Moreover, commit db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. To address this issue, extend the frequency table constructed by acpi_cpufreq for each CPU to cover the entire range of available frequencies (including the "boost" ones) if CPPC is available and indicates that "boost" (or "turbo") frequencies are enabled. That causes cpuinfo.max_freq to become the maximum "boost" frequency of the given CPU (instead of the maximum frequency returned by the ACPI _PSS object that corresponds to the "nominal" performance level). Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") Fixes: 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1 Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/ Reported-by: Michael Larabel Diagnosed-by: Giovanni Gherdovich Signed-off-by: Rafael J. Wysocki Tested-by: Giovanni Gherdovich Reviewed-by: Giovanni Gherdovich Tested-by: Michael Larabel --- drivers/cpufreq/acpi-cpufreq.c | 109 +++++++++++++++++++++++++++++---- 1 file changed, 96 insertions(+), 13 deletions(-) diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c index 1e4fbb002a31..4614f1c6f50a 100644 --- a/drivers/cpufreq/acpi-cpufreq.c +++ b/drivers/cpufreq/acpi-cpufreq.c @@ -26,6 +26,7 @@ #include #include +#include #include #include @@ -53,6 +54,7 @@ struct acpi_cpufreq_data { unsigned int resume; unsigned int cpu_feature; unsigned int acpi_perf_cpu; + unsigned int first_perf_state; cpumask_var_t freqdomain_cpus; void (*cpu_freq_write)(struct acpi_pct_register *reg, u32 val); u32 (*cpu_freq_read)(struct acpi_pct_register *reg); @@ -221,10 +223,10 @@ static unsigned extract_msr(struct cpufreq_policy *policy, u32 msr) perf = to_perf_data(data); - cpufreq_for_each_entry(pos, policy->freq_table) + cpufreq_for_each_entry(pos, policy->freq_table + data->first_perf_state) if (msr == perf->states[pos->driver_data].status) return pos->frequency; - return policy->freq_table[0].frequency; + return policy->freq_table[data->first_perf_state].frequency; } static unsigned extract_freq(struct cpufreq_policy *policy, u32 val) @@ -363,6 +365,7 @@ static unsigned int get_cur_freq_on_cpu(unsigned int cpu) struct cpufreq_policy *policy; unsigned int freq; unsigned int cached_freq; + unsigned int state; pr_debug("%s (%d)\n", __func__, cpu); @@ -374,7 +377,11 @@ static unsigned int get_cur_freq_on_cpu(unsigned int cpu) if (unlikely(!data || !policy->freq_table)) return 0; - cached_freq = policy->freq_table[to_perf_data(data)->state].frequency; + state = to_perf_data(data)->state; + if (state < data->first_perf_state) + state = data->first_perf_state; + + cached_freq = policy->freq_table[state].frequency; freq = extract_freq(policy, get_cur_val(cpumask_of(cpu), data)); if (freq != cached_freq) { /* @@ -628,16 +635,54 @@ static int acpi_cpufreq_blacklist(struct cpuinfo_x86 *c) } #endif +#ifdef CONFIG_ACPI_CPPC_LIB +static u64 get_max_boost_ratio(unsigned int cpu) +{ + struct cppc_perf_caps perf_caps; + u64 highest_perf, nominal_perf; + int ret; + + if (acpi_pstate_strict) + return 0; + + ret = cppc_get_perf_caps(cpu, &perf_caps); + if (ret) { + pr_debug("CPU%d: Unable to get performance capabilities (%d)\n", + cpu, ret); + return 0; + } + + highest_perf = perf_caps.highest_perf; + nominal_perf = perf_caps.nominal_perf; + + if (!highest_perf || !nominal_perf) { + pr_debug("CPU%d: highest or nominal performance missing\n", cpu); + return 0; + } + + if (highest_perf < nominal_perf) { + pr_debug("CPU%d: nominal performance above highest\n", cpu); + return 0; + } + + return div_u64(highest_perf << SCHED_CAPACITY_SHIFT, nominal_perf); +} +#else +static inline u64 get_max_boost_ratio(unsigned int cpu) { return 0; } +#endif + static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) { - unsigned int i; - unsigned int valid_states = 0; - unsigned int cpu = policy->cpu; - struct acpi_cpufreq_data *data; - unsigned int result = 0; - struct cpuinfo_x86 *c = &cpu_data(policy->cpu); - struct acpi_processor_performance *perf; struct cpufreq_frequency_table *freq_table; + struct acpi_processor_performance *perf; + struct acpi_cpufreq_data *data; + unsigned int cpu = policy->cpu; + struct cpuinfo_x86 *c = &cpu_data(cpu); + unsigned int valid_states = 0; + unsigned int result = 0; + unsigned int state_count; + u64 max_boost_ratio; + unsigned int i; #ifdef CONFIG_SMP static int blacklisted; #endif @@ -750,8 +795,20 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) goto err_unreg; } - freq_table = kcalloc(perf->state_count + 1, sizeof(*freq_table), - GFP_KERNEL); + state_count = perf->state_count + 1; + + max_boost_ratio = get_max_boost_ratio(cpu); + if (max_boost_ratio) { + /* + * Make a room for one more entry to represent the highest + * available "boost" frequency. + */ + state_count++; + valid_states++; + data->first_perf_state = valid_states; + } + + freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL); if (!freq_table) { result = -ENOMEM; goto err_unreg; @@ -785,6 +842,30 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) valid_states++; } freq_table[valid_states].frequency = CPUFREQ_TABLE_END; + + if (max_boost_ratio) { + unsigned int state = data->first_perf_state; + unsigned int freq = freq_table[state].frequency; + + /* + * Because the loop above sorts the freq_table entries in the + * descending order, freq is the maximum frequency in the table. + * Assume that it corresponds to the CPPC nominal frequency and + * use it to populate the frequency field of the extra "boost" + * frequency entry. + */ + freq_table[0].frequency = freq * max_boost_ratio >> SCHED_CAPACITY_SHIFT; + /* + * The purpose of the extra "boost" frequency entry is to make + * the rest of cpufreq aware of the real maximum frequency, but + * the way to request it is the same as for the first_perf_state + * entry that is expected to cover the entire range of "boost" + * frequencies of the CPU, so copy the driver_data value from + * that entry. + */ + freq_table[0].driver_data = freq_table[state].driver_data; + } + policy->freq_table = freq_table; perf->state = 0; @@ -858,8 +939,10 @@ static void acpi_cpufreq_cpu_ready(struct cpufreq_policy *policy) { struct acpi_processor_performance *perf = per_cpu_ptr(acpi_perf_data, policy->cpu); + struct acpi_cpufreq_data *data = policy->driver_data; + unsigned int freq = policy->freq_table[data->first_perf_state].frequency; - if (perf->states[0].core_frequency * 1000 != policy->cpuinfo.max_freq) + if (perf->states[0].core_frequency * 1000 != freq) pr_warn(FW_WARN "P-state 0 is not max freq\n"); } From d11a1d08a082a7dc0ada423d2b2e26e9b6f2525c Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 4 Feb 2021 18:34:32 +0100 Subject: [PATCH 254/284] cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there If the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale-invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver, the scale-invariant utilization falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, which causes the schedutil governor to select a frequency below cpuinfo.max_freq. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies which prevents "boost" frequencies from being used in some workloads. While this issue is related to scale-invariance, it may be amplified by commit db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle which made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. If CPPC is available, it can be used to address this issue by extending the frequency tables created by acpi_cpufreq to cover the entire available frequency range (including "boost" frequencies) for each CPU, but if CPPC is not there, acpi_cpufreq has no idea what the maximum "boost" frequency is and the frequency tables created by it cannot be extended in a meaningful way, so in that case make it ask the arch scale-invariance code to to use the "nominal" performance level for CPU utilization scaling in order to avoid the issue at hand. Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") Signed-off-by: Rafael J. Wysocki Reviewed-by: Giovanni Gherdovich Acked-by: Peter Zijlstra (Intel) --- arch/x86/kernel/smpboot.c | 1 + drivers/cpufreq/acpi-cpufreq.c | 8 ++++++++ 2 files changed, 9 insertions(+) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 117e24fbfd8a..02813a7f3a7c 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1833,6 +1833,7 @@ void arch_set_max_freq_ratio(bool turbo_disabled) arch_max_freq_ratio = turbo_disabled ? SCHED_CAPACITY_SCALE : arch_turbo_freq_ratio; } +EXPORT_SYMBOL_GPL(arch_set_max_freq_ratio); static bool turbo_disabled(void) { diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c index 4614f1c6f50a..d3e5a6fceb61 100644 --- a/drivers/cpufreq/acpi-cpufreq.c +++ b/drivers/cpufreq/acpi-cpufreq.c @@ -806,6 +806,14 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) state_count++; valid_states++; data->first_perf_state = valid_states; + } else { + /* + * If the maximum "boost" frequency is unknown, ask the arch + * scale-invariance code to use the "nominal" performance for + * CPU utilization scaling so as to prevent the schedutil + * governor from selecting inadequate CPU frequencies. + */ + arch_set_max_freq_ratio(true); } freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL); From fe0af09074bfeb46a35357e67635eefe33cdfc49 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sat, 6 Feb 2021 09:49:37 +0100 Subject: [PATCH 255/284] Revert "ACPICA: Interpreter: fix memory leak by using existing buffer" This reverts commit 32cf1a12cad43358e47dac8014379c2f33dfbed4. The 'exisitng buffer' in this case is the firmware provided table, and we should not modify that in place. This fixes a crash on arm64 with initrd table overrides, in which case the DSDT is not mapped with read/write permissions. Reported-by: Shawn Guo Signed-off-by: Ard Biesheuvel Tested-by: Shawn Guo Signed-off-by: Rafael J. Wysocki --- drivers/acpi/acpica/nsrepair2.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/acpi/acpica/nsrepair2.c b/drivers/acpi/acpica/nsrepair2.c index d2c8d8279e7a..24c197d91f29 100644 --- a/drivers/acpi/acpica/nsrepair2.c +++ b/drivers/acpi/acpica/nsrepair2.c @@ -495,8 +495,9 @@ acpi_ns_repair_HID(struct acpi_evaluate_info *info, union acpi_operand_object **return_object_ptr) { union acpi_operand_object *return_object = *return_object_ptr; - char *dest; + union acpi_operand_object *new_string; char *source; + char *dest; ACPI_FUNCTION_NAME(ns_repair_HID); @@ -517,6 +518,13 @@ acpi_ns_repair_HID(struct acpi_evaluate_info *info, return_ACPI_STATUS(AE_OK); } + /* It is simplest to always create a new string object */ + + new_string = acpi_ut_create_string_object(return_object->string.length); + if (!new_string) { + return_ACPI_STATUS(AE_NO_MEMORY); + } + /* * Remove a leading asterisk if present. For some unknown reason, there * are many machines in the field that contains IDs like this. @@ -526,7 +534,7 @@ acpi_ns_repair_HID(struct acpi_evaluate_info *info, source = return_object->string.pointer; if (*source == '*') { source++; - return_object->string.length--; + new_string->string.length--; ACPI_DEBUG_PRINT((ACPI_DB_REPAIR, "%s: Removed invalid leading asterisk\n", @@ -541,11 +549,12 @@ acpi_ns_repair_HID(struct acpi_evaluate_info *info, * "NNNN####" where N is an uppercase letter or decimal digit, and * # is a hex digit. */ - for (dest = return_object->string.pointer; *source; dest++, source++) { + for (dest = new_string->string.pointer; *source; dest++, source++) { *dest = (char)toupper((int)*source); } - return_object->string.pointer[return_object->string.length] = 0; + acpi_ut_remove_reference(return_object); + *return_object_ptr = new_string; return_ACPI_STATUS(AE_OK); } From af8085f3a4712c57d0dd415ad543bac85780375c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Fri, 5 Feb 2021 11:36:30 +1100 Subject: [PATCH 256/284] net: fix iteration for sctp transport seq_files The sctp transport seq_file iterators take a reference to the transport in the ->start and ->next functions and releases the reference in the ->show function. The preferred handling for such resources is to release them in the subsequent ->next or ->stop function call. Since Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") there is no guarantee that ->show will be called after ->next, so this function can now leak references. So move the sctp_transport_put() call to ->next and ->stop. Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Reported-by: Xin Long Signed-off-by: NeilBrown Acked-by: Marcelo Ricardo Leitner Signed-off-by: Jakub Kicinski --- net/sctp/proc.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/net/sctp/proc.c b/net/sctp/proc.c index f7da88ae20a5..982a87b3e11f 100644 --- a/net/sctp/proc.c +++ b/net/sctp/proc.c @@ -215,6 +215,12 @@ static void sctp_transport_seq_stop(struct seq_file *seq, void *v) { struct sctp_ht_iter *iter = seq->private; + if (v && v != SEQ_START_TOKEN) { + struct sctp_transport *transport = v; + + sctp_transport_put(transport); + } + sctp_transport_walk_stop(&iter->hti); } @@ -222,6 +228,12 @@ static void *sctp_transport_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct sctp_ht_iter *iter = seq->private; + if (v && v != SEQ_START_TOKEN) { + struct sctp_transport *transport = v; + + sctp_transport_put(transport); + } + ++*pos; return sctp_transport_get_next(seq_file_net(seq), &iter->hti); @@ -277,8 +289,6 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v) sk->sk_rcvbuf); seq_printf(seq, "\n"); - sctp_transport_put(transport); - return 0; } @@ -354,8 +364,6 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v) seq_printf(seq, "\n"); } - sctp_transport_put(transport); - return 0; } From ce7536bc7398e2ae552d2fabb7e0e371a9f1fe46 Mon Sep 17 00:00:00 2001 From: Stefano Garzarella Date: Mon, 8 Feb 2021 15:44:54 +0100 Subject: [PATCH 257/284] vsock/virtio: update credit only if socket is not closed If the socket is closed or is being released, some resources used by virtio_transport_space_update() such as 'vsk->trans' may be released. To avoid a use after free bug we should only update the available credit when we are sure the socket is still open and we have the lock held. Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Signed-off-by: Stefano Garzarella Acked-by: Michael S. Tsirkin Link: https://lore.kernel.org/r/20210208144454.84438-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski --- net/vmw_vsock/virtio_transport_common.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 5956939eebb7..e4370b1b7494 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -1130,8 +1130,6 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, vsk = vsock_sk(sk); - space_available = virtio_transport_space_update(sk, pkt); - lock_sock(sk); /* Check if sk has been closed before lock_sock */ @@ -1142,6 +1140,8 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, goto free_pkt; } + space_available = virtio_transport_space_update(sk, pkt); + /* Update CID in case it has changed after a transport reset event */ vsk->local_addr.svm_cid = dst.svm_cid; From 07998281c268592963e1cd623fe6ab0270b65ae4 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 5 Feb 2021 12:56:43 +0100 Subject: [PATCH 258/284] netfilter: conntrack: skip identical origin tuple in same zone only The origin skip check needs to re-test the zone. Else, we might skip a colliding tuple in the reply direction. This only occurs when using 'directional zones' where origin tuples reside in different zones but the reply tuples share the same zone. This causes the new conntrack entry to be dropped at confirmation time because NAT clash resolution was elided. Fixes: 4e35c1cb9460240 ("netfilter: nf_nat: skip nat clash resolution for same-origin entries") Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 234b7cab37c3..ff0168736f6e 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1229,7 +1229,8 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple, * Let nf_ct_resolve_clash() deal with this later. */ if (nf_ct_tuple_equal(&ignored_conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple, - &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)) + &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple) && + nf_ct_zone_equal(ct, zone, IP_CT_DIR_ORIGINAL)) continue; NF_CT_STAT_INC_ATOMIC(net, found); From 664899e85c1312e51d2761e7f8b2f25d053e8489 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Mon, 8 Feb 2021 13:20:47 +0100 Subject: [PATCH 259/284] netfilter: nftables: relax check for stateful expressions in set definition Restore the original behaviour where users are allowed to add an element with any stateful expression if the set definition specifies no stateful expressions. Make sure upper maximum number of stateful expressions of NFT_SET_EXPR_MAX is not reached. Fixes: 8cfd9b0f8515 ("netfilter: nftables: generalize set expressions support") Fixes: 48b0ae046ee9 ("netfilter: nftables: netlink support for several set element expressions") Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 43fe80f10313..8ee9f40cc0ea 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -5281,6 +5281,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, struct nft_expr *expr_array[NFT_SET_EXPR_MAX] = {}; struct nlattr *nla[NFTA_SET_ELEM_MAX + 1]; u8 genmask = nft_genmask_next(ctx->net); + u32 flags = 0, size = 0, num_exprs = 0; struct nft_set_ext_tmpl tmpl; struct nft_set_ext *ext, *ext2; struct nft_set_elem elem; @@ -5290,7 +5291,6 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, struct nft_data_desc desc; enum nft_registers dreg; struct nft_trans *trans; - u32 flags = 0, size = 0; u64 timeout; u64 expiration; int err, i; @@ -5356,7 +5356,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, if (nla[NFTA_SET_ELEM_EXPR]) { struct nft_expr *expr; - if (set->num_exprs != 1) + if (set->num_exprs && set->num_exprs != 1) return -EOPNOTSUPP; expr = nft_set_elem_expr_alloc(ctx, set, @@ -5365,8 +5365,9 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, return PTR_ERR(expr); expr_array[0] = expr; + num_exprs = 1; - if (set->exprs[0] && set->exprs[0]->ops != expr->ops) { + if (set->num_exprs && set->exprs[0]->ops != expr->ops) { err = -EOPNOTSUPP; goto err_set_elem_expr; } @@ -5375,12 +5376,10 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, struct nlattr *tmp; int left; - if (set->num_exprs == 0) - return -EOPNOTSUPP; - i = 0; nla_for_each_nested(tmp, nla[NFTA_SET_ELEM_EXPRESSIONS], left) { - if (i == set->num_exprs) { + if (i == NFT_SET_EXPR_MAX || + (set->num_exprs && set->num_exprs == i)) { err = -E2BIG; goto err_set_elem_expr; } @@ -5394,14 +5393,15 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, goto err_set_elem_expr; } expr_array[i] = expr; + num_exprs++; - if (expr->ops != set->exprs[i]->ops) { + if (set->num_exprs && expr->ops != set->exprs[i]->ops) { err = -EOPNOTSUPP; goto err_set_elem_expr; } i++; } - if (set->num_exprs != i) { + if (set->num_exprs && set->num_exprs != i) { err = -EOPNOTSUPP; goto err_set_elem_expr; } @@ -5409,6 +5409,8 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, err = nft_set_elem_expr_clone(ctx, set, expr_array); if (err < 0) goto err_set_elem_expr_clone; + + num_exprs = set->num_exprs; } err = nft_setelem_parse_key(ctx, set, &elem.key.val, @@ -5433,8 +5435,8 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, nft_set_ext_add(&tmpl, NFT_SET_EXT_TIMEOUT); } - if (set->num_exprs) { - for (i = 0; i < set->num_exprs; i++) + if (num_exprs) { + for (i = 0; i < num_exprs; i++) size += expr_array[i]->ops->size; nft_set_ext_add_length(&tmpl, NFT_SET_EXT_EXPRESSIONS, @@ -5522,7 +5524,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, *nft_set_ext_obj(ext) = obj; obj->use++; } - for (i = 0; i < set->num_exprs; i++) + for (i = 0; i < num_exprs; i++) nft_set_elem_expr_setup(ext, i, expr_array); trans = nft_trans_elem_alloc(ctx, NFT_MSG_NEWSETELEM, set); @@ -5584,7 +5586,7 @@ err_parse_key_end: err_parse_key: nft_data_release(&elem.key.val, NFT_DATA_VALUE); err_set_elem_expr: - for (i = 0; i < set->num_exprs && expr_array[i]; i++) + for (i = 0; i < num_exprs && expr_array[i]; i++) nft_expr_destroy(ctx, expr_array[i]); err_set_elem_expr_clone: return err; From 3aa6bce9af0e25b735c9c1263739a5639a336ae8 Mon Sep 17 00:00:00 2001 From: Edwin Peer Date: Fri, 5 Feb 2021 17:37:32 -0800 Subject: [PATCH 260/284] net: watchdog: hold device global xmit lock during tx disable Prevent netif_tx_disable() running concurrently with dev_watchdog() by taking the device global xmit lock. Otherwise, the recommended: netif_carrier_off(dev); netif_tx_disable(dev); driver shutdown sequence can happen after the watchdog has already checked carrier, resulting in possible false alarms. This is because netif_tx_lock() only sets the frozen bit without maintaining the locks on the individual queues. Fixes: c3f26a269c24 ("netdev: Fix lockdep warnings in multiqueue configurations.") Signed-off-by: Edwin Peer Reviewed-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netdevice.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 259be67644e3..5ff27c12ce68 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4352,6 +4352,7 @@ static inline void netif_tx_disable(struct net_device *dev) local_bh_disable(); cpu = smp_processor_id(); + spin_lock(&dev->tx_global_lock); for (i = 0; i < dev->num_tx_queues; i++) { struct netdev_queue *txq = netdev_get_tx_queue(dev, i); @@ -4359,6 +4360,7 @@ static inline void netif_tx_disable(struct net_device *dev) netif_tx_stop_queue(txq); __netif_tx_unlock(txq); } + spin_unlock(&dev->tx_global_lock); local_bh_enable(); } From b2bdba1cbc84cadb14393d0101a5bfd38d342e0a Mon Sep 17 00:00:00 2001 From: Horatiu Vultur Date: Sat, 6 Feb 2021 22:47:33 +0100 Subject: [PATCH 261/284] bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state The function br_mrp_port_switchdev_set_state was called both with MRP port state and STP port state, which is an issue because they don't match exactly. Therefore, update the function to be used only with STP port state and use the id SWITCHDEV_ATTR_ID_PORT_STP_STATE. The choice of using STP over MRP is that the drivers already implement SWITCHDEV_ATTR_ID_PORT_STP_STATE and already in SW we update the port STP state. Fixes: 9a9f26e8f7ea30 ("bridge: mrp: Connect MRP API with the switchdev API") Fixes: fadd409136f0f2 ("bridge: switchdev: mrp: Implement MRP API for switchdev") Fixes: 2f1a11ae11d222 ("bridge: mrp: Add MRP interface.") Reported-by: Rasmus Villemoes Signed-off-by: Horatiu Vultur Signed-off-by: David S. Miller --- net/bridge/br_mrp.c | 9 ++++++--- net/bridge/br_mrp_switchdev.c | 7 +++---- net/bridge/br_private_mrp.h | 3 +-- 3 files changed, 10 insertions(+), 9 deletions(-) diff --git a/net/bridge/br_mrp.c b/net/bridge/br_mrp.c index cec2c4e4561d..5aeae6ad17b3 100644 --- a/net/bridge/br_mrp.c +++ b/net/bridge/br_mrp.c @@ -557,19 +557,22 @@ int br_mrp_del(struct net_bridge *br, struct br_mrp_instance *instance) int br_mrp_set_port_state(struct net_bridge_port *p, enum br_mrp_port_state_type state) { + u32 port_state; + if (!p || !(p->flags & BR_MRP_AWARE)) return -EINVAL; spin_lock_bh(&p->br->lock); if (state == BR_MRP_PORT_STATE_FORWARDING) - p->state = BR_STATE_FORWARDING; + port_state = BR_STATE_FORWARDING; else - p->state = BR_STATE_BLOCKING; + port_state = BR_STATE_BLOCKING; + p->state = port_state; spin_unlock_bh(&p->br->lock); - br_mrp_port_switchdev_set_state(p, state); + br_mrp_port_switchdev_set_state(p, port_state); return 0; } diff --git a/net/bridge/br_mrp_switchdev.c b/net/bridge/br_mrp_switchdev.c index ed547e03ace1..75a7e8d0a268 100644 --- a/net/bridge/br_mrp_switchdev.c +++ b/net/bridge/br_mrp_switchdev.c @@ -169,13 +169,12 @@ int br_mrp_switchdev_send_in_test(struct net_bridge *br, struct br_mrp *mrp, return err; } -int br_mrp_port_switchdev_set_state(struct net_bridge_port *p, - enum br_mrp_port_state_type state) +int br_mrp_port_switchdev_set_state(struct net_bridge_port *p, u32 state) { struct switchdev_attr attr = { .orig_dev = p->dev, - .id = SWITCHDEV_ATTR_ID_MRP_PORT_STATE, - .u.mrp_port_state = state, + .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, + .u.stp_state = state, }; int err; diff --git a/net/bridge/br_private_mrp.h b/net/bridge/br_private_mrp.h index 32a48e5418da..2514954c1431 100644 --- a/net/bridge/br_private_mrp.h +++ b/net/bridge/br_private_mrp.h @@ -72,8 +72,7 @@ int br_mrp_switchdev_set_ring_state(struct net_bridge *br, struct br_mrp *mrp, int br_mrp_switchdev_send_ring_test(struct net_bridge *br, struct br_mrp *mrp, u32 interval, u8 max_miss, u32 period, bool monitor); -int br_mrp_port_switchdev_set_state(struct net_bridge_port *p, - enum br_mrp_port_state_type state); +int br_mrp_port_switchdev_set_state(struct net_bridge_port *p, u32 state); int br_mrp_port_switchdev_set_role(struct net_bridge_port *p, enum br_mrp_port_role_type role); int br_mrp_switchdev_set_in_role(struct net_bridge *br, struct br_mrp *mrp, From 059d2a1004981dce19f0127dabc1b4ec927d202a Mon Sep 17 00:00:00 2001 From: Horatiu Vultur Date: Sat, 6 Feb 2021 22:47:34 +0100 Subject: [PATCH 262/284] switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT Now that MRP started to use also SWITCHDEV_ATTR_ID_PORT_STP_STATE to notify HW, then SWITCHDEV_ATTR_ID_MRP_PORT_STAT is not used anywhere else, therefore we can remove it. Fixes: c284b545900830 ("switchdev: mrp: Extend switchdev API to offload MRP") Signed-off-by: Horatiu Vultur Signed-off-by: David S. Miller --- include/net/switchdev.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/net/switchdev.h b/include/net/switchdev.h index 99cd538d6519..afdf8bd1b4fe 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -42,7 +42,6 @@ enum switchdev_attr_id { SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED, SWITCHDEV_ATTR_ID_BRIDGE_MROUTER, #if IS_ENABLED(CONFIG_BRIDGE_MRP) - SWITCHDEV_ATTR_ID_MRP_PORT_STATE, SWITCHDEV_ATTR_ID_MRP_PORT_ROLE, #endif }; @@ -62,7 +61,6 @@ struct switchdev_attr { u16 vlan_protocol; /* BRIDGE_VLAN_PROTOCOL */ bool mc_disabled; /* MC_DISABLED */ #if IS_ENABLED(CONFIG_BRIDGE_MRP) - u8 mrp_port_state; /* MRP_PORT_STATE */ u8 mrp_port_role; /* MRP_PORT_ROLE */ #endif } u; From eb4733d7cffc547e08fe5a216e4f03663bb71108 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 8 Feb 2021 19:36:27 +0200 Subject: [PATCH 263/284] net: dsa: felix: implement port flushing on .phylink_mac_link_down There are several issues which may be seen when the link goes down while forwarding traffic, all of which can be attributed to the fact that the port flushing procedure from the reference manual was not closely followed. With flow control enabled on both the ingress port and the egress port, it may happen when a link goes down that Ethernet packets are in flight. In flow control mode, frames are held back and not dropped. When there is enough traffic in flight (example: iperf3 TCP), then the ingress port might enter congestion and never exit that state. This is a problem, because it is the egress port's link that went down, and that has caused the inability of the ingress port to send packets to any other port. This is solved by flushing the egress port's queues when it goes down. There is also a problem when performing stream splitting for IEEE 802.1CB traffic (not yet upstream, but a sort of multicast, basically). There, if one port from the destination ports mask goes down, splitting the stream towards the other destinations will no longer be performed. This can be traced down to this line: ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); which should have been instead, as per the reference manual: ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, DEV_MAC_ENA_CFG); Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is the case, but apparently multicasting to several ports will cause issues if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set. I am not sure what the state of the Ocelot VSC7514 driver is, but probably not as bad as Felix/Seville, since VSC7514 uses phylib and has the following in ocelot_adjust_link: if (!phydev->link) return; therefore the port is not really put down when the link is lost, unlike the DSA drivers which use .phylink_mac_link_down for that. Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h which are not exported in include/soc/mscc/ and a bugfix patch should probably not move headers around. Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK") Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- drivers/net/dsa/ocelot/felix.c | 17 ++++++++- drivers/net/ethernet/mscc/ocelot.c | 54 +++++++++++++++++++++++++++ drivers/net/ethernet/mscc/ocelot_io.c | 8 ++++ include/soc/mscc/ocelot.h | 2 + 4 files changed, 80 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c index 7dc230677b78..45fdb1256dbf 100644 --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -233,9 +233,24 @@ static void felix_phylink_mac_link_down(struct dsa_switch *ds, int port, { struct ocelot *ocelot = ds->priv; struct ocelot_port *ocelot_port = ocelot->ports[port]; + int err; + + ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, + DEV_MAC_ENA_CFG); - ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); ocelot_fields_write(ocelot, port, QSYS_SWITCH_PORT_MODE_PORT_ENA, 0); + + err = ocelot_port_flush(ocelot, port); + if (err) + dev_err(ocelot->dev, "failed to flush port %d: %d\n", + port, err); + + /* Put the port in reset. */ + ocelot_port_writel(ocelot_port, + DEV_CLOCK_CFG_MAC_TX_RST | + DEV_CLOCK_CFG_MAC_RX_RST | + DEV_CLOCK_CFG_LINK_SPEED(OCELOT_SPEED_1000), + DEV_CLOCK_CFG); } static void felix_phylink_mac_link_up(struct dsa_switch *ds, int port, diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index ff87a0bc089c..c072eb5c0764 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -375,6 +375,60 @@ static void ocelot_vlan_init(struct ocelot *ocelot) } } +static u32 ocelot_read_eq_avail(struct ocelot *ocelot, int port) +{ + return ocelot_read_rix(ocelot, QSYS_SW_STATUS, port); +} + +int ocelot_port_flush(struct ocelot *ocelot, int port) +{ + int err, val; + + /* Disable dequeuing from the egress queues */ + ocelot_rmw_rix(ocelot, QSYS_PORT_MODE_DEQUEUE_DIS, + QSYS_PORT_MODE_DEQUEUE_DIS, + QSYS_PORT_MODE, port); + + /* Disable flow control */ + ocelot_fields_write(ocelot, port, SYS_PAUSE_CFG_PAUSE_ENA, 0); + + /* Disable priority flow control */ + ocelot_fields_write(ocelot, port, + QSYS_SWITCH_PORT_MODE_TX_PFC_ENA, 0); + + /* Wait at least the time it takes to receive a frame of maximum length + * at the port. + * Worst-case delays for 10 kilobyte jumbo frames are: + * 8 ms on a 10M port + * 800 μs on a 100M port + * 80 μs on a 1G port + * 32 μs on a 2.5G port + */ + usleep_range(8000, 10000); + + /* Disable half duplex backpressure. */ + ocelot_rmw_rix(ocelot, 0, SYS_FRONT_PORT_MODE_HDX_MODE, + SYS_FRONT_PORT_MODE, port); + + /* Flush the queues associated with the port. */ + ocelot_rmw_gix(ocelot, REW_PORT_CFG_FLUSH_ENA, REW_PORT_CFG_FLUSH_ENA, + REW_PORT_CFG, port); + + /* Enable dequeuing from the egress queues. */ + ocelot_rmw_rix(ocelot, 0, QSYS_PORT_MODE_DEQUEUE_DIS, QSYS_PORT_MODE, + port); + + /* Wait until flushing is complete. */ + err = read_poll_timeout(ocelot_read_eq_avail, val, !val, + 100, 2000000, false, ocelot, port); + + /* Clear flushing again. */ + ocelot_rmw_gix(ocelot, 0, REW_PORT_CFG_FLUSH_ENA, REW_PORT_CFG, port); + + return err; +} +EXPORT_SYMBOL(ocelot_port_flush); + void ocelot_adjust_link(struct ocelot *ocelot, int port, struct phy_device *phydev) { diff --git a/drivers/net/ethernet/mscc/ocelot_io.c b/drivers/net/ethernet/mscc/ocelot_io.c index 0acb45948418..ea4e83410fe4 100644 --- a/drivers/net/ethernet/mscc/ocelot_io.c +++ b/drivers/net/ethernet/mscc/ocelot_io.c @@ -71,6 +71,14 @@ void ocelot_port_writel(struct ocelot_port *port, u32 val, u32 reg) } EXPORT_SYMBOL(ocelot_port_writel); +void ocelot_port_rmwl(struct ocelot_port *port, u32 val, u32 mask, u32 reg) +{ + u32 cur = ocelot_port_readl(port, reg); + + ocelot_port_writel(port, (cur & (~mask)) | val, reg); +} +EXPORT_SYMBOL(ocelot_port_rmwl); + u32 __ocelot_target_read_ix(struct ocelot *ocelot, enum ocelot_target target, u32 reg, u32 offset) { diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 2f4cd3288bcc..c34b9ccb6472 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -709,6 +709,7 @@ struct ocelot_policer { /* I/O */ u32 ocelot_port_readl(struct ocelot_port *port, u32 reg); void ocelot_port_writel(struct ocelot_port *port, u32 val, u32 reg); +void ocelot_port_rmwl(struct ocelot_port *port, u32 val, u32 mask, u32 reg); u32 __ocelot_read_ix(struct ocelot *ocelot, u32 reg, u32 offset); void __ocelot_write_ix(struct ocelot *ocelot, u32 val, u32 reg, u32 offset); void __ocelot_rmw_ix(struct ocelot *ocelot, u32 val, u32 mask, u32 reg, @@ -737,6 +738,7 @@ int ocelot_get_sset_count(struct ocelot *ocelot, int port, int sset); int ocelot_get_ts_info(struct ocelot *ocelot, int port, struct ethtool_ts_info *info); void ocelot_set_ageing_time(struct ocelot *ocelot, unsigned int msecs); +int ocelot_port_flush(struct ocelot *ocelot, int port); void ocelot_adjust_link(struct ocelot *ocelot, int port, struct phy_device *phydev); int ocelot_port_vlan_filtering(struct ocelot *ocelot, int port, bool enabled, From 67a69f84cab60484f02eb8cbc7a76edffbb28a25 Mon Sep 17 00:00:00 2001 From: Yufeng Mo Date: Tue, 9 Feb 2021 17:03:05 +0800 Subject: [PATCH 264/284] net: hns3: add a check for queue_id in hclge_reset_vf_queue() The queue_id is received from vf, if use it directly, an out-of-bound issue may be caused, so add a check for this queue_id before using it in hclge_reset_vf_queue(). Fixes: 1a426f8b40fc ("net: hns3: fix the VF queue reset flow error") Signed-off-by: Yufeng Mo Signed-off-by: Huazhong Tan Signed-off-by: David S. Miller --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index c242883fea5d..48549db23c52 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -9813,12 +9813,19 @@ int hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id) void hclge_reset_vf_queue(struct hclge_vport *vport, u16 queue_id) { + struct hnae3_handle *handle = &vport->nic; struct hclge_dev *hdev = vport->back; int reset_try_times = 0; int reset_status; u16 queue_gid; int ret; + if (queue_id >= handle->kinfo.num_tqps) { + dev_warn(&hdev->pdev->dev, "Invalid vf queue id(%u)\n", + queue_id); + return; + } + queue_gid = hclge_covert_handle_qid_global(&vport->nic, queue_id); ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, true); From 326334aad024a60f46dc5e7dbe1efe32da3ca66f Mon Sep 17 00:00:00 2001 From: Yufeng Mo Date: Tue, 9 Feb 2021 17:03:06 +0800 Subject: [PATCH 265/284] net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx() The tqp_index is received from vf, if use it directly, an out-of-bound issue may be caused, so add a check for this tqp_index before using it in hclge_get_ring_chain_from_mbx(). Fixes: 84e095d64ed9 ("net: hns3: Change PF to add ring-vect binding & resetQ to mailbox") Signed-off-by: Yufeng Mo Signed-off-by: Huazhong Tan Signed-off-by: David S. Miller --- .../ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c index 754c09ada901..ea2dea990283 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c @@ -158,21 +158,31 @@ static int hclge_get_ring_chain_from_mbx( struct hclge_vport *vport) { struct hnae3_ring_chain_node *cur_chain, *new_chain; + struct hclge_dev *hdev = vport->back; int ring_num; - int i = 0; + int i; ring_num = req->msg.ring_num; if (ring_num > HCLGE_MBX_MAX_RING_CHAIN_PARAM_NUM) return -ENOMEM; + for (i = 0; i < ring_num; i++) { + if (req->msg.param[i].tqp_index >= vport->nic.kinfo.rss_size) { + dev_err(&hdev->pdev->dev, "tqp index(%u) is out of range(0-%u)\n", + req->msg.param[i].tqp_index, + vport->nic.kinfo.rss_size - 1); + return -EINVAL; + } + } + hnae3_set_bit(ring_chain->flag, HNAE3_RING_TYPE_B, - req->msg.param[i].ring_type); + req->msg.param[0].ring_type); ring_chain->tqp_index = hclge_get_queue_id(vport->nic.kinfo.tqp - [req->msg.param[i].tqp_index]); + [req->msg.param[0].tqp_index]); hnae3_set_field(ring_chain->int_gl_idx, HNAE3_RING_GL_IDX_M, - HNAE3_RING_GL_IDX_S, req->msg.param[i].int_gl_index); + HNAE3_RING_GL_IDX_S, req->msg.param[0].int_gl_index); cur_chain = ring_chain; From 532cfc0df1e4d68e74522ef4a0dcbf6ebbe68287 Mon Sep 17 00:00:00 2001 From: Yufeng Mo Date: Tue, 9 Feb 2021 17:03:07 +0800 Subject: [PATCH 266/284] net: hns3: add a check for index in hclge_get_rss_key() The index is received from vf, if use it directly, an out-of-bound issue may be caused, so add a check for this index before using it in hclge_get_rss_key(). Fixes: a638b1d8cc87 ("net: hns3: fix get VF RSS issue") Signed-off-by: Yufeng Mo Signed-off-by: Huazhong Tan Signed-off-by: David S. Miller --- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c index ea2dea990283..ffb416e088a9 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c @@ -607,6 +607,17 @@ static void hclge_get_rss_key(struct hclge_vport *vport, index = mbx_req->msg.data[0]; + /* Check the query index of rss_hash_key from VF, make sure no + * more than the size of rss_hash_key. + */ + if (((index + 1) * HCLGE_RSS_MBX_RESP_LEN) > + sizeof(vport[0].rss_hash_key)) { + dev_warn(&hdev->pdev->dev, + "failed to get the rss hash key, the index(%u) invalid !\n", + index); + return; + } + memcpy(resp_msg->data, &hdev->vport[0].rss_hash_key[index * HCLGE_RSS_MBX_RESP_LEN], HCLGE_RSS_MBX_RESP_LEN); From 1c5fae9c9a092574398a17facc31c533791ef232 Mon Sep 17 00:00:00 2001 From: Stefano Garzarella Date: Tue, 9 Feb 2021 09:52:19 +0100 Subject: [PATCH 267/284] vsock: fix locking in vsock_shutdown() In vsock_shutdown() we touched some socket fields without holding the socket lock, such as 'state' and 'sk_flags'. Also, after the introduction of multi-transport, we are accessing 'vsk->transport' in vsock_send_shutdown() without holding the lock and this call can be made while the connection is in progress, so the transport can change in the meantime. To avoid issues, we hold the socket lock when we enter in vsock_shutdown() and release it when we leave. Among the transports that implement the 'shutdown' callback, only hyperv_transport acquired the lock. Since the caller now holds it, we no longer take it. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella Signed-off-by: David S. Miller --- net/vmw_vsock/af_vsock.c | 8 +++++--- net/vmw_vsock/hyperv_transport.c | 4 ---- 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 4ea301fc2bf0..5546710d8ac1 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -943,10 +943,12 @@ static int vsock_shutdown(struct socket *sock, int mode) */ sk = sock->sk; + + lock_sock(sk); if (sock->state == SS_UNCONNECTED) { err = -ENOTCONN; if (sk->sk_type == SOCK_STREAM) - return err; + goto out; } else { sock->state = SS_DISCONNECTING; err = 0; @@ -955,10 +957,8 @@ static int vsock_shutdown(struct socket *sock, int mode) /* Receive and send shutdowns are treated alike. */ mode = mode & (RCV_SHUTDOWN | SEND_SHUTDOWN); if (mode) { - lock_sock(sk); sk->sk_shutdown |= mode; sk->sk_state_change(sk); - release_sock(sk); if (sk->sk_type == SOCK_STREAM) { sock_reset_flag(sk, SOCK_DONE); @@ -966,6 +966,8 @@ static int vsock_shutdown(struct socket *sock, int mode) } } +out: + release_sock(sk); return err; } diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 630b851f8150..cc3bae2659e7 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -474,14 +474,10 @@ static void hvs_shutdown_lock_held(struct hvsock *hvs, int mode) static int hvs_shutdown(struct vsock_sock *vsk, int mode) { - struct sock *sk = sk_vsock(vsk); - if (!(mode & SEND_SHUTDOWN)) return 0; - lock_sock(sk); hvs_shutdown_lock_held(vsk->trans, mode); - release_sock(sk); return 0; } From ee114dd64c0071500345439fc79dd5e0f9d106ed Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 5 Feb 2021 17:20:14 +0100 Subject: [PATCH 268/284] bpf: Fix verifier jsgt branch analysis on max bound Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return code for both will tell the caller whether a given conditional jump is taken or not, e.g. 1 means branch will be taken [for the involved registers] and the goto target will be executed, 0 means branch will not be taken and instead we fall-through to the next insn, and last but not least a -1 denotes that it is not known at verification time whether a branch will be taken or not. Now while the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval since the branch will also be taken for reg->s32_max_value == sval. The jgt branch analysis, for example, gets this right. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Fixes: 4f7b3e82589e ("bpf: improve verifier branch analysis") Signed-off-by: Daniel Borkmann Reviewed-by: John Fastabend Acked-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index e7368c5eacb7..67ea0ac3333c 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6877,7 +6877,7 @@ static int is_branch32_taken(struct bpf_reg_state *reg, u32 val, u8 opcode) case BPF_JSGT: if (reg->s32_min_value > sval) return 1; - else if (reg->s32_max_value < sval) + else if (reg->s32_max_value <= sval) return 0; break; case BPF_JLT: @@ -6950,7 +6950,7 @@ static int is_branch64_taken(struct bpf_reg_state *reg, u64 val, u8 opcode) case BPF_JSGT: if (reg->smin_value > sval) return 1; - else if (reg->smax_value < sval) + else if (reg->smax_value <= sval) return 0; break; case BPF_JLT: From fd675184fc7abfd1e1c52d23e8e900676b5a1c1a Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 5 Feb 2021 20:48:21 +0100 Subject: [PATCH 269/284] bpf: Fix verifier jmp32 pruning decision logic Anatoly has been fuzzing with kBdysch harness and reported a hang in one of the outcomes: func#0 @0 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = 808464450 1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b4) w4 = 808464432 2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0 2: (9c) w4 %= w0 3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 3: (66) if w4 s> 0x30303030 goto pc+0 R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 4: (7f) r0 >>= r0 5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 5: (9c) w4 %= w0 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 9: (95) exit propagating r0 from 6 to 7: safe 4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0 4: (7f) r0 >>= r0 5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0 5: (9c) w4 %= w0 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 propagating r0 7: safe propagating r0 from 6 to 7: safe processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 The underlying program was xlated as follows: # bpftool p d x i 10 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 5: (66) if w4 s> 0x30303030 goto pc+0 6: (7f) r0 >>= r0 7: (bc) w0 = w0 8: (15) if r0 == 0x0 goto pc+1 9: (9c) w4 %= w0 10: (66) if w0 s> 0x3030 goto pc+0 11: (d6) if w0 s<= 0x303030 goto pc+1 12: (05) goto pc-1 13: (95) exit The verifier rewrote original instructions it recognized as dead code with 'goto pc-1', but reality differs from verifier simulation in that we are actually able to trigger a hang due to hitting the 'goto pc-1' instructions. Taking a closer look at the verifier analysis, the reason is that it misjudges its pruning decision at the first 'from 6 to 7: safe' occasion. What happens is that while both old/cur registers are marked as precise, they get misjudged for the jmp32 case as range_within() yields true, meaning that the prior verification path with a wider register bound could be verified successfully and therefore the current path with a narrower register bound is deemed safe as well whereas in reality it's not. R0 old/cur path's bounds compare as follows: old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff) cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff) old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff The 64 bit bounds generally look okay and while the information that got propagated from 32 to 64 bit looks correct as well, it's not precise enough for judging a conditional jmp32. Given the latter only operates on subregisters we also need to take these into account as well for a range_within() probe in order to be able to prune paths. Extending the range_within() constraint to both bounds will be able to tell us that the old signed 32 bit bounds are not wider than the cur signed 32 bit bounds. With the fix in place, the program will now verify the 'goto' branch case as it should have been: [...] 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 9: (95) exit 7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 8: (30) r0 = *(u8 *)skb[808464432] BPF_LD_[ABS|IND] uses reserved fields processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1 The bug is quite subtle in the sense that when verifier would determine that a given branch is dead code, it would (here: wrongly) remove these instructions from the program and hard-wire the taken branch for privileged programs instead of the 'goto pc-1' rewrites which will cause hard to debug problems. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Anatoly Trosinenko Signed-off-by: Daniel Borkmann Reviewed-by: John Fastabend Acked-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 67ea0ac3333c..24c0e9224f3c 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -8590,7 +8590,11 @@ static bool range_within(struct bpf_reg_state *old, return old->umin_value <= cur->umin_value && old->umax_value >= cur->umax_value && old->smin_value <= cur->smin_value && - old->smax_value >= cur->smax_value; + old->smax_value >= cur->smax_value && + old->u32_min_value <= cur->u32_min_value && + old->u32_max_value >= cur->u32_max_value && + old->s32_min_value <= cur->s32_min_value && + old->s32_max_value >= cur->s32_max_value; } /* Maximum number of register states that can exist at once */ From e88b2c6e5a4d9ce30d75391e4d950da74bb2bd90 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Tue, 9 Feb 2021 18:46:10 +0000 Subject: [PATCH 270/284] bpf: Fix 32 bit src register truncation on div/mod While reviewing a different fix, John and I noticed an oddity in one of the BPF program dumps that stood out, for example: # bpftool p d x i 13 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 [...] In line 2 we noticed that the mov32 would 32 bit truncate the original src register for the div/mod operation. While for the two operations the dst register is typically marked unknown e.g. from adjust_scalar_min_max_vals() the src register is not, and thus verifier keeps tracking original bounds, simplified: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = -1 1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=invP-1 R1_w=invP-1 R10=fp0 2: (3c) w0 /= w1 3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0 3: (77) r1 >>= 32 4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0 4: (bf) r0 = r1 5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0 5: (95) exit processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Runtime result of r0 at exit is 0 instead of expected -1. Remove the verifier mov32 src rewrite in div/mod and replace it with a jmp32 test instead. After the fix, we result in the following code generation when having dividend r1 and divisor r6: div, 64 bit: div, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2 3: (ac) w1 ^= w1 3: (ac) w1 ^= w1 4: (05) goto pc+1 4: (05) goto pc+1 5: (3f) r1 /= r6 5: (3c) w1 /= w6 6: (b7) r0 = 0 6: (b7) r0 = 0 7: (95) exit 7: (95) exit mod, 64 bit: mod, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1 3: (9f) r1 %= r6 3: (9c) w1 %= w6 4: (b7) r0 = 0 4: (b7) r0 = 0 5: (95) exit 5: (95) exit x86 in particular can throw a 'divide error' exception for div instruction not only for divisor being zero, but also for the case when the quotient is too large for the designated register. For the edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT since we always zero edx (rdx). Hence really the only protection needed is against divisor being zero. Fixes: 68fda450a7df ("bpf: fix 32-bit divide by zero") Co-developed-by: John Fastabend Signed-off-by: John Fastabend Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 24c0e9224f3c..37581919e050 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -11003,30 +11003,28 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env) insn->code == (BPF_ALU | BPF_MOD | BPF_X) || insn->code == (BPF_ALU | BPF_DIV | BPF_X)) { bool is64 = BPF_CLASS(insn->code) == BPF_ALU64; - struct bpf_insn mask_and_div[] = { - BPF_MOV32_REG(insn->src_reg, insn->src_reg), + bool isdiv = BPF_OP(insn->code) == BPF_DIV; + struct bpf_insn *patchlet; + struct bpf_insn chk_and_div[] = { /* Rx div 0 -> 0 */ - BPF_JMP_IMM(BPF_JNE, insn->src_reg, 0, 2), + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) | + BPF_JNE | BPF_K, insn->src_reg, + 0, 2, 0), BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg), BPF_JMP_IMM(BPF_JA, 0, 0, 1), *insn, }; - struct bpf_insn mask_and_mod[] = { - BPF_MOV32_REG(insn->src_reg, insn->src_reg), + struct bpf_insn chk_and_mod[] = { /* Rx mod 0 -> Rx */ - BPF_JMP_IMM(BPF_JEQ, insn->src_reg, 0, 1), + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) | + BPF_JEQ | BPF_K, insn->src_reg, + 0, 1, 0), *insn, }; - struct bpf_insn *patchlet; - if (insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) || - insn->code == (BPF_ALU | BPF_DIV | BPF_X)) { - patchlet = mask_and_div + (is64 ? 1 : 0); - cnt = ARRAY_SIZE(mask_and_div) - (is64 ? 1 : 0); - } else { - patchlet = mask_and_mod + (is64 ? 1 : 0); - cnt = ARRAY_SIZE(mask_and_mod) - (is64 ? 1 : 0); - } + patchlet = isdiv ? chk_and_div : chk_and_mod; + cnt = isdiv ? ARRAY_SIZE(chk_and_div) : + ARRAY_SIZE(chk_and_mod); new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt); if (!new_prog) From e812cbbbbbb15adbbbee176baa1e8bda53059bf0 Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Tue, 9 Feb 2021 13:41:50 -0800 Subject: [PATCH 271/284] squashfs: avoid out of bounds writes in decompressors Patch series "Squashfs: fix BIO migration regression and add sanity checks". Patch [1/4] fixes a regression introduced by the "migrate from ll_rw_block usage to BIO" patch, which has produced a number of Sysbot/Syzkaller reports. Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption issues which have produced Sysbot reports in the id, inode and xattr lookup code. Each patch has been tested against the Sysbot reproducers using the given kernel configuration. They have the appropriate "Reported-by:" lines added. Additionally, all of the reproducer filesystems are indirectly fixed by patch [4/4] due to the fact they all have xattr corruption which is now detected there. Additional testing with other configurations and architectures (32bit, big endian), and normal filesystems has also been done to trap any inadvertent regressions caused by the additional sanity checks. This patch (of 4): This is a regression introduced by the patch "migrate from ll_rw_block usage to BIO". Sysbot/Syskaller has reported a number of "out of bounds writes" and "unable to handle kernel paging request in squashfs_decompress" errors which have been identified as a regression introduced by the above patch. Specifically, the patch removed the following sanity check if (length < 0 || length > output->length || (index + length) > msblk->bytes_used) This check did two things: 1. It ensured any reads were not beyond the end of the filesystem 2. It ensured that the "length" field read from the filesystem was within the expected maximum length. Without this any corrupted values can over-run allocated buffers. Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk Fixes: 93e72b3c612adc ("squashfs: migrate from ll_rw_block usage to BIO") Reported-by: syzbot+6fba78f99b9afd4b5634@syzkaller.appspotmail.com Signed-off-by: Phillip Lougher Cc: Philippe Liard Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/squashfs/block.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c index 8a19773b5a0b..45f44425d856 100644 --- a/fs/squashfs/block.c +++ b/fs/squashfs/block.c @@ -196,9 +196,15 @@ int squashfs_read_data(struct super_block *sb, u64 index, int length, length = SQUASHFS_COMPRESSED_SIZE(length); index += 2; - TRACE("Block @ 0x%llx, %scompressed size %d\n", index, + TRACE("Block @ 0x%llx, %scompressed size %d\n", index - 2, compressed ? "" : "un", length); } + if (length < 0 || length > output->length || + (index + length) > msblk->bytes_used) { + res = -EIO; + goto out; + } + if (next_index) *next_index = index + length; From f37aa4c7366e23f91b81d00bafd6a7ab54e4a381 Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Tue, 9 Feb 2021 13:41:53 -0800 Subject: [PATCH 272/284] squashfs: add more sanity checks in id lookup Sysbot has reported a number of "slab-out-of-bounds reads" and "use-after-free read" errors which has been identified as being caused by a corrupted index value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the ids count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large ids count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher Reported-by: syzbot+b06d57ba83f604522af2@syzkaller.appspotmail.com Reported-by: syzbot+c021ba012da41ee9807c@syzkaller.appspotmail.com Reported-by: syzbot+5024636e8b5fd19f0f19@syzkaller.appspotmail.com Reported-by: syzbot+bcbc661df46657d0fa4f@syzkaller.appspotmail.com Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/squashfs/id.c | 40 ++++++++++++++++++++++++++++-------- fs/squashfs/squashfs_fs_sb.h | 1 + fs/squashfs/super.c | 6 +++--- fs/squashfs/xattr.h | 10 ++++++++- 4 files changed, 45 insertions(+), 12 deletions(-) diff --git a/fs/squashfs/id.c b/fs/squashfs/id.c index 6be5afe7287d..11581bf31af4 100644 --- a/fs/squashfs/id.c +++ b/fs/squashfs/id.c @@ -35,10 +35,15 @@ int squashfs_get_id(struct super_block *sb, unsigned int index, struct squashfs_sb_info *msblk = sb->s_fs_info; int block = SQUASHFS_ID_BLOCK(index); int offset = SQUASHFS_ID_BLOCK_OFFSET(index); - u64 start_block = le64_to_cpu(msblk->id_table[block]); + u64 start_block; __le32 disk_id; int err; + if (index >= msblk->ids) + return -EINVAL; + + start_block = le64_to_cpu(msblk->id_table[block]); + err = squashfs_read_metadata(sb, &disk_id, &start_block, &offset, sizeof(disk_id)); if (err < 0) @@ -56,7 +61,10 @@ __le64 *squashfs_read_id_index_table(struct super_block *sb, u64 id_table_start, u64 next_table, unsigned short no_ids) { unsigned int length = SQUASHFS_ID_BLOCK_BYTES(no_ids); + unsigned int indexes = SQUASHFS_ID_BLOCKS(no_ids); + int n; __le64 *table; + u64 start, end; TRACE("In read_id_index_table, length %d\n", length); @@ -67,20 +75,36 @@ __le64 *squashfs_read_id_index_table(struct super_block *sb, return ERR_PTR(-EINVAL); /* - * length bytes should not extend into the next table - this check - * also traps instances where id_table_start is incorrectly larger - * than the next table start + * The computed size of the index table (length bytes) should exactly + * match the table start and end points */ - if (id_table_start + length > next_table) + if (length != (next_table - id_table_start)) return ERR_PTR(-EINVAL); table = squashfs_read_table(sb, id_table_start, length); + if (IS_ERR(table)) + return table; /* - * table[0] points to the first id lookup table metadata block, this - * should be less than id_table_start + * table[0], table[1], ... table[indexes - 1] store the locations + * of the compressed id blocks. Each entry should be less than + * the next (i.e. table[0] < table[1]), and the difference between them + * should be SQUASHFS_METADATA_SIZE or less. table[indexes - 1] + * should be less than id_table_start, and again the difference + * should be SQUASHFS_METADATA_SIZE or less */ - if (!IS_ERR(table) && le64_to_cpu(table[0]) >= id_table_start) { + for (n = 0; n < (indexes - 1); n++) { + start = le64_to_cpu(table[n]); + end = le64_to_cpu(table[n + 1]); + + if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) { + kfree(table); + return ERR_PTR(-EINVAL); + } + } + + start = le64_to_cpu(table[indexes - 1]); + if (start >= id_table_start || (id_table_start - start) > SQUASHFS_METADATA_SIZE) { kfree(table); return ERR_PTR(-EINVAL); } diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h index 34c21ffb6df3..166e98806265 100644 --- a/fs/squashfs/squashfs_fs_sb.h +++ b/fs/squashfs/squashfs_fs_sb.h @@ -64,5 +64,6 @@ struct squashfs_sb_info { unsigned int inodes; unsigned int fragments; int xattr_ids; + unsigned int ids; }; #endif diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c index d6c6593ec169..88cc94be1076 100644 --- a/fs/squashfs/super.c +++ b/fs/squashfs/super.c @@ -166,6 +166,7 @@ static int squashfs_fill_super(struct super_block *sb, struct fs_context *fc) msblk->directory_table = le64_to_cpu(sblk->directory_table_start); msblk->inodes = le32_to_cpu(sblk->inodes); msblk->fragments = le32_to_cpu(sblk->fragments); + msblk->ids = le16_to_cpu(sblk->no_ids); flags = le16_to_cpu(sblk->flags); TRACE("Found valid superblock on %pg\n", sb->s_bdev); @@ -177,7 +178,7 @@ static int squashfs_fill_super(struct super_block *sb, struct fs_context *fc) TRACE("Block size %d\n", msblk->block_size); TRACE("Number of inodes %d\n", msblk->inodes); TRACE("Number of fragments %d\n", msblk->fragments); - TRACE("Number of ids %d\n", le16_to_cpu(sblk->no_ids)); + TRACE("Number of ids %d\n", msblk->ids); TRACE("sblk->inode_table_start %llx\n", msblk->inode_table); TRACE("sblk->directory_table_start %llx\n", msblk->directory_table); TRACE("sblk->fragment_table_start %llx\n", @@ -236,8 +237,7 @@ static int squashfs_fill_super(struct super_block *sb, struct fs_context *fc) allocate_id_index_table: /* Allocate and read id index table */ msblk->id_table = squashfs_read_id_index_table(sb, - le64_to_cpu(sblk->id_table_start), next_table, - le16_to_cpu(sblk->no_ids)); + le64_to_cpu(sblk->id_table_start), next_table, msblk->ids); if (IS_ERR(msblk->id_table)) { errorf(fc, "unable to read id index table"); err = PTR_ERR(msblk->id_table); diff --git a/fs/squashfs/xattr.h b/fs/squashfs/xattr.h index 184129afd456..d8a270d3ac4c 100644 --- a/fs/squashfs/xattr.h +++ b/fs/squashfs/xattr.h @@ -17,8 +17,16 @@ extern int squashfs_xattr_lookup(struct super_block *, unsigned int, int *, static inline __le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 start, u64 *xattr_table_start, int *xattr_ids) { + struct squashfs_xattr_id_table *id_table; + + id_table = squashfs_read_table(sb, start, sizeof(*id_table)); + if (IS_ERR(id_table)) + return (__le64 *) id_table; + + *xattr_table_start = le64_to_cpu(id_table->xattr_table_start); + kfree(id_table); + ERROR("Xattrs in filesystem, these will be ignored\n"); - *xattr_table_start = start; return ERR_PTR(-ENOTSUPP); } From eabac19e40c095543def79cb6ffeb3a8588aaff4 Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Tue, 9 Feb 2021 13:41:56 -0800 Subject: [PATCH 273/284] squashfs: add more sanity checks in inode lookup Sysbot has reported an "slab-out-of-bounds read" error which has been identified as being caused by a corrupted "ino_num" value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the inodes count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large inodes count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher Reported-by: syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/squashfs/export.c | 41 +++++++++++++++++++++++++++++++++-------- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/fs/squashfs/export.c b/fs/squashfs/export.c index ae2c87bb0fbe..eb02072d28dd 100644 --- a/fs/squashfs/export.c +++ b/fs/squashfs/export.c @@ -41,12 +41,17 @@ static long long squashfs_inode_lookup(struct super_block *sb, int ino_num) struct squashfs_sb_info *msblk = sb->s_fs_info; int blk = SQUASHFS_LOOKUP_BLOCK(ino_num - 1); int offset = SQUASHFS_LOOKUP_BLOCK_OFFSET(ino_num - 1); - u64 start = le64_to_cpu(msblk->inode_lookup_table[blk]); + u64 start; __le64 ino; int err; TRACE("Entered squashfs_inode_lookup, inode_number = %d\n", ino_num); + if (ino_num == 0 || (ino_num - 1) >= msblk->inodes) + return -EINVAL; + + start = le64_to_cpu(msblk->inode_lookup_table[blk]); + err = squashfs_read_metadata(sb, &ino, &start, &offset, sizeof(ino)); if (err < 0) return err; @@ -111,7 +116,10 @@ __le64 *squashfs_read_inode_lookup_table(struct super_block *sb, u64 lookup_table_start, u64 next_table, unsigned int inodes) { unsigned int length = SQUASHFS_LOOKUP_BLOCK_BYTES(inodes); + unsigned int indexes = SQUASHFS_LOOKUP_BLOCKS(inodes); + int n; __le64 *table; + u64 start, end; TRACE("In read_inode_lookup_table, length %d\n", length); @@ -121,20 +129,37 @@ __le64 *squashfs_read_inode_lookup_table(struct super_block *sb, if (inodes == 0) return ERR_PTR(-EINVAL); - /* length bytes should not extend into the next table - this check - * also traps instances where lookup_table_start is incorrectly larger - * than the next table start + /* + * The computed size of the lookup table (length bytes) should exactly + * match the table start and end points */ - if (lookup_table_start + length > next_table) + if (length != (next_table - lookup_table_start)) return ERR_PTR(-EINVAL); table = squashfs_read_table(sb, lookup_table_start, length); + if (IS_ERR(table)) + return table; /* - * table[0] points to the first inode lookup table metadata block, - * this should be less than lookup_table_start + * table0], table[1], ... table[indexes - 1] store the locations + * of the compressed inode lookup blocks. Each entry should be + * less than the next (i.e. table[0] < table[1]), and the difference + * between them should be SQUASHFS_METADATA_SIZE or less. + * table[indexes - 1] should be less than lookup_table_start, and + * again the difference should be SQUASHFS_METADATA_SIZE or less */ - if (!IS_ERR(table) && le64_to_cpu(table[0]) >= lookup_table_start) { + for (n = 0; n < (indexes - 1); n++) { + start = le64_to_cpu(table[n]); + end = le64_to_cpu(table[n + 1]); + + if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) { + kfree(table); + return ERR_PTR(-EINVAL); + } + } + + start = le64_to_cpu(table[indexes - 1]); + if (start >= lookup_table_start || (lookup_table_start - start) > SQUASHFS_METADATA_SIZE) { kfree(table); return ERR_PTR(-EINVAL); } From 506220d2ba21791314af569211ffd8870b8208fa Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Tue, 9 Feb 2021 13:42:00 -0800 Subject: [PATCH 274/284] squashfs: add more sanity checks in xattr id lookup Sysbot has reported a warning where a kmalloc() attempt exceeds the maximum limit. This has been identified as corruption of the xattr_ids count when reading the xattr id lookup table. This patch adds a number of additional sanity checks to detect this corruption and others. 1. It checks for a corrupted xattr index read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This would cause an out of bounds read. 2. It checks against corruption of the xattr_ids count. This can either lead to the above kmalloc failure, or a smaller than expected table to be read. 3. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher Reported-by: syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/squashfs/xattr_id.c | 66 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 57 insertions(+), 9 deletions(-) diff --git a/fs/squashfs/xattr_id.c b/fs/squashfs/xattr_id.c index d99e08464554..ead66670b41a 100644 --- a/fs/squashfs/xattr_id.c +++ b/fs/squashfs/xattr_id.c @@ -31,10 +31,15 @@ int squashfs_xattr_lookup(struct super_block *sb, unsigned int index, struct squashfs_sb_info *msblk = sb->s_fs_info; int block = SQUASHFS_XATTR_BLOCK(index); int offset = SQUASHFS_XATTR_BLOCK_OFFSET(index); - u64 start_block = le64_to_cpu(msblk->xattr_id_table[block]); + u64 start_block; struct squashfs_xattr_id id; int err; + if (index >= msblk->xattr_ids) + return -EINVAL; + + start_block = le64_to_cpu(msblk->xattr_id_table[block]); + err = squashfs_read_metadata(sb, &id, &start_block, &offset, sizeof(id)); if (err < 0) @@ -50,13 +55,17 @@ int squashfs_xattr_lookup(struct super_block *sb, unsigned int index, /* * Read uncompressed xattr id lookup table indexes from disk into memory */ -__le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 start, +__le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 table_start, u64 *xattr_table_start, int *xattr_ids) { - unsigned int len; + struct squashfs_sb_info *msblk = sb->s_fs_info; + unsigned int len, indexes; struct squashfs_xattr_id_table *id_table; + __le64 *table; + u64 start, end; + int n; - id_table = squashfs_read_table(sb, start, sizeof(*id_table)); + id_table = squashfs_read_table(sb, table_start, sizeof(*id_table)); if (IS_ERR(id_table)) return (__le64 *) id_table; @@ -70,13 +79,52 @@ __le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 start, if (*xattr_ids == 0) return ERR_PTR(-EINVAL); - /* xattr_table should be less than start */ - if (*xattr_table_start >= start) + len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids); + indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids); + + /* + * The computed size of the index table (len bytes) should exactly + * match the table start and end points + */ + start = table_start + sizeof(*id_table); + end = msblk->bytes_used; + + if (len != (end - start)) return ERR_PTR(-EINVAL); - len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids); + table = squashfs_read_table(sb, start, len); + if (IS_ERR(table)) + return table; - TRACE("In read_xattr_index_table, length %d\n", len); + /* table[0], table[1], ... table[indexes - 1] store the locations + * of the compressed xattr id blocks. Each entry should be less than + * the next (i.e. table[0] < table[1]), and the difference between them + * should be SQUASHFS_METADATA_SIZE or less. table[indexes - 1] + * should be less than table_start, and again the difference + * shouls be SQUASHFS_METADATA_SIZE or less. + * + * Finally xattr_table_start should be less than table[0]. + */ + for (n = 0; n < (indexes - 1); n++) { + start = le64_to_cpu(table[n]); + end = le64_to_cpu(table[n + 1]); - return squashfs_read_table(sb, start + sizeof(*id_table), len); + if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) { + kfree(table); + return ERR_PTR(-EINVAL); + } + } + + start = le64_to_cpu(table[indexes - 1]); + if (start >= table_start || (table_start - start) > SQUASHFS_METADATA_SIZE) { + kfree(table); + return ERR_PTR(-EINVAL); + } + + if (*xattr_table_start >= le64_to_cpu(table[0])) { + kfree(table); + return ERR_PTR(-EINVAL); + } + + return table; } From 1cc4cdb521f9689183474bc89eefc451ac44fa1c Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Tue, 9 Feb 2021 13:42:03 -0800 Subject: [PATCH 275/284] kasan: fix stack traces dependency for HW_TAGS Currently, whether the alloc/free stack traces collection is enabled by default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL. The intention for this dependency was to only enable collection on slow debug kernels due to a significant perf and memory impact. As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option and is enabled on many productions kernels including Android and Ubuntu. As the result, this dependency is pointless and only complicates the code and documentation. Having stack traces collection disabled by default would make the hardware mode work differently to to the software ones, which is confusing. This change removes the dependency and enables stack traces collection by default. Looking into the future, this default might makes sense for production kernels, assuming we implement a fast stack trace collection approach. Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Marco Elver Cc: Catalin Marinas Cc: Vincenzo Frascino Cc: Dmitry Vyukov Cc: Alexander Potapenko Cc: Will Deacon Cc: Andrey Ryabinin Cc: Peter Collingbourne Cc: Evgenii Stepanov Cc: Branislav Rankov Cc: Kevin Brodsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/dev-tools/kasan.rst | 3 +-- mm/kasan/hw_tags.c | 8 ++------ 2 files changed, 3 insertions(+), 8 deletions(-) diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index 1651d961f06a..a248ac3941be 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -163,8 +163,7 @@ particular KASAN features. - ``kasan=off`` or ``=on`` controls whether KASAN is enabled (default: ``on``). - ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack - traces collection (default: ``on`` for ``CONFIG_DEBUG_KERNEL=y``, otherwise - ``off``). + traces collection (default: ``on``). - ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN report or also panic the kernel (default: ``report``). diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c index e529428e7a11..d558799b25b3 100644 --- a/mm/kasan/hw_tags.c +++ b/mm/kasan/hw_tags.c @@ -134,12 +134,8 @@ void __init kasan_init_hw_tags(void) switch (kasan_arg_stacktrace) { case KASAN_ARG_STACKTRACE_DEFAULT: - /* - * Default to enabling stack trace collection for - * debug kernels. - */ - if (IS_ENABLED(CONFIG_DEBUG_KERNEL)) - static_branch_enable(&kasan_flag_stacktrace); + /* Default to enabling stack trace collection. */ + static_branch_enable(&kasan_flag_stacktrace); break; case KASAN_ARG_STACKTRACE_OFF: /* Do nothing, kasan_flag_stacktrace keeps its default value. */ From 793f49a87aae24e5bcf92ad98d764153fc936570 Mon Sep 17 00:00:00 2001 From: Fangrui Song Date: Tue, 9 Feb 2021 13:42:07 -0800 Subject: [PATCH 276/284] firmware_loader: align .builtin_fw to 8 arm64 references the start address of .builtin_fw (__start_builtin_fw) with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC relocations. The compiler is allowed to emit the R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in include/linux/firmware.h is 8-byte aligned. The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a multiple of 8, which may not be the case if .builtin_fw is empty. Unconditionally align .builtin_fw to fix the linker error. 32-bit architectures could use ALIGN(4) but that would add unnecessary complexity, so just use ALIGN(8). Link: https://lkml.kernel.org/r/20201208054646.2913063-1-maskray@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/1204 Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image") Signed-off-by: Fangrui Song Reported-by: kernel test robot Acked-by: Arnd Bergmann Reviewed-by: Nick Desaulniers Tested-by: Nick Desaulniers Tested-by: Douglas Anderson Acked-by: Nathan Chancellor Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-generic/vmlinux.lds.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index b2b3d81b1535..b97c628ad91f 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -459,7 +459,7 @@ } \ \ /* Built-in firmware blobs */ \ - .builtin_fw : AT(ADDR(.builtin_fw) - LOAD_OFFSET) { \ + .builtin_fw : AT(ADDR(.builtin_fw) - LOAD_OFFSET) ALIGN(8) { \ __start_builtin_fw = .; \ KEEP(*(.builtin_fw)) \ __end_builtin_fw = .; \ From a30a29091b5a6d4c64b5fc77040720a65e2dd4e6 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 9 Feb 2021 13:42:10 -0800 Subject: [PATCH 277/284] mm/mremap: fix BUILD_BUG_ON() error in get_extent clang can't evaluate this function argument at compile time when the function is not inlined, which leads to a link time failure: ld.lld: error: undefined symbol: __compiletime_assert_414 >>> referenced by mremap.c >>> mremap.o:(get_extent) in archive mm/built-in.a Mark the function as __always_inline to avoid it. Link: https://lkml.kernel.org/r/20201230154104.522605-1-arnd@kernel.org Fixes: 9ad9718bfa41 ("mm/mremap: calculate extent in one place") Signed-off-by: Arnd Bergmann Tested-by: Nick Desaulniers Reviewed-by: Nathan Chancellor Tested-by: Sedat Dilek Cc: Kirill A. Shutemov" Cc: Wei Yang Cc: Vlastimil Babka Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Brian Geffon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mremap.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/mremap.c b/mm/mremap.c index f554320281cc..aa63bfd3cad2 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -336,8 +336,9 @@ enum pgt_entry { * valid. Else returns a smaller extent bounded by the end of the source and * destination pgt_entry. */ -static unsigned long get_extent(enum pgt_entry entry, unsigned long old_addr, - unsigned long old_end, unsigned long new_addr) +static __always_inline unsigned long get_extent(enum pgt_entry entry, + unsigned long old_addr, unsigned long old_end, + unsigned long new_addr) { unsigned long next, extent, mask, size; From b85a7a8bb5736998b8a681937a9749b350c17988 Mon Sep 17 00:00:00 2001 From: Seth Forshee Date: Tue, 9 Feb 2021 13:42:14 -0800 Subject: [PATCH 278/284] tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 Currently there is an assumption in tmpfs that 64-bit architectures also have a 64-bit ino_t. This is not true on s390 which has a 32-bit ino_t. With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and display "inode64" in the mount options, but passing the "inode64" mount option will fail. This leads to the following behavior: # mkdir mnt # mount -t tmpfs nodev mnt # mount -o remount,rw mnt mount: /home/ubuntu/mnt: mount point not mounted or bad option. As mount sees "inode64" in the mount options and thus passes it in the options for the remount. So prevent CONFIG_TMPFS_INODE64 from being selected on s390. Link: https://lkml.kernel.org/r/20210205230620.518245-1-seth.forshee@canonical.com Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb") Signed-off-by: Seth Forshee Acked-by: Hugh Dickins Cc: Chris Down Cc: Hugh Dickins Cc: Amir Goldstein Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Christian Borntraeger Cc: [5.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/Kconfig b/fs/Kconfig index aa4c12282301..3347ec7bd837 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -203,7 +203,7 @@ config TMPFS_XATTR config TMPFS_INODE64 bool "Use 64-bit ino_t by default in tmpfs" - depends on TMPFS && 64BIT + depends on TMPFS && 64BIT && !S390 default n help tmpfs has historically used only inode numbers as wide as an unsigned From ad69c389ec110ea54f8b0c0884b255340ef1c736 Mon Sep 17 00:00:00 2001 From: Seth Forshee Date: Tue, 9 Feb 2021 13:42:17 -0800 Subject: [PATCH 279/284] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha As with s390, alpha is a 64-bit architecture with a 32-bit ino_t. With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and display "inode64" in the mount options, whereas passing "inode64" in the mount options will fail. This leads to erroneous behaviours such as this: # mkdir mnt # mount -t tmpfs nodev mnt # mount -o remount,rw mnt mount: /home/ubuntu/mnt: mount point not mounted or bad option. Prevent CONFIG_TMPFS_INODE64 from being selected on alpha. Link: https://lkml.kernel.org/r/20210208215726.608197-1-seth.forshee@canonical.com Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb") Signed-off-by: Seth Forshee Acked-by: Hugh Dickins Cc: Chris Down Cc: Amir Goldstein Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Cc: [5.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/Kconfig b/fs/Kconfig index 3347ec7bd837..da524c4d7b7e 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -203,7 +203,7 @@ config TMPFS_XATTR config TMPFS_INODE64 bool "Use 64-bit ino_t by default in tmpfs" - depends on TMPFS && 64BIT && !S390 + depends on TMPFS && 64BIT && !(S390 || ALPHA) default n help tmpfs has historically used only inode numbers as wide as an unsigned From d52db800846f66d98a4e14c39cf88a06bcd9985f Mon Sep 17 00:00:00 2001 From: Rong Chen Date: Tue, 9 Feb 2021 13:42:21 -0800 Subject: [PATCH 280/284] selftests/vm: rename file run_vmtests to run_vmtests.sh Commit c2aa8afc36fa has renamed run_vmtests in Makefile, but the file still uses the old name. The kernel test robot reported the following issue: # selftests: vm: run_vmtests.sh # Warning: file run_vmtests.sh is missing! not ok 1 selftests: vm: run_vmtests.sh Link: https://lkml.kernel.org/r/20210205085507.1479894-1-rong.a.chen@intel.com Fixes: c2aa8afc36fa (selftests/vm: rename run_vmtests --> run_vmtests.sh) Signed-off-by: Rong Chen Reported-by: kernel test robot Reviewed-by: John Hubbard Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/selftests/vm/{run_vmtests => run_vmtests.sh} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename tools/testing/selftests/vm/{run_vmtests => run_vmtests.sh} (100%) diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests.sh similarity index 100% rename from tools/testing/selftests/vm/run_vmtests rename to tools/testing/selftests/vm/run_vmtests.sh From a0c2eb0a4387322ebc629c01f5adb2d957c343fe Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Tue, 9 Feb 2021 13:42:24 -0800 Subject: [PATCH 281/284] MAINTAINERS: update Andrey Ryabinin's email address Update my email, @virtuozzo.com will stop working shortly. Link: https://lkml.kernel.org/r/20210204223904.3824-1-ryabinin.a.a@gmail.com Signed-off-by: Andrey Ryabinin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .mailmap | 1 + MAINTAINERS | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/.mailmap b/.mailmap index d674968df008..7fdf87b24fe8 100644 --- a/.mailmap +++ b/.mailmap @@ -37,6 +37,7 @@ Andrew Murray Andrew Murray Andrew Vasquez Andrey Ryabinin +Andrey Ryabinin Andy Adamson Antoine Tenart Antoine Tenart diff --git a/MAINTAINERS b/MAINTAINERS index 667d03852191..64c7169db617 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9559,7 +9559,7 @@ F: Documentation/hwmon/k8temp.rst F: drivers/hwmon/k8temp.c KASAN -M: Andrey Ryabinin +M: Andrey Ryabinin R: Alexander Potapenko R: Dmitry Vyukov L: kasan-dev@googlegroups.com From e82553c10b0899994153f9bf0af333c0a1550fd7 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Tue, 9 Feb 2021 13:42:28 -0800 Subject: [PATCH 282/284] Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 536d3bf261a2fc3b05b3e91e7eef7383443015cf, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles. Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely. This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload. To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable. The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished. However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit b3ff92916af3 ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below. With the patch breaking userspace, and the root cause addressed by other means already, revert it again. Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high") Signed-off-by: Johannes Weiner Reported-by: Tejun Heo Acked-by: Chris Down Acked-by: Michal Hocko Cc: Roman Gushchin Cc: Shakeel Butt Cc: Michal Koutný Cc: [5.8+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e2de77b5bcc2..913c2b9e5c72 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6271,6 +6271,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, if (err) return err; + page_counter_set_high(&memcg->memory, high); + for (;;) { unsigned long nr_pages = page_counter_read(&memcg->memory); unsigned long reclaimed; @@ -6294,10 +6296,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, break; } - page_counter_set_high(&memcg->memory, high); - memcg_wb_domain_size_changed(memcg); - return nbytes; } From 3286222fc609dea27bd16ac02c55d3f1c3190063 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Tue, 9 Feb 2021 13:42:32 -0800 Subject: [PATCH 283/284] mm, slub: better heuristic for number of cpus when calculating slab order When creating a new kmem cache, SLUB determines how large the slab pages will based on number of inputs, including the number of CPUs in the system. Larger slab pages mean that more objects can be allocated/free from per-cpu slabs before accessing shared structures, but also potentially more memory can be wasted due to low slab usage and fragmentation. The rough idea of using number of CPUs is that larger systems will be more likely to benefit from reduced contention, and also should have enough memory to spare. Number of CPUs used to be determined as nr_cpu_ids, which is number of possible cpus, but on some systems many will never be onlined, thus commit 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order") changed it to nr_online_cpus(). However, for kmem caches created early before CPUs are onlined, this may lead to permamently low slab page sizes. Vincent reports a regression [1] of hackbench on arm64 systems: "I'm facing significant performances regression on a large arm64 server system (224 CPUs). Regressions is also present on small arm64 system (8 CPUs) but in a far smaller order of magnitude On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16 v5.11-rc4 : 9.135sec (+/- 0.45%) v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%) v5.10: 3.136sec (+/- 0.40%)" Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting page allocator contention: "i.e. the patch incurs a 7% to 32% performance penalty. This bisected cleanly yesterday when I was looking for the regression and then found the thread. Numerous caches change size. For example, kmalloc-512 goes from order-0 (vanilla) to order-2 with the revert. So mostly this is down to the number of times SLUB calls into the page allocator which only caches order-0 pages on a per-cpu basis" Clearly num_online_cpus() doesn't work too early in bootup. We could change the order dynamically in a memory hotplug callback, but runtime order changing for existing kmem caches has been already shown as dangerous, and removed in 32a6f409b693 ("mm, slub: remove runtime allocation order changes"). It could be resurrected in a safe manner with some effort, but to fix the regression we need something simpler. We could use num_present_cpus() that should be the number of physically present CPUs even before they are onlined. That would work for PowerPC [3], which triggered the original commit, but that still doesn't work on arm64 [4] as explained in [5]. So this patch tries to determine the best available value without specific arch knowledge. - num_present_cpus() if the number is larger than 1, as that means the arch is likely setting it properly - nr_cpu_ids otherwise This should fix the reported regressions while also keeping the effect of 045ab8c9487b for PowerPC systems. It's possible there are configurations where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time bloated, so these (if they exist) would keep the large orders based on nr_cpu_ids as was before 045ab8c9487b. [1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/ [2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/ [3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/ [4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/ [5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/ Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order") Signed-off-by: Vlastimil Babka Reported-by: Vincent Guittot Reported-by: Mel Gorman Tested-by: Mel Gorman Tested-by: Vincent Guittot Cc: Catalin Marinas Cc: Aneesh Kumar K.V Cc: Bharata B Rao Cc: Christoph Lameter Cc: Roman Gushchin Cc: Johannes Weiner Cc: Joonsoo Kim Cc: Jann Horn Cc: Michal Hocko Cc: David Rientjes Cc: Shakeel Butt Cc: Will Deacon Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/slub.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 7ecbbbe5bc0c..b22a4b101c84 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3423,6 +3423,7 @@ static inline int calculate_order(unsigned int size) unsigned int order; unsigned int min_objects; unsigned int max_objects; + unsigned int nr_cpus; /* * Attempt to find best configuration for a slab. This @@ -3433,8 +3434,21 @@ static inline int calculate_order(unsigned int size) * we reduce the minimum objects required in a slab. */ min_objects = slub_min_objects; - if (!min_objects) - min_objects = 4 * (fls(num_online_cpus()) + 1); + if (!min_objects) { + /* + * Some architectures will only update present cpus when + * onlining them, so don't trust the number if it's just 1. But + * we also don't want to use nr_cpu_ids always, as on some other + * architectures, there can be many possible cpus, but never + * onlined. Here we compromise between trying to avoid too high + * order on systems that appear larger than they are, and too + * low order on systems that appear smaller than they are. + */ + nr_cpus = num_present_cpus(); + if (nr_cpus <= 1) + nr_cpus = nr_cpu_ids; + min_objects = 4 * (fls(nr_cpus) + 1); + } max_objects = order_objects(slub_max_order, size); min_objects = min(min_objects, max_objects); From a35d8f016e0b68634035217d06d1c53863456b50 Mon Sep 17 00:00:00 2001 From: Joachim Henke Date: Tue, 9 Feb 2021 13:42:36 -0800 Subject: [PATCH 284/284] nilfs2: make splice write available again Since 5.10, splice() or sendfile() to NILFS2 return EINVAL. This was caused by commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops"). This patch initializes the splice_write field in file_operations, like most file systems do, to restore the functionality. Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Joachim Henke Signed-off-by: Ryusuke Konishi Tested-by: Ryusuke Konishi Cc: [5.10+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/nilfs2/file.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c index 64bc81363c6c..e1bd592ce700 100644 --- a/fs/nilfs2/file.c +++ b/fs/nilfs2/file.c @@ -141,6 +141,7 @@ const struct file_operations nilfs_file_operations = { /* .release = nilfs_release_file, */ .fsync = nilfs_sync_file, .splice_read = generic_file_splice_read, + .splice_write = iter_file_splice_write, }; const struct inode_operations nilfs_file_inode_operations = {